Test: qa-node-02-success-reply (matrix NODE-02)
Date: 2026-05-12
Runner: Docker (sg docker)
Commands:
  sg docker -c 'docker build -t anet-qa-node-02 -f tests/qa-node-02-success-reply/Dockerfile .'
  sg docker -c 'docker run --rm anet-qa-node-02'

Result: PASS
Runtime: ~8.5s warm, ~25s cold

Coverage (9 steps hard-asserted):
- [0] hub boot
- [1] admin login (with retry loop; admin bootstrap can lag /health by ~1-3s)
- [2] admin POST /api/networks → network_id
- [3] admin mint ntok for alias "mock-echo"
- [4] mock agent report_status(idle) via MCP /mcp → session row created
- [5] admin POST /api/task → task_id captured
- [6] pre-reply check: task.status = 'delivered' (state-machine entry)
- [7] mock agent send_reply via MCP with in_reply_to=task_id, status=replied, text=<body>
- [8] post-reply check: task.status = 'replied', tasks.result == body, completed_at set
- [9] /api/messages contains the reply text (sender-side visibility)

Why mock-via-MCP (vs real agent-node):
  docker-e2e SC05 tests the FAILED-reply path with a real agent-node container
  (using a fake MiniMax key so that LLM calls fail). The SUCCESS path would
  need a real working LLM, which costs money and makes the test flaky.
  Calling the send_reply MCP tool directly with curl instead treats the agent
  as a black box and verifies only the commhub contract: "given a valid
  send_reply(replied, in_reply_to, text), the task state machine and result
  fields update correctly". The agent CLI itself is covered by E2E.
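
A minimal sketch of the direct send_reply call described above, assuming the
hub exposes MCP as JSON-RPC "tools/call" over HTTP at /mcp. The argument names
(in_reply_to, status, text) come from step [7]; HUB_URL, NTOK, the Bearer
header, and the function names are illustrative assumptions, not the test's
actual script. Depending on the server's MCP transport, an initialize
handshake or a session header may be required before this call.

```shell
# Build the JSON-RPC body for an MCP tools/call to send_reply.
# $1 = task_id, $2 = reply text. jq handles JSON escaping of the text.
build_send_reply_payload() {
  jq -cn --arg task_id "$1" --arg text "$2" '{
    jsonrpc: "2.0",
    id: 1,
    method: "tools/call",
    params: {
      name: "send_reply",
      arguments: { in_reply_to: $task_id, status: "replied", text: $text }
    }
  }'
}

# POST the payload to the hub. HUB_URL and NTOK are placeholders, and the
# Authorization scheme is an assumption about how the ntok is presented.
send_reply() {
  build_send_reply_payload "$1" "$2" |
    curl -sS -X POST "$HUB_URL/mcp" \
      -H 'Content-Type: application/json' \
      -H 'Accept: application/json, text/event-stream' \
      -H "Authorization: Bearer $NTOK" \
      --data-binary @-
}
```

Hypothetical usage: send_reply "$TASK_ID" "echo: hello"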

Contracts surfaced by R6:

1. POST /api/networks response shape differs by caller:
   - admin: { ok, network_id, network_name } (top-level)
   - non-admin (observed in R5): { ok, network: { network_id, ... } } (nested)
   Defensive jq: `.network.network_id // .network_id // empty`.
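
As a sketch, the defensive jq expression from contract 1 can be wrapped in a
small helper that accepts either response shape. The function name
extract_network_id is hypothetical; only the jq filter comes from the note
above.

```shell
# Print network_id from either POST /api/networks response shape.
# Returns non-zero if neither shape carries an id.
extract_network_id() {
  local response="$1" id
  # Try the nested shape, then the top-level one. `empty` produces no
  # output (rather than the string "null") when both are missing.
  id=$(printf '%s' "$response" |
    jq -r '.network.network_id // .network_id // empty') || return 1
  [ -n "$id" ] || return 1
  printf '%s\n' "$id"
}
```

Failing loudly on an empty id keeps a shape change from surfacing later as a
confusing error in the task-creation step.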

2. Admin login may briefly return 401 after /health returns 200 (bootstrap
   race). Use a retry loop with a ~10s budget before giving up.
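
One way to express the retry budget from contract 2 is a generic helper that
reruns a command until it succeeds or a deadline passes. This is a sketch,
not the test's actual loop: retry_with_budget and admin_login_ok are
hypothetical names, and HUB_URL, the /api/login path, and the credential
fields are placeholders for whatever the hub really expects.

```shell
# Retry a command until it succeeds or the time budget runs out.
# Usage: retry_with_budget <budget_seconds> <interval_seconds> <command...>
retry_with_budget() {
  local budget="$1" interval="$2" deadline
  shift 2
  deadline=$(( $(date +%s) + budget ))
  until "$@"; do
    # Give up once the deadline has passed; otherwise wait and retry.
    if [ "$(date +%s)" -ge "$deadline" ]; then
      return 1
    fi
    sleep "$interval"
  done
}

# Hypothetical login probe: succeeds only when the hub answers HTTP 200.
admin_login_ok() {
  local code
  code=$(curl -s -o /dev/null -w '%{http_code}' \
    -H 'Content-Type: application/json' \
    -d "{\"username\":\"$ADMIN_USER\",\"password\":\"$ADMIN_PASS\"}" \
    "$HUB_URL/api/login") || return 1
  [ "$code" = "200" ]
}
```

Hypothetical usage: retry_with_budget 10 1 admin_login_ok || exit 1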

3. send_reply (status=replied) atomically transitions the tasks row only if
   the current status is IN ('created', 'delivered', 'acked', 'running')
   (server/src/tools.ts L613-614). Terminal states (replied/failed/cancelled)
   are not overwritten; the call is a silent no-op. Worth a separate test
   (NODE-02b) for idempotency.
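
The guard in contract 3 can be modelled in a few lines, which may be handy as
an expected-value oracle when writing NODE-02b. This is a model of the
behaviour described above, not the code in server/src/tools.ts; both function
names are hypothetical.

```shell
# Succeeds if send_reply(status=replied) would transition a task that is
# currently in the given status.
can_transition_to_replied() {
  case "$1" in
    created|delivered|acked|running) return 0 ;;
    *) return 1 ;;
  esac
}

# Print the task status expected after send_reply(status=replied).
apply_replied() {
  if can_transition_to_replied "$1"; then
    printf 'replied\n'
  else
    # Terminal state: left unchanged and no error is raised.
    printf '%s\n' "$1"
  fi
}
```

Under this model a second send_reply on an already-replied task leaves the
status at 'replied', and one on a 'failed' or 'cancelled' task leaves it
untouched.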

Resources:
  - Docker (sg docker)
  - node:20-slim + bun + jq + unzip
  - @sleep2agi/agent-network@preview from npm
  - 0 LLM API calls
