agent-deck Capability E2E Dashboard

Generated 2026-05-27T18:36:36Z

One card per capability agent-deck promises a user can do. Each fast-gate capability is verified by a test that performs the real action through the compiled binary on an isolated tmux socket and asserts on the real effect: registry rows or live pane content. Nightly capabilities need real agents, keys, or network and run out of band.

Capabilities: 18
Green: 15
Failed: 0
Nightly only: 3
Not yet covered: 0
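
The summary counts above can be cross-checked against the cards on this page. The following is a minimal sketch, assuming a hand-transcribed list of (capability, status) pairs; the list layout and the FAIL label are assumptions for illustration, not the dashboard generator's actual format.

```python
from collections import Counter

# Status badge of each card on this page, transcribed by hand in page order.
# The tuple layout is an assumption for illustration only.
CARDS = [
    ("Register a session", "PASS"),
    ("Start a session", "PASS"),
    ("Stop a session", "PASS"),
    ("Restart a session", "PASS"),
    ("Remove a session", "PASS"),
    ("Launch in one step", "PASS"),
    ("Fork guard", "PASS"),
    ("Send a message to an agent and read its reply", "PASS"),
    ("Real agent round trip (Claude)", "NIGHTLY"),
    ("Fork inherits tool context", "NIGHTLY"),
    ("Attach and detach an MCP", "PASS"),
    ("MCP actually loads in the agent", "NIGHTLY"),
    ("Create and finish a git worktree", "PASS"),
    ("Organize sessions into groups", "PASS"),
    ("Keep profiles isolated", "PASS"),
    ("Bring multiple tool kinds to readiness", "PASS"),
    ("Worker reports it finished", "PASS"),
    ("An idle worker does not re-notify", "PASS"),
]


def summarize(cards):
    """Tally cards by status into the dashboard's headline fields."""
    counts = Counter(status for _, status in cards)
    return {
        "Capabilities": len(cards),
        "Green": counts["PASS"],
        "Failed": counts["FAIL"],  # "FAIL" is an assumed label; no card here uses it
        "Nightly only": counts["NIGHTLY"],
    }


print(summarize(CARDS))
```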

Session lifecycle

PASS

Register a session

We run the add command to register a new session, then read the saved registry back. We confirm the row exists with the title, tool, group, and folder we asked for.

Pass when: new registry row carries the given title, tool, group, and working directory

Tier F . Runtime: 0.2s . TestCapability_Lifecycle_Add
Local-isolated
What the terminal showed at verification
Profile: default

TITLE                GROUP           PATH                                     ID
--------------------------------------------------------------------------------------------
cap-add              capgrp          /tmp/TestCapability_Lifecycle_Add3841... 18761039-177

Total: 1 sessions
PASS

Start a session

We register a session and start it. We then ask the throwaway tmux server whether a live pane actually appeared, and confirm the registry flips the session to an active state.

Pass when: a real tmux pane appears on the isolated socket and status becomes active

Tier F . Runtime: 1.8s . TestCapability_Lifecycle_Start
Local-isolated
What the terminal showed at verification
IDLE (1):
  ○ cap-start        shell      -                      ~/project

Total: 1 sessions in profile 'default'
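
Asking the throwaway tmux server whether a pane exists could look roughly like the sketch below. It relies on tmux's `-L` option (select a named socket) and `list-panes -a -F` (list every pane with a custom format). The socket name passed in and the assumption that the pane's tmux session name identifies the agent-deck session are hypothetical; the real test helper is not shown on this page.

```python
import subprocess


def list_panes_cmd(socket_name):
    """Build the tmux command that lists every pane on one named socket."""
    return [
        "tmux", "-L", socket_name,
        "list-panes", "-a",
        "-F", "#{session_name} #{pane_id}",
    ]


def parse_panes(output):
    """Turn list-panes output into (session_name, pane_id) pairs."""
    panes = []
    for line in output.splitlines():
        line = line.strip()
        if not line:
            continue
        # Pane ids such as %0 never contain spaces, so split on the last one.
        session, pane_id = line.rsplit(" ", 1)
        panes.append((session, pane_id))
    return panes


def live_panes(socket_name):
    """Return the panes on the socket, or an empty list if no server answers."""
    result = subprocess.run(
        list_panes_cmd(socket_name), capture_output=True, text=True
    )
    if result.returncode != 0:
        return []
    return parse_panes(result.stdout)
```

A start check would then assert that `live_panes(...)` is non-empty, and a restart check that it contains exactly one entry for the session.
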
PASS

Stop a session

We start a session, then stop it. We confirm the tmux pane is gone and the registry returns the session to the stopped state.

Pass when: tmux pane disappears and registry status returns to stopped

Tier F . Runtime: 1.6s . TestCapability_Lifecycle_Stop
Local-isolated
What the terminal showed at verification
Profile: default

TITLE                GROUP           PATH                                     ID
--------------------------------------------------------------------------------------------
cap-stop             001             /tmp/TestCapability_Lifecycle_Stop156... 5f61729b-177

Total: 1 sessions
PASS

Restart a session

We start a session and restart it. We confirm exactly one pane exists afterward (no accidental duplicate) and the session is active again.

Pass when: exactly one pane remains after restart and status is active (guards the #30 double-spawn)

Tier F . Runtime: 2.8s . TestCapability_Lifecycle_Restart
Local-isolated
What the terminal showed at verification
IDLE (1):
  ○ cap-restart      shell      -                      ~/project

Total: 1 sessions in profile 'default'
PASS

Remove a session

We confirm that a stopped session can be removed and disappears from the registry. We also confirm that removing a still-running session is refused unless forced, so a live session is not destroyed by accident.

Pass when: stopped session leaves the registry; a running session is refused without force

Tier F . Runtime: 11.3s . TestCapability_Lifecycle_Rm
Local-isolated
What the terminal showed at verification
$ agent-deck session remove cap-rm-running
Error: session 'cap-rm-running' is in state 'starting'; only stopped/error sessions may be removed without --force

$ agent-deck list   (after stopping and removing cap-rm-stopped)
Profile: default

TITLE                GROUP           PATH                                     ID
--------------------------------------------------------------------------------------------
cap-rm-running       001             /tmp/TestCapability_Lifecycle_Rm21683... c04ab467-177

Total: 1 sessions
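
The refusal rule quoted in the error message above can be modeled in a few lines. This is a sketch of the rule only, using a hypothetical in-memory registry (a dict of title to state) and a hypothetical `RemoveRefused` exception; it is not agent-deck's implementation.

```python
# States that may be removed without force, per the error message above.
REMOVABLE_WITHOUT_FORCE = {"stopped", "error"}


class RemoveRefused(Exception):
    """Raised when a session in a live state is removed without force."""


def remove_session(registry, title, force=False):
    """Delete a session from the registry unless its state forbids it."""
    state = registry[title]
    if state not in REMOVABLE_WITHOUT_FORCE and not force:
        raise RemoveRefused(
            f"session '{title}' is in state '{state}'; "
            "only stopped/error sessions may be removed without --force"
        )
    del registry[title]


registry = {"cap-rm-running": "starting", "cap-rm-stopped": "stopped"}
remove_session(registry, "cap-rm-stopped")      # allowed: already stopped
try:
    remove_session(registry, "cap-rm-running")  # refused: still live
except RemoveRefused as err:
    print("Error:", err)
print(sorted(registry))
```
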
PASS

Launch in one step

We use the single launch command, which creates, starts, and messages a session at once, pointed at the stand-in echo agent. We confirm the registry row exists and the echoed message shows up on screen.

Pass when: one launch command creates the row, starts the pane, and the echoed message appears

Tier F . Runtime: 7.1s . TestCapability_Lifecycle_Launch
Local-isolated . Deterministic-stub
What the terminal showed at verification
ECHOBOT READY > PINGLAUNCH-cap-e2e-token
ECHO:PINGLAUNCH-cap-e2e-token
PASS

Fork guard

Forking is only valid for supported tools with live context (Claude session ID or Pi JSONL). We confirm forking an unsupported session is cleanly refused and creates no orphan child row. The full context-inheriting fork is a documented nightly gap.

Pass when: forking an unsupported session is refused and no child row is created

Tier F . Runtime: 0.1s . TestCapability_Lifecycle_Fork
Local-isolated
What the terminal showed at verification
$ agent-deck session fork cap-fork
Error: session 'cap-fork' is not a Claude session (tool: shell)

Agent interaction

PASS

Send a message to an agent and read its reply

We launch a tiny stand-in agent that simply repeats whatever it is sent. We send it a unique message through the normal send command, then read the screen back. If the screen shows the echoed message, we know the message reached the agent and a reply came back.

Pass when: the pane shows ECHO:<token> after a real send, proving readiness, send-keys, and capture read-back

Tier F . Runtime: 6.4s . TestCapability_Agent_EchoRoundTrip
Local-isolated . Deterministic-stub
What the terminal showed at verification
ECHOBOT READY > PING-TestCapability_Agent_EchoRoundTrip
ECHO:PING-TestCapability_Agent_EchoRoundTrip
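
The read-back step amounts to polling the pane until the echoed line appears. A minimal sketch follows, assuming a hypothetical `capture` callable that returns the current pane text; how the real test captures the pane is not shown here.

```python
import time


def wait_for_echo(capture, token, timeout=5.0, interval=0.1):
    """Poll capture() until a line equal to ECHO:<token> shows up.

    Returns True on success and False once the timeout has passed.
    """
    expected = f"ECHO:{token}"
    deadline = time.monotonic() + timeout
    while True:
        lines = [line.strip() for line in capture().splitlines()]
        if expected in lines:
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)


# Simulated pane contents: first only the prompt, then the echoed reply.
frames = iter([
    "ECHOBOT READY > ",
    "ECHOBOT READY > PING-demo\nECHO:PING-demo\n",
])
print(wait_for_echo(lambda: next(frames), "PING-demo", timeout=1.0, interval=0.0))
```

Matching a whole line rather than a substring keeps the typed prompt line, which also contains the token, from counting as a reply.
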
NIGHTLY

Real agent round trip (Claude)

We launch a real Claude session, wait for its prompt, send a fixed instruction, and check the reply. This needs a real API key and network, so it runs nightly, not on every release.

Pass when: a real Claude reply contains the expected token

Tier N . Runtime: not measured
Real-agent . Needs-creds . Needs-network
NIGHTLY

Fork inherits tool context

We fork a live Claude or Pi session and confirm the child inherits the conversation with a distinct id and a parent link. This needs real tool session data from a live transcript, so it runs nightly.

Pass when: child session links the parent and inherits conversation context

Tier N . Runtime: not measured
Real-agent . Needs-creds . Needs-network

MCP

PASS

Attach and detach an MCP

We register a stub tool server, attach it to a session, and read the session's .mcp.json file back to confirm the entry was written. We then detach it and confirm the entry is gone. We also confirm attaching an unknown server is refused and never writes a broken config.

Pass when: .mcp.json gains the server entry on attach and loses it on detach; an unknown server is refused

Tier F . Runtime: 0.2s . TestCapability_MCP_AttachDetach
Local-isolated . Deterministic-stub
What the terminal showed at verification
$ agent-deck mcp attached cap-mcp
Session: cap-mcp

LOCAL (~/project/.mcp.json):
  • stubmcp

.mcp.json on disk:
{
  "mcpServers": {
    "stubmcp": {
      "type": "stdio",
      "command": "true",
      "args": [
        "--noop"
      ]
    }
  }
}
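
Reading the file back is a plain JSON check on the `mcpServers` key shown above. The sketch below writes the same stub entry into a temporary directory and reads it back; the helper name `attached_servers` and the empty `mcpServers` object used for the detached state are assumptions for illustration.

```python
import json
import os
import tempfile


def attached_servers(project_dir):
    """Return the server names listed under mcpServers in .mcp.json."""
    path = os.path.join(project_dir, ".mcp.json")
    if not os.path.exists(path):
        return set()
    with open(path, encoding="utf-8") as fh:
        config = json.load(fh)
    return set(config.get("mcpServers", {}))


def demo():
    """Simulate attach then detach by rewriting .mcp.json in a temp dir."""
    stub = {"type": "stdio", "command": "true", "args": ["--noop"]}
    with tempfile.TemporaryDirectory() as project_dir:
        path = os.path.join(project_dir, ".mcp.json")
        with open(path, "w", encoding="utf-8") as fh:
            json.dump({"mcpServers": {"stubmcp": stub}}, fh, indent=2)
        after_attach = attached_servers(project_dir)
        # Assumed detached state: the key remains but holds no servers.
        with open(path, "w", encoding="utf-8") as fh:
            json.dump({"mcpServers": {}}, fh, indent=2)
        after_detach = attached_servers(project_dir)
    return after_attach, after_detach


print(demo())
```
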
NIGHTLY

MCP actually loads in the agent

We attach a tool server and ask a real agent to list its tools, confirming the agent honors the attachment. This needs a real agent to introspect, so it runs nightly.

Pass when: the agent lists the attached MCP server

Tier N . Runtime: not measured
Real-agent . Needs-creds . Needs-network

Worktrees

PASS

Create and finish a git worktree

Against a throwaway git repo, we create a session on a new branch in its own worktree and confirm the worktree directory really exists on disk. We then run finish, and confirm the worktree and branch are removed, the session is gone, and the ORIGINAL repository is untouched.

Pass when: the worktree dir is created on its own branch, then finish removes it and the session while leaving the source repo intact (the #1200 data-loss guard)

Tier F . Runtime: 0.3s . TestCapability_Worktree_CreateFinish
Local-isolated
What the terminal showed at verification
$ agent-deck worktree info cap-wt   (after add --worktree -b)
Session:        cap-wt
Branch:         feature/capfeature
Worktree Path:  ~/repo/.worktrees/feature-capfeature
Main Repo:      ~/repo
Status:         exists

$ agent-deck list   (after worktree finish: session gone, repo intact)
No sessions found in profile 'default'.
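
Whether a worktree exists for a branch can also be checked from git itself, by parsing `git worktree list --porcelain`. The sketch below parses that output format; the sample text, absolute paths, and commit hashes are made up for illustration and are not taken from the test run.

```python
def parse_worktrees(porcelain):
    """Parse `git worktree list --porcelain` output into one dict per worktree."""
    entries, current = [], {}
    for line in porcelain.splitlines():
        if not line.strip():
            # A blank line ends the current worktree record.
            if current:
                entries.append(current)
                current = {}
            continue
        key, _, value = line.partition(" ")
        current[key] = value
    if current:
        entries.append(current)
    return entries


def has_worktree(porcelain, branch):
    """True if some worktree has the given branch checked out."""
    ref = f"refs/heads/{branch}"
    return any(entry.get("branch") == ref for entry in parse_worktrees(porcelain))


# Illustrative samples only: paths and hashes are placeholders.
BEFORE_FINISH = """\
worktree /home/user/repo
HEAD 1111111111111111111111111111111111111111
branch refs/heads/main

worktree /home/user/repo/.worktrees/feature-capfeature
HEAD 1111111111111111111111111111111111111111
branch refs/heads/feature/capfeature
"""

AFTER_FINISH = """\
worktree /home/user/repo
HEAD 1111111111111111111111111111111111111111
branch refs/heads/main
"""

print(has_worktree(BEFORE_FINISH, "feature/capfeature"))
print(has_worktree(AFTER_FINISH, "feature/capfeature"))
```
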

Groups and profiles

PASS

Organize sessions into groups

We create sessions in two different groups and confirm that filtering the registry by group returns exactly that group's members, with no session bleeding across groups.

Pass when: each group lists exactly its own sessions; no cross-group leakage

Tier F . Runtime: 0.3s . TestCapability_Groups_Filtering
Local-isolated
What the terminal showed at verification
Groups:

NAME                 SESSIONS   STATUS
--------------------------------------------------
alpha                2          ○ 2
beta                 1          ○ 1

Total: 2 groups, 3 sessions
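
The no-leakage condition can be stated as: each group's filtered list matches its expected members, and no title appears in two groups. A small sketch, assuming hypothetical session titles and a registry represented as a list of dicts:

```python
# Hypothetical registry rows; only the group sizes (2 and 1) mirror the run above.
ROWS = [
    {"title": "alpha-one", "group": "alpha"},
    {"title": "alpha-two", "group": "alpha"},
    {"title": "beta-one", "group": "beta"},
]


def sessions_in_group(rows, group):
    """Return the titles of the sessions that belong to one group."""
    return sorted(row["title"] for row in rows if row["group"] == group)


alpha = sessions_in_group(ROWS, "alpha")
beta = sessions_in_group(ROWS, "beta")
print(alpha, beta)
# No title may appear in both groups.
print(set(alpha).isdisjoint(beta))
```
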
PASS

Keep profiles isolated

We add one session under the default profile and another under a separate profile, then list each profile. We confirm neither profile can see the other's session.

Pass when: a session in one profile is invisible to the other (no cross-profile data bleed)

Tier F . Runtime: 0.4s . TestCapability_Profiles_Isolation
Local-isolated
What the terminal showed at verification
$ agent-deck list   (default profile)
Profile: default

TITLE                GROUP           PATH                                     ID
--------------------------------------------------------------------------------------------
in-default           001             /tmp/TestCapability_Profiles_Isolatio... 7eca0dea-177

Total: 1 sessions

$ agent-deck -p capalt list   (isolated profile)
Profile: capalt

TITLE                GROUP           PATH                                     ID
--------------------------------------------------------------------------------------------
in-capalt            001             /tmp/TestCapability_Profiles_Isolatio... 625039ff-177

Total: 1 sessions

Multi-tool

PASS

Bring multiple tool kinds to readiness

We launch two distinct stand-in tools and confirm each one reaches an active state AND echoes back the unique message we sent it, proving the launch and readiness machinery is not tied to a single tool.

Pass when: two different tools each reach active and echo their token back

Tier F . Runtime: 14.1s . TestCapability_MultiTool_Readiness
Local-isolated . Deterministic-stub
What the terminal showed at verification
tool "echobot" (cap-tool-a):
ECHOBOT READY > PINGA-cap-e2e
ECHO:PINGA-cap-e2e

tool "parrot" (cap-tool-b):
ECHOBOT READY > PINGB-cap-e2e
ECHO:PINGB-cap-e2e

Conductor comms backbone

PASS

Worker reports it finished

A worker prints a completion sentinel on its last turn. We run the real Stop-hook handler and confirm it records a done outcome (status and summary), then run the notifier daemon and confirm it emits a distinct finished signal to the parent instead of an ambiguous waiting status. An ordinary turn with no sentinel records no completion.

Pass when: the Stop-hook persists done status=ok, and the daemon emits a finished event carrying that outcome (issue #1186)

Tier F . Runtime: 2.1s . TestCapability_Conductor_FinishedSignal
Local-isolated . Deterministic-stub
What the terminal showed at verification
hook status file (what the Stop-hook persisted):
{
  "status": "waiting",
  "event": "Stop",
  "done_status": "ok",
  "done_summary": "capability wave2 shipped"
}

finished event emitted by notify-daemon (the [DONE] signal to the parent):
child_title: cap-child
child_session_id: 0a81e066-1779906947
kind: finished
done_status: ok
done_summary: capability wave2 shipped
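
The distinction between a finished signal and a plain waiting one can be sketched as a small classifier over the hook status file shown above. The function name `classify` and the fallback kind `waiting` for a turn without a sentinel are assumptions; the daemon's real logic is not shown on this page.

```python
import json

# The hook status file from the run above.
HOOK_STATUS = json.loads("""
{
  "status": "waiting",
  "event": "Stop",
  "done_status": "ok",
  "done_summary": "capability wave2 shipped"
}
""")


def classify(hook_status):
    """Pick the event kind the parent should see for one hook status record."""
    done_status = hook_status.get("done_status")
    if hook_status.get("event") == "Stop" and done_status:
        return {
            "kind": "finished",
            "done_status": done_status,
            "done_summary": hook_status.get("done_summary", ""),
        }
    # Assumption: a turn without a completion sentinel stays a plain waiting event.
    return {"kind": "waiting"}


print(classify(HOOK_STATUS))
```
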
PASS

An idle worker does not re-notify

We make a worker transition once, then run the notifier daemon twice over the same idle worker. We confirm exactly one notification is produced across both passes, proving the de-duplication ledger persists between polls.

Pass when: two daemon passes over the same idle child emit exactly one event (issue #1187 dedup)

Tier F . Runtime: 0.3s . TestCapability_Conductor_Dedup
Local-isolated . Deterministic-stub
What the terminal showed at verification
two `notify-daemon --once` passes over the same idle child
emitted exactly one transition event (the second was deduped):

child_title: dd-child
child_session_id: 5b085245-1779906947
from_status: running
to_status: idle
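
A de-duplication ledger that persists between polls can be sketched as a small JSON file recording the last status reported for each child. The file layout, the `poll_once` helper, and keying by child title are assumptions for illustration, not the notifier daemon's actual design.

```python
import json
import os
import tempfile


def poll_once(children, ledger_path):
    """Emit one event per child whose status differs from the ledger, then save."""
    ledger = {}
    if os.path.exists(ledger_path):
        with open(ledger_path, encoding="utf-8") as fh:
            ledger = json.load(fh)
    events = []
    for child, status in children.items():
        previous = ledger.get(child)
        if previous != status:
            events.append(
                {"child_title": child, "from_status": previous, "to_status": status}
            )
            ledger[child] = status
    # Writing the ledger back is what lets the next pass skip this transition.
    with open(ledger_path, "w", encoding="utf-8") as fh:
        json.dump(ledger, fh)
    return events


def demo():
    """Run two passes over the same idle child and return both event lists."""
    with tempfile.TemporaryDirectory() as tmp:
        ledger_path = os.path.join(tmp, "ledger.json")
        # Seed the ledger so the child was last seen running.
        with open(ledger_path, "w", encoding="utf-8") as fh:
            json.dump({"dd-child": "running"}, fh)
        first = poll_once({"dd-child": "idle"}, ledger_path)
        second = poll_once({"dd-child": "idle"}, ledger_path)
    return first, second


print(demo())
```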