Inter-agent messaging is unchanged. Publishers send to an address; they don't care what's on the other side. New: a class of address — a topic declared by an ephemeral agent template — that, when delivered to, runs a host-shell prepare hook, creates a tmux session, pastes the engine start command into it, waits for the agent to signal completion, then runs cleanup.
The publisher writes the same send(addr, msg) either way.
Fire-and-forget. Replies arrive as separate inbound messages with optional in_reply_to. Approvals use their own CRUD subcommands.
Topic delivery sequence
Two kinds of hooks, by execution surface:
prepare / cleanup are host-shell (run via proxy exec).
start is tmux paste (typed into the pane).
The proxy gets no new commands.
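The delivery sequence can be read as an ordered plan that the orchestrator walks. The following is a minimal sketch rather than the project's code: the `Step`, `Surface`, and `Hooks` types and the step names are hypothetical, and it assumes prepare and cleanup may be omitted from a template.

```typescript
// Hypothetical types: step and surface names are illustrative only.
type Surface = "proxy-exec" | "proxy-session" | "tmux-paste" | "orchestrator";

interface Step {
  name: "prepare" | "create_session" | "start" | "await_completion" | "cleanup";
  surface: Surface;
  script?: string; // hook text, for steps that run one
}

interface Hooks {
  prepare?: string; // assumption: optional
  start: string;
  cleanup?: string; // assumption: optional
}

// Build the ordered steps for one message delivered to a topic address.
function planDelivery(hooks: Hooks): Step[] {
  const steps: Step[] = [];
  if (hooks.prepare !== undefined) {
    steps.push({ name: "prepare", surface: "proxy-exec", script: hooks.prepare });
  }
  steps.push({ name: "create_session", surface: "proxy-session" });
  steps.push({ name: "start", surface: "tmux-paste", script: hooks.start });
  steps.push({ name: "await_completion", surface: "orchestrator" });
  if (hooks.cleanup !== undefined) {
    steps.push({ name: "cleanup", surface: "proxy-exec", script: hooks.cleanup });
  }
  return steps;
}

const plan = planDelivery({
  prepare: 'git -C "$REPO_ROOT" worktree add "$WORKTREE_PATH" main',
  start: 'cd "$WORKTREE_PATH" && claude --session-id "$MESSAGE_ID" < "$MESSAGE_PATH"',
  cleanup: 'git -C "$REPO_ROOT" worktree remove --force "$WORKTREE_PATH"',
});
console.log(plan.map((s) => `${s.name} via ${s.surface}`).join("\n"));
```

Keeping the surface on each step makes the split visible: only prepare and cleanup go through proxy exec, and only start is typed into the pane.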
Agent template anatomy
Same shape as today's persona files (YAML frontmatter + markdown body). Lives in agents/*.md.
New fields are scoped to ephemeral templates: persistent, cwd_base, cwd_template, repo_root, prepare, cleanup, topics.
Existing persona files load unchanged with persistent: true.
---
id: aws-account-lead
persistent: false
engine: claude
model: opus
# cwd_base is real & existing; used as create_session cwd.
cwd_base: /var/agentic/work/aws-account-lead
# cwd_template is per-message; prepare creates this directory.
cwd_template: /var/agentic/work/aws-account-lead/wt-{{message_id}}
# Host-shell hooks (run via proxy exec). NEW mechanism.
prepare: |
  git -C "$REPO_ROOT" worktree add "$WORKTREE_PATH" main
cleanup: |
  git -C "$REPO_ROOT" worktree remove --force "$WORKTREE_PATH"
# Tmux-paste hook (today's mechanism). Typed into the pane.
start: |
  cd "$WORKTREE_PATH" && claude --session-id "$MESSAGE_ID" < "$MESSAGE_PATH"
topics:
  - name: provision
    schema: ./schemas/provision.json
    reply_schema: ./schemas/provision-reply.json
    concurrency: 1
    monitor_template: aws-account-monitor
  - name: teardown
    schema: ./schemas/teardown.json
    prepare: ./teardown-prepare.sh  # optional per-topic override
---
# AWS Account Lead
You provision and tear down AWS accounts...
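How a loader might treat this frontmatter can be sketched in a few functions. This is an illustration under stated assumptions: the `AgentTemplate` and `TopicDecl` shapes and the function names are hypothetical, the message id `m-42` is made up, and the rule that a topic-level prepare wins over the template-level one is inferred from the "optional per-topic override" comment.

```typescript
// Hypothetical shapes; field names follow the frontmatter example.
interface TopicDecl {
  name: string;
  prepare?: string; // optional per-topic override
}

interface AgentTemplate {
  id: string;
  persistent: boolean;
  cwd_template?: string;
  prepare?: string;
  topics: TopicDecl[];
}

// Frontmatter as parsed from YAML, before defaults are applied.
type RawFrontmatter = { id: string } & Partial<Omit<AgentTemplate, "id">>;

// Existing persona files carry no `persistent` key, so it defaults to true.
function normalizeTemplate(raw: RawFrontmatter): AgentTemplate {
  return { ...raw, persistent: raw.persistent ?? true, topics: raw.topics ?? [] };
}

// Substitute {{message_id}} to get the per-message worktree path.
function renderCwd(cwdTemplate: string, messageId: string): string {
  return cwdTemplate.split("{{message_id}}").join(messageId);
}

// A topic-level prepare, when present, takes precedence (assumed semantics).
function resolvePrepare(tpl: AgentTemplate, topicName: string): string | undefined {
  const topic = tpl.topics.find((t) => t.name === topicName);
  return topic?.prepare ?? tpl.prepare;
}

const legacy = normalizeTemplate({ id: "gitea-lead" });
const lead = normalizeTemplate({
  id: "aws-account-lead",
  persistent: false,
  cwd_template: "/var/agentic/work/aws-account-lead/wt-{{message_id}}",
  prepare: 'git -C "$REPO_ROOT" worktree add "$WORKTREE_PATH" main',
  topics: [{ name: "provision" }, { name: "teardown", prepare: "./teardown-prepare.sh" }],
});

console.log(legacy.persistent, lead.persistent);
console.log(renderCwd(lead.cwd_template ?? "", "m-42"));
console.log(resolvePrepare(lead, "teardown"));
```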
Two env contracts, by hook kind
prepare / cleanup · host shell (proxy exec)
MESSAGE_PATH, MESSAGE_CONTENT
REPLY_PATH, STATUS_PATH · cleanup can inspect outcome
WORKTREE_PATH · prepare creates, cleanup removes
CWD_BASE, REPO_ROOT
AGENT_TEMPLATE, TOPIC_NAME, MESSAGE_ID
INSTANCE_ADDR, REPLY_TO_ADDR
start · tmux paste
The same variables are exported into the session via tmux set-environment before the paste, so the start line can reference them and they expand as it is typed.
all of the above, plus
TMUX_SESSION · the session that was just created
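The two contracts differ by exactly one variable, which a small builder makes explicit. In this sketch the `HookContext` object, the function names, and all example values are hypothetical; only the variable names come from the lists above. The `tmux set-environment -t <session> <name> <value>` form is standard tmux syntax, emitted here as argv arrays to sidestep shell quoting.

```typescript
// Hypothetical context object holding the per-instance values.
interface HookContext {
  messagePath: string; messageContent: string;
  replyPath: string; statusPath: string;
  worktreePath: string; cwdBase: string; repoRoot: string;
  agentTemplate: string; topicName: string; messageId: string;
  instanceAddr: string; replyToAddr: string;
}

// Environment for prepare / cleanup (host shell, via proxy exec).
function hookEnv(c: HookContext): Record<string, string> {
  return {
    MESSAGE_PATH: c.messagePath, MESSAGE_CONTENT: c.messageContent,
    REPLY_PATH: c.replyPath, STATUS_PATH: c.statusPath,
    WORKTREE_PATH: c.worktreePath, CWD_BASE: c.cwdBase, REPO_ROOT: c.repoRoot,
    AGENT_TEMPLATE: c.agentTemplate, TOPIC_NAME: c.topicName, MESSAGE_ID: c.messageId,
    INSTANCE_ADDR: c.instanceAddr, REPLY_TO_ADDR: c.replyToAddr,
  };
}

// Environment for start: everything above plus the session just created.
function startEnv(c: HookContext, tmuxSession: string): Record<string, string> {
  return { ...hookEnv(c), TMUX_SESSION: tmuxSession };
}

// One `tmux set-environment` argv per variable, to run before the paste.
function setEnvironmentArgv(session: string, env: Record<string, string>): string[][] {
  return Object.entries(env).map(([name, value]) => [
    "tmux", "set-environment", "-t", session, name, value,
  ]);
}

// Example values are made up for illustration.
const ctx: HookContext = {
  messagePath: "/tmp/m-42/message.json", messageContent: "{}",
  replyPath: "/tmp/m-42/reply.json", statusPath: "/tmp/m-42/status",
  worktreePath: "/var/agentic/work/aws-account-lead/wt-m-42",
  cwdBase: "/var/agentic/work/aws-account-lead", repoRoot: "/srv/repo",
  agentTemplate: "aws-account-lead", topicName: "provision", messageId: "m-42",
  instanceAddr: "instance:m-42", replyToAddr: "agent:gitea-lead",
};
const argv = setEnvironmentArgv("sess-m-42", startEnv(ctx, "sess-m-42"));
console.log(argv.length);
```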
Approvals — CRUD + auto-notify
Approvals are categorised by channel (approval:<channel>), not by topic. A channel is a UI feed label, not a routing target — the auto-notify message goes to the requester's agent address, never to approval:.
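The difference between a feed label and a routing target can be shown in a short sketch. The record shape, the state names, and the example channel `deploys` are hypothetical; only the `approval:<channel>` label and the rule that notifications go to the requester's agent address come from the text above.

```typescript
// Hypothetical state names and record shape.
type ApprovalState = "pending" | "approved" | "denied" | "withdrawn";

interface Approval {
  id: string;
  channel: string;       // UI feed label, e.g. "deploys"
  requesterAddr: string; // agent address that created the approval
  state: ApprovalState;
}

interface OutboundMessage {
  to: string;
  body: { approvalId: string; state: ApprovalState };
}

// Label under which a UI would list the approval. Not a deliverable address.
function feedLabel(a: Approval): string {
  return `approval:${a.channel}`;
}

// Auto-notify targets the requester's agent address, never the channel.
function autoNotify(a: Approval): OutboundMessage {
  return { to: a.requesterAddr, body: { approvalId: a.id, state: a.state } };
}

const approval: Approval = {
  id: "ap-1",
  channel: "deploys",
  requesterAddr: "agent:gitea-lead",
  state: "approved",
};
console.log(feedLabel(approval), "->", autoNotify(approval).to);
```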
The collab binary detects ephemeral context from env (presence of $MESSAGE_ID + $AGENT_TEMPLATE + $REPLY_PATH) and adapts its help text, exposed subcommands, and the system-prompt addendum injected into the engine. Persistent-mode behaviour matches today.
Persistent mode
collab send <addr> --payload …
collab approval create / get / set / withdraw / await
Banner: "You are agent:gitea-lead. Messages arrive in your inbox; reply by sending."
Ephemeral mode
Banner: "You are handling message $MESSAGE_ID on topic <tmpl>/<topic>. Call collab complete when done."
The system prompt composed by persona.ts also branches on mode: ephemeral agents are told they handle exactly one message and must complete; persistent agents are told they have an ongoing inbox.
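Mode detection from the environment, and the banner that follows from it, might look like the sketch below. The function names are hypothetical, the environment is passed in as a plain object rather than read from the process, and treating an empty value as absent is an assumption.

```typescript
type Env = Record<string, string | undefined>;
type Mode = "ephemeral" | "persistent";

// Ephemeral context requires all three variables; empty counts as absent (assumption).
function detectMode(env: Env): Mode {
  const required = ["MESSAGE_ID", "AGENT_TEMPLATE", "REPLY_PATH"];
  return required.every((k) => (env[k] ?? "") !== "") ? "ephemeral" : "persistent";
}

// Banner text follows the two examples above; agentAddr is supplied by the caller.
function banner(env: Env, agentAddr: string): string {
  if (detectMode(env) === "ephemeral") {
    const topic = `${env["AGENT_TEMPLATE"]}/${env["TOPIC_NAME"] ?? "?"}`;
    return `You are handling message ${env["MESSAGE_ID"]} on topic ${topic}. Call collab complete when done.`;
  }
  return `You are ${agentAddr}. Messages arrive in your inbox; reply by sending.`;
}

const ephemeralEnv: Env = {
  MESSAGE_ID: "m-42",
  AGENT_TEMPLATE: "aws-account-lead",
  TOPIC_NAME: "provision",
  REPLY_PATH: "/tmp/m-42/reply.json",
};
console.log(banner(ephemeralEnv, "agent:aws-account-lead"));
console.log(banner({}, "agent:gitea-lead"));
```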
Backwards compat: today's collab send syntax (<target> --topic <category> <message>) keeps working. The --topic flag retains its 2.x meaning (message category); v3's topic: is encoded in the <addr> prefix. The client-side /api/agents target validation in bin/collab is widened to accept prefixed addresses without rejecting them as "no such agent".
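The widened client-side check could take roughly the following form. This is a sketch under assumptions: the address grammar `topic:<tmpl>/<topic>` is inferred from the banner text and the prefix remark, an optional `agent:` prefix is accepted alongside bare agent ids, and the function names and error strings are hypothetical.

```typescript
type Target =
  | { kind: "agent"; id: string }
  | { kind: "topic"; template: string; topic: string };

// Parse an address; returns null when it is malformed.
function parseTarget(addr: string): Target | null {
  if (addr.startsWith("topic:")) {
    const rest = addr.slice("topic:".length);
    const slash = rest.indexOf("/");
    if (slash <= 0 || slash === rest.length - 1) return null;
    return { kind: "topic", template: rest.slice(0, slash), topic: rest.slice(slash + 1) };
  }
  const id = addr.startsWith("agent:") ? addr.slice("agent:".length) : addr;
  return id === "" ? null : { kind: "agent", id };
}

// Returns an error string, or null when the target is acceptable.
function validateTarget(addr: string, knownAgents: Set<string>): string | null {
  const target = parseTarget(addr);
  if (target === null) return `malformed address: ${addr}`;
  if (target.kind === "agent" && !knownAgents.has(target.id)) {
    return `no such agent: ${target.id}`;
  }
  // Topic addresses skip the agent lookup; resolving them is left to the
  // server in this sketch.
  return null;
}

const known = new Set(["gitea-lead"]);
console.log(validateTarget("gitea-lead", known));
console.log(validateTarget("topic:aws-account-lead/provision", known));
console.log(validateTarget("nobody", known));
```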
Template-only fields (persistent, cwd_base, cwd_template, repo_root, prepare, cleanup, topics) are stored exclusively in agent_templates via a new template-sync routine — they never flow through field-registry.buildUpsertOptsFromFrontmatter, so no ALTER TABLE agents runs.
agent_instances is intentionally separate from agents to keep the persistent-agent state machine uncontaminated by ephemeral concerns. Health-monitor and cool-down explicitly exclude rows in agent_instances.
Proxy role
No new commands on the proxy. The existing /command vocabulary suffices: prepare/cleanup go through exec; tmux session via create_session / kill_session / has_session; start goes through today's tmux paste path.
Active-instance state (instance ↔ tmux session ↔ worktree) lives in the orchestrator's DB (agent_instances), not in the proxy. This preserves the proxy's thin-stateless design intent.
Crash recovery
| Scenario | Recovery |
| --- | --- |
| Orchestrator restart while instances are live | Walk agent_instances in non-terminal state. For each: check $STATUS_PATH; if present, process completion. Otherwise ask proxy has_session: alive → resume waiting; gone → run cleanup hook, mark failed, requeue topic_queue row per policy. |
| Proxy restart while instances are live | All sessions on that proxy are gone. On its next registration heartbeat, orchestrator marks every live agent_instances row on that proxy failed, runs cleanup hooks where the worktree still exists on disk, requeues topic_queue entries. |
| Orphaned worktrees | cleanup hook owns worktree removal. If it fails or never runs, a periodic sweep checks cwd_base against live agent_instances.worktree_path values and removes leftovers. |
| Agent forgets to call collab complete | Per-instance outer timeout (configurable per topic) trips → orchestrator treats as failure: kill tmux session, run cleanup, requeue or fail. |
| Approval mid-flight at restart | Pure DB state. No recovery needed; pending approvals remain pending until a human resolves them. |
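Two of the recovery rules reduce to small pure functions: the per-instance decision after an orchestrator restart, and the orphaned-worktree sweep. The sketch below assumes the probes (status file present, session alive) and the directory listing have already been gathered; all type and function names are hypothetical.

```typescript
type RecoveryAction = "process_completion" | "resume_waiting" | "cleanup_fail_requeue";

interface InstanceProbe {
  statusFileExists: boolean; // is anything present at $STATUS_PATH?
  sessionAlive: boolean;     // result of the proxy's has_session
}

// Decision for one non-terminal agent_instances row after an orchestrator restart.
function onOrchestratorRestart(p: InstanceProbe): RecoveryAction {
  if (p.statusFileExists) return "process_completion";
  return p.sessionAlive ? "resume_waiting" : "cleanup_fail_requeue";
}

// Orphan sweep: directories under cwd_base that no live instance claims.
function orphanedWorktrees(dirsUnderCwdBase: string[], liveWorktreePaths: string[]): string[] {
  const live = new Set(liveWorktreePaths);
  return dirsUnderCwdBase.filter((dir) => !live.has(dir));
}

console.log(onOrchestratorRestart({ statusFileExists: false, sessionAlive: false }));
console.log(
  orphanedWorktrees(
    ["/var/agentic/work/aws-account-lead/wt-a", "/var/agentic/work/aws-account-lead/wt-b"],
    ["/var/agentic/work/aws-account-lead/wt-a"],
  ),
);
```

The status file is checked first, so an agent that finished while the orchestrator was down is processed as a completion even if its session has since exited.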
What changes
| Concern | 2.x | 3.0 |
| --- | --- | --- |
| Persistent agents | persona file → live tmux | agent file (persistent: true) → live tmux · same observable behaviour |
| Sending messages | send(agentId, msg) | send(address, msg) · agent or topic · same fire-and-forget |