Current Code Map - 2026-04-16

Runner Ownership, Channel Progress, Tmux Capture, and Run-State Truth

This page is a visual guide for the current clisbot architecture around session-owned run lifecycle, runner-owned tmux behavior, channel-owned rendering, and the failure modes created when those boundaries blur. It starts with the shortest mental model first, then opens up the full flow.

Main Rule: Output observation is not run-lifecycle truth.
Lifecycle Owner: SessionService owns active-run truth.
Runner Owner: RunnerService owns backend readiness and liveness probes.
Channel Owner: Channels own rendering, preview, attach/watch UX, and message-tool handoff.

Quick Model

If you only need one picture in your head, use this: a user message comes in from a channel, and the channel resolves route and surface policy. AgentService forwards to SessionService, which owns whether a run exists and whether a new prompt is allowed. RunnerService makes tmux or the backend ready, and monitorTmuxRun() is only supposed to observe pane facts.

1. Admission

SessionService decides whether this sessionKey can accept a new prompt or must reject with active-run guidance.

2. Backend

RunnerService owns bootstrap, resume, tmux capture, submit, and backend-specific recovery.

3. Surface

Channels own previews, message-tool handoff, attach or watch commands, and how progress becomes a Slack or Telegram message.
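
The admission step in the quick model can be sketched as follows. This is a minimal illustration under stated assumptions, not the clisbot implementation: RunState, AdmissionResult, runStateBySession, and admitPrompt are hypothetical names, and the in-memory Map stands in for whatever SessionService actually uses.

```typescript
// Hypothetical names throughout; none of these are taken from the clisbot source.
type RunState = "idle" | "running" | "detached";

type AdmissionResult =
  | { admitted: true }
  | { admitted: false; guidance: string };

// In-memory stand-in for session-owned active-run truth, keyed by sessionKey.
const runStateBySession = new Map<string, RunState>();

function admitPrompt(sessionKey: string): AdmissionResult {
  const state = runStateBySession.get(sessionKey) ?? "idle";
  if (state !== "idle") {
    // Both running and detached keep admission closed.
    return {
      admitted: false,
      guidance: `A run is already ${state} for this session. Use /attach, /watch, or /stop.`,
    };
  }
  // Admitting a prompt is what creates the active run.
  runStateBySession.set(sessionKey, "running");
  return { admitted: true };
}
```

The point of the sketch is that the decision reads only session-owned state; nothing about pane output or channel rendering is consulted.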

Owner Map

AgentService
  • Thin facade at runtime entrypoints.
  • Wires channels and control into session-owned behavior.
  • Should not become a second orchestration owner.
SessionService
  • Owns sessionKey-scoped active run truth.
  • Owns admission control, observer attachment, queue semantics, and final settlement.
  • Only this layer should transition a run back to idle.
RunnerService
  • Owns backend readiness, resume, tmux session bootstrap, capture, submit, and runner-specific recovery.
  • Should answer: is the backend alive, blocked, resumable, or dead?
monitorTmuxRun()
  • Observes pane snapshots and meaningful deltas.
  • Detects idle-after-output and max-runtime detach.
  • Should report facts, not own lifecycle consequences.
Channel Interaction Processing
  • Owns placeholder rendering, streaming preview, message-tool preview handoff, and slash-command UX.
  • Reads run updates; should not redefine the run state machine.
Session State Store
  • Persists runtime.state, startedAt, detachedAt, message-tool reply timestamps, and continuity metadata.
  • Read model for status and follow-up decisions, not its own owner boundary.

Full Flow

This is the end-to-end path for a normal routed prompt. Read left to right. Each column owns a different kind of truth.

1. Channel

Inbound event: Slack or Telegram parses text, thread or topic identity, slash command, attachments, and route config.
Policy choice: Chooses responseMode, streaming, queue behavior, and observer commands such as /attach.

2. AgentService

Facade entry: Resolves the target and forwards work to session-owned or runner-owned boundaries.
No canonical state: It should stay as wiring, not become a second place that decides run truth.

3. SessionService

Admission control: Rejects or accepts a new prompt based on active-run truth for this sessionKey.
Active-run model: Creates the in-memory active run, persists runtime.state = running, and owns observers.

4. RunnerService

ensureRunnerReady(): Reuses or creates a tmux session, and handles startup blocker patterns, trust prompts, and session-id resume rules.
Backend truth: Knows whether the backend is actually ready, blocked, or recoverable.

5. monitorTmuxRun()

Pane observer: Submits the prompt, polls pane snapshots, derives meaningful deltas, and emits running, detached, completed, or timeout callbacks.
Current risk: It still emits a terminal timeout from noOutputTimeoutMs, even though that only proves a lack of visible output.

6. Channel Render

Preview or tool handoff: In capture-pane, pane settlement becomes the surface reply. In message-tool, the channel only previews and waits for tool reply boundaries.
User-visible outcome: Thread gets previews, final reply, detached note, or active-run guidance.

Architecture Diagram

Read this as three stacked layers: surface UX on top, runtime ownership in the middle, and the mismatch or fix shape at the bottom. The important point is that a message-tool final reply and a lack of visible output are not enough by themselves to rewrite the canonical run lifecycle.
Legend: canonical owner boundary; surface or observer boundary; known mismatch or risk zone.

State Machines

Run State

Canonical Session Runtime

  • idle: no active run admitted.
  • running: run is live, admission closed.
  • detached: run is still live, observation window ended, admission still closed.
Observer State

Channel Observation

  • live: post every running update.
  • poll: post every interval.
  • passive-final: stop live updates but keep final settlement.
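
The three observer modes above can be read as a posting rule. The sketch below is illustrative only: ObserverMode mirrors the mode names on this page, while RunUpdate, shouldPost, and pollIntervalMs are hypothetical. It also assumes that every mode still posts the final settlement, which this page states explicitly only for passive-final.

```typescript
type ObserverMode = "live" | "poll" | "passive-final";

// Hypothetical update shape: either a running delta or the final settlement.
interface RunUpdate {
  kind: "running" | "final";
  atMs: number;
}

function shouldPost(
  mode: ObserverMode,
  update: RunUpdate,
  lastPostedAtMs: number | undefined,
  pollIntervalMs: number,
): boolean {
  // Assumption: final settlement is always delivered, whatever the mode.
  if (update.kind === "final") return true;
  switch (mode) {
    case "live":
      return true; // post every running update
    case "poll":
      // post at most once per interval
      return lastPostedAtMs === undefined || update.atMs - lastPostedAtMs >= pollIntervalMs;
    case "passive-final":
      return false; // live updates stopped, final settlement kept
  }
}
```

Note that the function changes only what a thread sees. It has no way to touch run state, which matches the observer boundary described here.
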
Surface State

Channel Message Rendering

  • capture-pane: pane owns final user-visible settlement.
  • message-tool: tool reply owns canonical final message.
  • streaming off: may suppress preview while run still exists.
Mismatch Risk

Silent-But-Alive Runner

  • Pane changes are filtered out as non-meaningful.
  • noOutputTimeoutMs fires.
  • Current code may clear active-run truth too early.
Run lifecycle transitions
Happy Path
  1. User prompt admitted.
  2. runtime.state = running.
  3. Monitor emits running deltas.
  4. Idle-after-output or true completion settles run.
  5. SessionService sets state back to idle.
Long Run Path
  1. Observation exceeds maxRuntimeMs.
  2. Run transitions to detached.
  3. Final settlement should still arrive later.
  4. Admission remains closed until real finish or stop.
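
Both paths above fit one small transition function. This is a sketch under assumptions: the three state names come from this page, but RunEvent, its event names, and nextRunState are hypothetical and only illustrate the shape of the state machine.

```typescript
type RunState = "idle" | "running" | "detached";

// Hypothetical event names for the transitions listed above.
type RunEvent = "admit" | "maxRuntimeReached" | "settled" | "stopped";

function nextRunState(state: RunState, event: RunEvent): RunState {
  switch (event) {
    case "admit":
      // Admission is only open when no run is active.
      if (state !== "idle") throw new Error(`cannot admit while ${state}`);
      return "running";
    case "maxRuntimeReached":
      // Observation window ended; the run itself is still live.
      return state === "running" ? "detached" : state;
    case "settled":
    case "stopped":
      // Real finish or explicit stop is what reopens admission.
      return "idle";
  }
}
```

There is deliberately no event for "no visible output": in this model, silence alone cannot move a run to idle.
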

Problem Zones

Core mismatch: the current code sometimes lets an observational signal behave like lifecycle truth. That is the root cause behind "it says active run", "attach does not help", and "final reply already appeared but I still need stop".
Problem A - "Timed out waiting for visible output."
What It Means Today

The monitor did not see meaningful visible output within noOutputTimeoutMs. This does not prove the backend has stopped.

Why It Is Dangerous

Current flow can convert that observation into a terminal run settlement and return the session to idle too early.

Risk: the backend may still be running, but the session can look free again.
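
To see how a pane can keep changing while the timeout still fires, consider the sketch below. It is an illustration, not the clisbot filter: PaneSnapshot, normalize, and noOutputTimeoutFired are hypothetical names, and the rule that an elapsed-time counter such as "(12s)" counts as noise is an assumption made for the example.

```typescript
interface PaneSnapshot {
  atMs: number;
  text: string;
}

// Assumption: an elapsed-time counter like "(12s)" is non-meaningful noise.
function normalize(text: string): string {
  return text.replace(/\(\d+s\)/g, "").trim();
}

function noOutputTimeoutFired(snapshots: PaneSnapshot[], noOutputTimeoutMs: number): boolean {
  let previous: PaneSnapshot | undefined;
  let lastMeaningfulAtMs: number | undefined;
  let latestAtMs: number | undefined;
  for (const snap of snapshots) {
    // Only a change that survives normalization counts as meaningful output.
    if (previous === undefined || normalize(snap.text) !== normalize(previous.text)) {
      lastMeaningfulAtMs = snap.atMs;
    }
    latestAtMs = snap.atMs;
    previous = snap;
  }
  if (lastMeaningfulAtMs === undefined || latestAtMs === undefined) return false;
  return latestAtMs - lastMeaningfulAtMs >= noOutputTimeoutMs;
}
```

A backend that prints only "Thinking... (0s)", "Thinking... (5s)", "Thinking... (10s)" is clearly alive, yet this check reports a timeout once the window elapses. The result is a fact about visibility, not about the backend.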
Problem B - message-tool final reply can appear before run lifecycle is done
Surface Done

The agent sent clisbot message send --final, so the human sees a final chat reply.

Lifecycle Not Done

The active run remains live until SessionService receives real settlement from monitor callbacks or an explicit stop.

Result: the chat looks done, but a new prompt is still blocked by active-run admission. /attach only observes; it does not clear lifecycle state.
Problem C - attach, watch, and stop semantics
/attach

Reattaches this thread to current updates. It is an observer action, not a lifecycle action.

/watch

Changes observer mode to interval polling. Still no lifecycle ownership.

/stop

Explicit intervention. This is the only one of the three that is supposed to change the backend run itself.
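
The difference between the three commands can be made concrete with a small sketch. Everything here is hypothetical (ThreadView, applyCommand, stopRequested), and mapping /attach to the live observer mode is an assumption; the page only says it reattaches the thread to current updates.

```typescript
type RunState = "idle" | "running" | "detached";
type ObserverMode = "live" | "poll" | "passive-final";
type Command = "/attach" | "/watch" | "/stop";

interface ThreadView {
  runState: RunState; // owned by SessionService, never changed here
  observerMode: ObserverMode;
  stopRequested: boolean; // hypothetical flag handed to the lifecycle owner
}

function applyCommand(view: ThreadView, command: Command): ThreadView {
  switch (command) {
    case "/attach":
      // Observer action only (assumed to resume live updates).
      return { ...view, observerMode: "live" };
    case "/watch":
      // Observer action only: switch to interval polling.
      return { ...view, observerMode: "poll" };
    case "/stop":
      // The only command that asks for a change to the run itself.
      return { ...view, stopRequested: true };
  }
}
```

Even /stop does not write runState in this sketch. It records the request, and the lifecycle owner decides when the run is actually idle.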

Low-Side-Effect Fix Shape

This keeps the current architecture. It does not require inventing another manager. It only tightens the contract between the existing owner boundaries.

Rule 1

monitorTmuxRun() should emit facts only.

  • running delta seen
  • idle-after-output
  • max-runtime reached
  • no visible output yet
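
One way to express Rule 1 is a fact type with no terminal variant. The sketch below is a minimal illustration: noOutputTimeoutMs and maxRuntimeMs are names used on this page, while MonitorFact, MonitorTick, MonitorThresholds, deriveFacts, and idleAfterOutputMs are hypothetical.

```typescript
// The four facts listed above; note there is no "completed" or "timeout" variant.
type MonitorFact =
  | { kind: "runningDelta"; text: string }
  | { kind: "idleAfterOutput"; idleMs: number }
  | { kind: "maxRuntimeReached"; elapsedMs: number }
  | { kind: "noVisibleOutputYet"; waitedMs: number };

interface MonitorTick {
  elapsedMs: number; // time since the prompt was submitted
  msSinceLastDelta: number; // time since the last meaningful delta
  sawAnyOutput: boolean;
  newText?: string; // meaningful delta seen on this tick, if any
}

interface MonitorThresholds {
  idleAfterOutputMs: number; // hypothetical name
  maxRuntimeMs: number;
  noOutputTimeoutMs: number;
}

function deriveFacts(tick: MonitorTick, t: MonitorThresholds): MonitorFact[] {
  const facts: MonitorFact[] = [];
  if (tick.newText !== undefined && tick.newText !== "") {
    facts.push({ kind: "runningDelta", text: tick.newText });
  } else if (tick.sawAnyOutput && tick.msSinceLastDelta >= t.idleAfterOutputMs) {
    facts.push({ kind: "idleAfterOutput", idleMs: tick.msSinceLastDelta });
  } else if (!tick.sawAnyOutput && tick.elapsedMs >= t.noOutputTimeoutMs) {
    facts.push({ kind: "noVisibleOutputYet", waitedMs: tick.elapsedMs });
  }
  if (tick.elapsedMs >= t.maxRuntimeMs) {
    facts.push({ kind: "maxRuntimeReached", elapsedMs: tick.elapsedMs });
  }
  return facts;
}
```

Because the return type carries only observations, the caller is forced to decide what each fact means for the run.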
Rule 2

SessionService should own all terminal transitions.

  • Only this layer may clear active run and reopen admission.
  • Clearing a run requires stronger evidence than "pane silent".
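
Rule 2 can be sketched as a single decision function that combines an observed fact with a backend liveness answer. The names FactKind, BackendLiveness, and decideRunState are hypothetical, and the choice to leave a blocked backend in its current state is an assumption for this example.

```typescript
type RunState = "idle" | "running" | "detached";
type FactKind = "runningDelta" | "idleAfterOutput" | "maxRuntimeReached" | "noVisibleOutputYet";
type BackendLiveness = "alive" | "blocked" | "dead";

function decideRunState(current: RunState, fact: FactKind, liveness: BackendLiveness): RunState {
  if (current === "idle") return "idle"; // no active run to transition
  switch (fact) {
    case "runningDelta":
      return current;
    case "idleAfterOutput":
      return "idle"; // idle-after-output settles the run
    case "maxRuntimeReached":
      return "detached"; // still live, admission still closed
    case "noVisibleOutputYet":
      // A silent pane alone is not enough; only a dead backend clears the run.
      return liveness === "dead" ? "idle" : current;
  }
}
```

The only place that returns "idle" for a silent pane is the branch that also has evidence the backend is dead.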
Rule 3

RunnerService should answer silent-but-alive vs blocked vs dead.

  • Probe session liveness.
  • Probe runner waiting or still busy.
  • Map backend-specific uncertainty into one normalized answer.
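
A normalized answer for Rule 3 might look like the sketch below. The probe fields and names (ProbeResult, tmuxSessionExists, blockerPromptVisible, normalizeLiveness) are hypothetical, and the probes are reduced to two booleans for brevity. Treating every case that is neither dead nor blocked as alive is an assumption chosen so that uncertainty keeps admission closed.

```typescript
// Hypothetical raw probe results gathered by the runner layer.
interface ProbeResult {
  tmuxSessionExists: boolean; // session liveness probe
  blockerPromptVisible: boolean; // e.g. a trust prompt waiting for input
}

type BackendLiveness = "alive" | "blocked" | "dead";

function normalizeLiveness(probe: ProbeResult): BackendLiveness {
  if (!probe.tmuxSessionExists) return "dead";
  if (probe.blockerPromptVisible) return "blocked";
  // Assumption: anything else, including a silent pane, counts as alive.
  return "alive";
}
```

Callers then see one of three answers and never need to know which backend-specific signals produced it.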
Rule 4

message-tool --final needs a truthful contract.

  • Either send the final reply only when backend work is truly done.
  • Or surface a clear "reply sent, backend still active" state.
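
The second option in Rule 4 can be sketched as a derived status that keeps the surface fact and the lifecycle fact separate. SessionStatus, finalReplySentAtMs, SurfaceStatus, describeStatus, and canAdmit are hypothetical names used only for illustration.

```typescript
type RunState = "idle" | "running" | "detached";

interface SessionStatus {
  runState: RunState; // lifecycle truth
  finalReplySentAtMs?: number; // surface fact: a final reply was sent
}

type SurfaceStatus = RunState | "reply-sent-backend-active";

function describeStatus(s: SessionStatus): SurfaceStatus {
  // A final reply on an active run is reported as its own state.
  if (s.runState !== "idle" && s.finalReplySentAtMs !== undefined) {
    return "reply-sent-backend-active";
  }
  return s.runState;
}

function canAdmit(s: SessionStatus): boolean {
  // Admission depends on lifecycle truth only, never on the reply timestamp.
  return s.runState === "idle";
}
```

With this shape, a user who sees a final reply but is still blocked gets an explanation instead of a contradiction.
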
The main design sentence is: observational timeout must not automatically mean lifecycle completion.

Sources Used

This artifact was built against the current code and docs, especially the owner-boundary and interaction-processing paths.

This page is intentionally self-contained so it can be opened directly from a lightweight static HTTP server on the sandbox.