Runner Ownership, Channel Progress, Tmux Capture, and Run-State Truth
This page is a visual guide to the current clisbot architecture:
session-owned run lifecycle, runner-owned tmux behavior, channel-owned rendering, and the failure modes
created when those boundaries blur. It starts with the shortest mental model, then opens up the full flow.
Main Rule: Output observation is not run-lifecycle truth.
Runner Owner: RunnerService owns backend readiness and liveness probes.
Channel Owner: Channels own rendering, preview, attach/watch UX, and message-tool handoff.
Quick Model
If you only need one picture in your head, use this:
a user message comes in from a channel, the channel resolves route and surface policy,
AgentService forwards to SessionService,
SessionService owns whether a run exists and whether a new prompt is allowed,
RunnerService makes tmux or the backend ready,
and monitorTmuxRun() is only supposed to observe pane facts.
1. Admission
SessionService decides whether this sessionKey can accept a new prompt or must reject with active-run guidance.
2. Backend
RunnerService owns bootstrap, resume, tmux capture, submit, and backend-specific recovery.
3. Surface
Channels own previews, message-tool handoff, attach or watch commands, and how progress becomes a Slack or Telegram message.
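The admission step above can be sketched as a small in-memory check. This is a minimal illustration, not the real SessionService: the class name, method names, result shape, and guidance text are all assumptions made for the example.

```typescript
// Hypothetical shapes for illustration; these are not the real clisbot types.
type ActiveRunState = "running" | "detached";

interface ActiveRun {
  sessionKey: string;
  state: ActiveRunState;
  startedAt: number;
}

type AdmissionResult =
  | { admitted: true; run: ActiveRun }
  | { admitted: false; guidance: string };

class SessionAdmissionSketch {
  // In-memory active-run truth, keyed by sessionKey.
  private activeRuns = new Map<string, ActiveRun>();

  admit(sessionKey: string, now: number): AdmissionResult {
    const existing = this.activeRuns.get(sessionKey);
    if (existing) {
      // Both running and detached keep admission closed.
      return {
        admitted: false,
        guidance: `Run is ${existing.state}; use /attach, /watch, or /stop.`,
      };
    }
    const run: ActiveRun = { sessionKey, state: "running", startedAt: now };
    this.activeRuns.set(sessionKey, run);
    return { admitted: true, run };
  }

  // Only the session layer clears the active run and reopens admission.
  settle(sessionKey: string): void {
    this.activeRuns.delete(sessionKey);
  }
}
```

The point of the sketch is that admission depends only on session-owned state, never on what a channel last rendered.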
Owner Map
AgentService
Thin facade at runtime entrypoints.
Wires channels and control into session-owned behavior.
Should not become a second orchestration owner.
SessionService
Owns sessionKey-scoped active run truth.
Owns admission control, observer attachment, queue semantics, and final settlement.
Only this layer should transition a run back to idle.
Reads run updates; should not redefine the run state machine.
Session State Store
Persists runtime.state, startedAt, detachedAt, message-tool reply timestamps, and continuity metadata.
Read model for status and follow-up decisions, not its own owner boundary.
Full Flow
This is the end-to-end path for a normal routed prompt. Read the stages in order; each stage owns a different kind of truth.
1. Channel
Inbound event
Slack or Telegram parses text, thread or topic identity, slash command, attachments, and route config.
Policy choice
Chooses responseMode, streaming, queue behavior, and observer commands such as /attach.
2. AgentService
Facade entry
Resolves the target and forwards work to session-owned or runner-owned boundaries.
No canonical state
It should stay as wiring, not become a second place that decides run truth.
3. SessionService
Admission control
Rejects or accepts a new prompt based on active-run truth for this sessionKey.
Active-run model
Creates in-memory active run, persists runtime.state = running, owns observers.
4. RunnerService
ensureRunnerReady()
Reuses or creates tmux session, handles startup blocker patterns, trust prompts, and session-id resume rules.
Backend truth
Knows whether the backend is actually ready, blocked, or recoverable.
5. monitorTmuxRun()
Pane observer
Submits prompt, polls pane snapshots, derives meaningful deltas, and emits running, detached, completed, or timeout callbacks.
Current risk
Today it still emits a terminal timeout from noOutputTimeoutMs, even though that only proves lack of visible output.
6. Channel Render
Preview or tool handoff
In capture-pane, pane settlement becomes the surface reply. In message-tool, the channel only previews and waits for tool reply boundaries.
User-visible outcome
Thread gets previews, final reply, detached note, or active-run guidance.
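The channel render stage can be sketched as a pure decision function. Only the mode names capture-pane and message-tool come from this guide; the event names, action names, and function name are assumptions for illustration.

```typescript
// Hypothetical names throughout, except the two response modes.
type ResponseMode = "capture-pane" | "message-tool";
type RunEvent = "running-delta" | "pane-settled" | "tool-final-reply";
type SurfaceAction = "post-preview" | "post-final" | "none";

function chooseSurfaceAction(
  mode: ResponseMode,
  streaming: boolean,
  event: RunEvent,
): SurfaceAction {
  if (event === "running-delta") {
    // With streaming off, previews may be suppressed while the run still exists.
    return streaming ? "post-preview" : "none";
  }
  if (event === "pane-settled") {
    // Only capture-pane lets pane settlement become the user-visible reply.
    return mode === "capture-pane" ? "post-final" : "none";
  }
  // tool-final-reply: in message-tool mode the tool reply is the final message.
  return mode === "message-tool" ? "post-final" : "none";
}
```

Note that the function returns a rendering action only. Nothing here touches run state, which matches the rule that channels do not own lifecycle.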
Ownership diagram (nodes listed in flow order)
Surface: Channels
Renders progress, handles /attach, /watch, preview handoff, and message-tool behavior.
Facade: AgentService
Thin entrypoint that forwards to the real session-owned and runner-owned boundaries.
Canonical owner: SessionService
Owns active-run truth, admission control, observers, queue semantics, and final settlement.
Backend owner: RunnerService
Owns bootstrap, resume, readiness, tmux session handling, and backend-specific recovery.
Execution: Tmux + Native CLI
Real runner process: Codex, Claude, Gemini, and the terminal state they produce.
Observation and control interpretation
Known risk: Current Gap
No visible output can still get treated too much like a terminal timeout, which can clear active-run truth too early.
Observer only: monitorTmuxRun()
Polls pane snapshots, derives meaningful deltas, and emits callbacks for running, detach, idle-after-output, and timeout-like observation events.
Desired contract: SessionService consumes facts
Lifecycle should move only after stronger evidence than “pane stayed quiet”.
Most visible mismatch today
Surface mismatch: message-tool Mode
Chat final reply can appear before backend run truly settles, creating the classic “looks done, still active” problem.
Safer shape: Low-Side-Effect Contract
Observer reports facts only. SessionService owns lifecycle. RunnerService probes silent-but-alive vs blocked vs dead.
Read this as three stacked layers:
surface UX on top, runtime ownership in the middle, and the mismatch or fix shape at the bottom.
The important point is that message-tool final and no visible output are not enough by themselves to rewrite canonical run lifecycle.
Legend: canonical owner boundary; surface or observer boundary; known mismatch or risk zone.
State Machines
Run State
Canonical Session Runtime
idle: no active run admitted.
running: run is live, admission closed.
detached: run is still live, observation window ended, admission still closed.
Observer State
Channel Observation
live: post every running update.
poll: post every interval.
passive-final: stop live updates but keep final settlement.
Surface State
Channel Message Rendering
capture-pane: pane owns final user-visible settlement.
message-tool: tool reply owns canonical final message.
streaming off: may suppress preview while run still exists.
Mismatch Risk
Silent-But-Alive Runner
Pane changes are filtered out as non-meaningful.
noOutputTimeoutMs fires.
Current code may clear active-run truth too early.
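To make the silent-but-alive case concrete, here is a minimal sketch of how a delta filter can report "nothing meaningful" while the pane is still changing. The helper names and the noise rule (spinner glyphs and elapsed-second tickers) are assumptions for illustration, not the actual filter used by monitorTmuxRun().

```typescript
// Assumption: a line made only of spinner glyphs and an elapsed counter is noise.
const NOISE_LINE = /^[\s⠋⠙⠹⠸⠼⠴⠦⠧⠇⠏]*(\(?\d+s\)?)?\s*$/;

function isNoise(line: string): boolean {
  return NOISE_LINE.test(line);
}

// Hypothetical helper: returns new meaningful text between two pane
// snapshots, or null when nothing meaningful changed.
function deriveMeaningfulDelta(prev: string, next: string): string | null {
  const prevLines = prev.split("\n");
  const nextLines = next.split("\n");

  // Skip the common prefix of unchanged lines.
  let i = 0;
  while (
    i < prevLines.length &&
    i < nextLines.length &&
    prevLines[i] === nextLines[i]
  ) {
    i++;
  }

  // Keep only changed or added lines that carry real content.
  const added = nextLines
    .slice(i)
    .map((line) => line.trimEnd())
    .filter((line) => line.length > 0 && !isNoise(line));

  return added.length > 0 ? added.join("\n") : null;
}
```

With this rule, a pane that only advances its spinner and timer yields null on every poll, so a no-output timer keeps counting even though the runner is visibly busy. That is exactly the situation where a quiet pane proves nothing about the backend.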
Run Lifecycle Transitions
Happy Path
User prompt admitted.
runtime.state = running.
Monitor emits running deltas.
Idle-after-output or true completion settles run.
SessionService sets state back to idle.
Long Run Path
Observation exceeds maxRuntimeMs.
Run transitions to detached.
Final settlement should still arrive later.
Admission remains closed until real finish or stop.
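The two paths above can be written as one small transition function. The three state names come from the Run State card; the event names and function names are assumptions chosen for this sketch.

```typescript
type RunState = "idle" | "running" | "detached";

// Hypothetical event names for the transitions described above.
type LifecycleEvent =
  | "prompt-admitted"
  | "max-runtime-reached"
  | "settled"
  | "stopped";

function nextRunState(state: RunState, event: LifecycleEvent): RunState {
  switch (state) {
    case "idle":
      return event === "prompt-admitted" ? "running" : "idle";
    case "running":
      if (event === "max-runtime-reached") return "detached";
      if (event === "settled" || event === "stopped") return "idle";
      return "running";
    case "detached":
      // Detached runs are still live; only a real finish or stop ends them.
      return event === "settled" || event === "stopped" ? "idle" : "detached";
  }
}

// Admission is open only when no run is live.
function admissionOpen(state: RunState): boolean {
  return state === "idle";
}
```

Folding the long-run path through it: idle, then prompt-admitted gives running, max-runtime-reached gives detached, and settled returns to idle. Admission stays closed for the whole stretch between the first and last step.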
Problem Zones
Core mismatch:
the current code sometimes lets an observational signal behave like lifecycle truth.
That is the root cause behind "it says active run", "attach does not help", and "final reply already appeared but I still need stop".
Problem A - "Timed out waiting for visible output."
What It Means Today
The monitor did not see meaningful visible output within noOutputTimeoutMs. This does not prove the backend has stopped.
Why It Is Dangerous
Current flow can convert that observation into a terminal run settlement and return the session to idle too early.
Risk: the backend may still be running, but the session can look free again.
Problem B - message-tool final reply can appear before run lifecycle is done
Surface Done
The agent sent clisbot message send --final, so the human sees a final chat reply.
Lifecycle Not Done
The active run remains live until SessionService receives real settlement from monitor callbacks or an explicit stop.
Result: the chat looks done, but a new prompt is still blocked by active-run admission. /attach only observes; it does not clear lifecycle state.
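One way to make this mismatch visible instead of confusing is to derive the status text from both facts at once. The sketch below is illustrative only: the interface, the finalReplySentAt field name, and the status strings are assumptions, not the persisted clisbot schema.

```typescript
// Hypothetical read-model shape; field names are assumptions.
interface RunSnapshot {
  state: "idle" | "running" | "detached";
  // Set when the agent has already sent its final chat reply.
  finalReplySentAt?: number;
}

function surfaceStatus(run: RunSnapshot): string {
  if (run.state === "idle") return "idle";
  if (run.finalReplySentAt !== undefined) {
    // Surface done, lifecycle not done: say so explicitly.
    return "reply sent, backend still active";
  }
  return run.state === "detached" ? "detached, still active" : "running";
}
```

The function never changes run.state. It only reports the gap between what the chat shows and what the session still owns.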
Problem C - attach, watch, and stop semantics
/attach
Reattaches this thread to current updates. It is an observer action, not a lifecycle action.
/watch
Changes observer mode to interval polling. Still no lifecycle ownership.
/stop
Explicit intervention. This is the only one of the three that is supposed to change the backend run itself.
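The difference between the three commands can be sketched as a function over a per-thread view. The observer mode names come from the Observer State card; the ThreadView shape and the stopRequested flag are assumptions for this example.

```typescript
type ObserverMode = "live" | "poll" | "passive-final";

// Hypothetical per-thread view combining observer and run facts.
interface ThreadView {
  observerMode: ObserverMode;
  runState: "idle" | "running" | "detached";
  stopRequested: boolean;
}

function applyCommand(
  view: ThreadView,
  command: "/attach" | "/watch" | "/stop",
): ThreadView {
  switch (command) {
    case "/attach":
      // Observer action only: resume live updates.
      return { ...view, observerMode: "live" };
    case "/watch":
      // Observer action only: switch to interval polling.
      return { ...view, observerMode: "poll" };
    case "/stop":
      // Requests intervention; the session layer still owns the move to idle.
      return { ...view, stopRequested: true };
  }
}
```

In this sketch /attach and /watch leave runState and stopRequested untouched, which is why neither can unblock a new prompt. Even /stop only records a request; settling the run remains the session layer's job.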
Low-Side-Effect Fix Shape
This keeps the current architecture. It does not require inventing another manager.
It only tightens the contract between the existing owner boundaries.
Rule 1
monitorTmuxRun() should emit facts only.
running delta seen
idle-after-output
max-runtime reached
no visible output yet
Rule 2
SessionService should own all terminal transitions.
Only this layer may clear active run and reopen admission.
Need stronger evidence than "pane silent".
Rule 3
RunnerService should answer silent-but-alive vs blocked vs dead.
Probe session liveness.
Probe runner waiting or still busy.
Map backend-specific uncertainty into one normalized answer.
Rule 4
message-tool --final needs a truthful contract.
Either only send final when backend work is truly done.
Or surface a clear "reply sent, backend still active" state.
The main design sentence is:
observational timeout must not automatically mean lifecycle completion.
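Rules 1 to 3 can be sketched together: the monitor emits facts, a liveness probe answers the silent case, and a single decision function stands in for the session layer. The fact kinds mirror the list under Rule 1; the decision names, the quietMs field, and the probe signature are assumptions for illustration.

```typescript
// Facts the observer may report (Rule 1). It never reports "done".
type MonitorFact =
  | { kind: "running-delta"; text: string }
  | { kind: "idle-after-output" }
  | { kind: "max-runtime-reached" }
  | { kind: "no-visible-output-yet"; quietMs: number };

// Normalized answer from the runner layer (Rule 3).
type Liveness = "alive" | "blocked" | "dead";

// Hypothetical decisions available to the session layer (Rule 2).
type LifecycleDecision =
  | "keep-running"
  | "detach"
  | "settle"
  | "settle-failed"
  | "report-blocked";

function decideLifecycle(
  fact: MonitorFact,
  probeLiveness: () => Liveness,
): LifecycleDecision {
  switch (fact.kind) {
    case "running-delta":
      return "keep-running";
    case "idle-after-output":
      return "settle";
    case "max-runtime-reached":
      return "detach";
    case "no-visible-output-yet": {
      // A quiet pane is not evidence; ask the runner before deciding.
      const liveness = probeLiveness();
      if (liveness === "dead") return "settle-failed";
      if (liveness === "blocked") return "report-blocked";
      return "keep-running";
    }
  }
}
```

The probe is called only for the quiet-pane fact, so the extra cost lands where the uncertainty is. A silent but alive runner keeps the run active and admission closed.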
Sources Used
This artifact was built against the current code and docs, especially the owner-boundary and interaction-processing paths.