How a message flows through.

This is the most important diagram in the guide. If you remember only one thing, remember this one: accept work durably first, then run it. The flow shown here is the target flow for M1 reliability. It does not claim app-close survival until the detached worker exists.

Reliability contract

M1 guarantees accepted turns are recorded, reconnects can replay recent stream events, engine crashes produce visible retry state, and scheduled triggers are audited. M1 does not guarantee an in-flight CLI keeps running after the engine process dies. That needs the worker in Chapter 6.
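The replay guarantee can be pictured with a small sketch. It assumes each stored stream event carries the per-turn sequence number the engine assigns to chunks, and that a reconnecting client reports the last seq it rendered. `StreamEvent` and `replay_after` are hypothetical names, and the slice stands in for an ordered query against the stream table; this is an illustration, not the engine's actual code.

```rust
/// One persisted stream event. Hypothetical shape, for illustration only.
#[derive(Debug, Clone, PartialEq)]
pub struct StreamEvent {
    pub seq: u64,
    pub payload: String,
}

/// Return the events a reconnecting client has not seen yet.
/// `last_seen` is the highest seq the client rendered, or None for a fresh client.
/// Assumes `events` is sorted by seq ascending, as an ordered query would return it.
pub fn replay_after(events: &[StreamEvent], last_seen: Option<u64>) -> &[StreamEvent] {
    match last_seen {
        None => events,
        Some(seq) => {
            // First index whose seq is strictly greater than what the client has.
            let start = events.partition_point(|e| e.seq <= seq);
            &events[start..]
        }
    }
}
```

Because the lookup is by seq rather than by connection state, the same function serves a first connect and a reconnect.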

01
User → Desktop UI
Types "draft Q3 report" and hits send.
02
Desktop UI → Engine
POST /v1/agents/Sales/sessions with the user message and a client-generated turnId (UUID v4).
03
Engine → SQLite
INSERT into turns with status queued. If turnId already exists in a terminal state (completed/failed/cancelled), return its outcome instead of starting a new turn. This is how retries are safe.
04
Engine → Desktop UI
Returns 200 OK with { sessionKey, turnId }. UI shows the message as accepted.
05
Engine → Claude CLI
Spawns the Claude CLI subprocess as Linux user hou_sales, in the agent's folder.
06
Engine → SQLite
UPDATE turns set status running, started_at = now().
Streaming loop · repeats per token chunk
07
Claude CLI → Engine
Streams a chunk of assistant tokens (~10-30 chunks/sec at sentence granularity).
08
Engine → SQLite
INSERT into turn_stream with the next seq number. Batched: a tokio interval flushes every 20-50ms so we don't drown SQLite with hundreds of single-row inserts.
09
Engine → Desktop UI
Broadcasts the chunk over the WebSocket only after the batch that contains it has been accepted by SQLite. The batch window is small, but durability still comes before user-visible delivery.
10
Claude CLI → Engine
Subprocess exits cleanly.
11
Engine → SQLite
UPDATE turns set status completed, completed_at = now().
12
Engine → Desktop UI
Final completion event over the WebSocket. UI marks the turn as done.

Green steps (03, 06, 08, and 11) are SQLite writes. Anything rendered as durable must be written first.
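The idempotency rule in step 03 can be sketched as follows. This is a minimal model, not the engine's code: a `HashMap` stands in for the turns table, and `TurnStatus`, `Accept`, and `accept_turn` are hypothetical names. The guide only specifies what happens when the turnId is already in a terminal state; the `InProgress` branch (turnId exists but is still queued or running) is an assumption added to make the example total.

```rust
use std::collections::HashMap;

#[derive(Debug, Clone, Copy, PartialEq)]
pub enum TurnStatus {
    Queued,
    Running,
    Completed,
    Failed,
    Cancelled,
}

impl TurnStatus {
    /// Terminal states as listed in step 03.
    pub fn is_terminal(self) -> bool {
        matches!(
            self,
            TurnStatus::Completed | TurnStatus::Failed | TurnStatus::Cancelled
        )
    }
}

#[derive(Debug, PartialEq)]
pub enum Accept {
    /// A new row was inserted with status queued; the caller may start the work.
    Started,
    /// The turnId already finished; return the stored outcome and start nothing.
    AlreadyFinished(TurnStatus),
    /// ASSUMPTION: the turnId exists but has not finished. The guide does not
    /// specify this case; here we report the current status and start nothing.
    InProgress(TurnStatus),
}

/// Accept a turn at most once per client-generated turn id.
pub fn accept_turn(turns: &mut HashMap<String, TurnStatus>, turn_id: &str) -> Accept {
    match turns.get(turn_id).copied() {
        Some(status) if status.is_terminal() => Accept::AlreadyFinished(status),
        Some(status) => Accept::InProgress(status),
        None => {
            turns.insert(turn_id.to_string(), TurnStatus::Queued);
            Accept::Started
        }
    }
}
```

In a real table the same effect would typically come from a uniqueness constraint on the turn id, so that two concurrent retries cannot both insert.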

The key insight

Writes to the database happen BEFORE the work, not AFTER. The turn row is inserted the moment the user's message is accepted, well before the CLI subprocess is even spawned. Each streaming chunk is assigned a sequence number and written in a small batch before broadcast. If SQLite is backpressured, the stream slows down; it does not pretend data is durable before it is.
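A minimal sketch of the write-then-broadcast ordering, under stated assumptions: `ChunkStore`, `Batcher`, and `MemoryStore` are hypothetical names, the in-memory store stands in for a SQLite transaction, and the caller is expected to invoke `flush` on each timer tick. The point it illustrates is that `flush` releases chunks for broadcast only after the store has accepted them.

```rust
/// A streamed chunk with its assigned sequence number.
#[derive(Debug, Clone, PartialEq)]
pub struct Chunk {
    pub seq: u64,
    pub text: String,
}

/// Stand-in for the durable store. Hypothetical trait; a real implementation
/// would write the whole batch inside one SQLite transaction.
pub trait ChunkStore {
    fn write_batch(&mut self, batch: &[Chunk]) -> Result<(), String>;
}

pub struct Batcher {
    next_seq: u64,
    pending: Vec<Chunk>,
}

impl Batcher {
    pub fn new() -> Self {
        Batcher { next_seq: 1, pending: Vec::new() }
    }

    /// Assign the next seq and buffer the chunk. Nothing is visible to clients yet.
    pub fn push(&mut self, text: &str) -> u64 {
        let seq = self.next_seq;
        self.next_seq += 1;
        self.pending.push(Chunk { seq, text: text.to_string() });
        seq
    }

    /// Write the pending batch, then hand back the chunks that are safe to broadcast.
    /// If the write fails, the batch stays pending and nothing is released.
    pub fn flush<S: ChunkStore>(&mut self, store: &mut S) -> Result<Vec<Chunk>, String> {
        if self.pending.is_empty() {
            return Ok(Vec::new());
        }
        store.write_batch(&self.pending)?;
        Ok(std::mem::take(&mut self.pending))
    }
}

/// In-memory store for demonstration; `fail` simulates the store refusing a write.
pub struct MemoryStore {
    pub rows: Vec<Chunk>,
    pub fail: bool,
}

impl ChunkStore for MemoryStore {
    fn write_batch(&mut self, batch: &[Chunk]) -> Result<(), String> {
        if self.fail {
            return Err("store unavailable".to_string());
        }
        self.rows.extend_from_slice(batch);
        Ok(())
    }
}
```

When the store is slow or failing, chunks accumulate in `pending` and the broadcast side simply sees nothing new, which is the backpressure behavior described above.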

That single ordering decision is what makes everything else work.

What this buys you

What this does NOT buy you on its own: surviving an engine restart mid-stream. Today the CLI subprocess is a child of the engine, so killing the engine kills the CLI. To survive that, we need the detached houston-turn-worker from Chapter 6 or a different process-lifecycle contract.

Protocol changes you need to know

What's there today

Most of the boxes in the diagram exist. The CLI gets spawned, the response streams, the final message gets saved to chat_feed. What's missing:

The five failure modes this prevents

Chapter 5 walks through what goes wrong because these tables aren't there. Chapter 6 covers the fix. The one people miss: the CLI subprocess hangs without exiting. Without started_at, last_heartbeat_at, and a sweep job, the engine has no way to know a turn has been running for an hour with no output.
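The sweep for hung turns might look like the sketch below. It assumes timestamps are Unix seconds and that a turn with no heartbeat yet is measured from started_at. The field names follow the columns named above; `RunningTurn`, `find_stalled`, and the silence threshold are hypothetical, and what to do with a stalled turn (mark it failed, kill the subprocess, surface a retry) is left to the caller.

```rust
/// Minimal view of a running turn row. Timestamps are Unix seconds.
/// Field names mirror the columns in the text; the struct itself is illustrative.
#[derive(Debug, Clone)]
pub struct RunningTurn {
    pub turn_id: String,
    pub started_at: u64,
    pub last_heartbeat_at: Option<u64>,
}

/// Return the ids of turns that have shown no sign of life for more than
/// `max_silence_secs`. A turn that never sent a heartbeat is measured from
/// its started_at.
pub fn find_stalled(turns: &[RunningTurn], now: u64, max_silence_secs: u64) -> Vec<String> {
    turns
        .iter()
        .filter(|t| {
            let last_alive = t.last_heartbeat_at.unwrap_or(t.started_at);
            // saturating_sub guards against clock skew putting last_alive after now.
            now.saturating_sub(last_alive) > max_silence_secs
        })
        .map(|t| t.turn_id.clone())
        .collect()
}
```

Run periodically, a check like this turns "running for an hour with no output" from an invisible state into a row the engine can act on.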

Where to look in the code

Today's flow: engine/houston-engine-server/src/routes/sessions.rs → engine/houston-engine-core/src/sessions/mod.rs → engine/houston-agents-conversations/src/session_runner.rs. Streaming items dropped at session_runner.rs:405-406. Async persistence at session_runner.rs:226-244. Session identity model at engine/houston-engine-core/src/sessions/control.rs:6-18 (no turnId today).