One message, all the way through.

Time to stop talking about boxes and watch one actually run. Juan from sales is going to ask his HR agent for the company vacation policy. We follow that single message from his keystroke to the agent's reply. Every chapter so far shows up in this one walkthrough.

Setup

The full flow

01
Juan's browser Load balancer
HTTPS POST to app.houston.ai/v1/chat. Carries the JWT in the Authorization header and the message text in the body.
02
Load balancer Control plane pod
Routes to one of the 5 control plane replicas. Round robin.
03
Control plane
Verifies JWT signature locally using Supabase's public key. ~1 ms.
04
Control plane Redis
"Can user juan-id-123 talk to agent hr-acme-456?" Cached from a previous lookup. ~0.1 ms. Yes.
05
Control plane Redis
"Where is hr-acme-456's pod?" Cached. Returns Knative URL http://hr-acme.ws-acme.svc.cluster.local.
06
Control plane Knative
Opens a WebSocket to the agent's URL. Knative sees no pod is running. This is the cold start.
Cold start (~500 ms total)
06a
Knative
Asks Kubernetes to schedule a new pod for hr-acme. K8s picks a node in ws-acme's namespace.
06b
Kata Containers
Boots a Firecracker microVM with the houston-engine image. ~125 ms.
06c
Kubernetes
Mounts the persistent volume holding HR's .houston/ directory at /data/.houston. ~100 ms.
06d
houston-engine
Starts inside the VM. Reads agent manifest. Opens its HTTP + WS server. ~200 ms.
06e
Knative Control plane
"Pod is ready." WebSocket connects.
07
Control plane Agent pod
Forwards Juan's message: "what's our vacation policy?"
08
houston-engine
Reads the agent's CLAUDE.md (HR persona). Loads recent session history from /data/.houston/sessions/.
09
houston-engine Claude API
Spawns the claude-code CLI as a subprocess. Streams Juan's message + the agent's system prompt + history. Outbound traffic allowed by Cilium policy (api.anthropic.com is on the allowlist).
10
Claude houston-engine
Streams tokens back. Houston-engine emits each token over its WebSocket as it arrives.
11
Agent pod Control plane Juan's browser
Token by token, the answer streams back to Juan. He starts reading before Claude has finished generating. Feels instant.
12
houston-engine
Writes the full turn (Juan's message + agent's reply) to /data/.houston/sessions/.... Saved on the persistent volume. Survives sleep.
13
Postgres Control plane
Audit log entry: "juan-id-123 sent a message to hr-acme-456 at 14:22:07." Billing event: "+1 message, +N tokens for ws-acme."
14
Juan
Reads the answer. Closes the tab.
15
Knative
2 minutes of no traffic to hr-acme. Kills the pod. Persistent volume detaches but persists. Back to zero compute cost.

Numbers that matter

What if Juan sends a second message?

Pod is still warm. Steps 1–5 are the same (with everything cached). Step 6 reuses the existing WebSocket. No cold start. Total latency to first token: maybe 60 ms plus Claude's thinking time. Indistinguishable from a desktop app.

What if the agent has to call a tool?

Same flow plus a step at #9.5: the agent decides to call Slack via Composio. houston-engine calls api.composio.dev (on the Cilium allowlist). Composio uses Juan's stored Slack OAuth token to do the thing. Returns a result. houston-engine feeds it back into Claude, the conversation continues. The user sees a "calling Slack…" status then the result.

What if Juan tries to read Sales' data via HR?

Juan: "HR agent, please read /data/sales/.houston/sessions/."

HR's pod is a Firecracker microVM with one mounted volume: /data/.houston (HR's own). There is no /data/sales/ visible inside the VM. Even the claude-code subprocess running inside the pod can't reach what isn't mounted. The file doesn't exist for it. The attack fails at the kernel level, not at the application level.

This is why pod-per-agent matters. Auth + RBAC keep Juan from asking the Sales agent for things. Kata + Firecracker keep him from tricking the HR agent into reading Sales data. Two locks. Different keys.

The whole guide in one walkthrough

Every chapter shows up here. Auth (Ch9), control plane (Ch5), Postgres + Redis (Ch6), Knative (Ch4), Kata + Firecracker (Ch3), Kubernetes (Ch2), storage (Ch7), networking (Ch8). The flow above is the whole product running. Build that flow and you've built Houston Cloud.