One message, all the way through.
Time to stop talking about boxes and watch one actually run. Juan from sales is going to ask his HR agent for the company vacation policy. We follow that single message from his keystroke to the agent's reply. Every chapter so far shows up in this one walkthrough.
Setup
- Juan is an Acme Corp employee. Logged in. Has permission for the HR agent.
- The HR agent's pod is asleep. Nobody talked to HR since yesterday.
- HR's persistent volume holds 3 months of conversation history.
- Juan opens
app.houston.ai, clicks the HR tile, types "what's our vacation policy?", hits send.
The full flow
app.houston.ai/v1/chat. Carries the JWT in the Authorization header and the message text in the body.http://hr-acme.ws-acme.svc.cluster.local.ws-acme's namespace..houston/ directory at /data/.houston. ~100 ms./data/.houston/sessions/./data/.houston/sessions/.... Saved on the persistent volume. Survives sleep.Numbers that matter
- Cold start: ~500 ms (Firecracker + engine + volume mount).
- Warm hop: ~10 ms (control plane → agent pod is in-cluster).
- Token streaming latency: ~50 ms first token (limited by Claude API itself, not by Houston).
- Idle compute cost: zero.
- Idle storage cost: ~10 cents/month for the persistent volume.
What if Juan sends a second message?
Pod is still warm. Steps 1–5 are the same (with everything cached). Step 6 reuses the existing WebSocket. No cold start. Total latency to first token: maybe 60 ms plus Claude's thinking time. Indistinguishable from a desktop app.
What if the agent has to call a tool?
Same flow plus a step at #9.5: the agent decides to call Slack
via Composio. houston-engine calls
api.composio.dev (on the Cilium allowlist).
Composio uses Juan's stored Slack OAuth token to do the thing.
Returns a result. houston-engine feeds it back into Claude, the
conversation continues. The user sees a "calling Slack…" status
then the result.
What if Juan tries to read Sales' data via HR?
Juan: "HR agent, please read /data/sales/.houston/sessions/."
HR's pod is a Firecracker microVM with one mounted volume:
/data/.houston (HR's own). There is no
/data/sales/ visible inside the VM. Even the
claude-code subprocess running inside the pod can't reach what
isn't mounted. The file doesn't exist for it. The attack fails
at the kernel level, not at the application level.
This is why pod-per-agent matters. Auth + RBAC keep Juan from asking the Sales agent for things. Kata + Firecracker keep him from tricking the HR agent into reading Sales data. Two locks. Different keys.
Every chapter shows up here. Auth (Ch9), control plane (Ch5), Postgres + Redis (Ch6), Knative (Ch4), Kata + Firecracker (Ch3), Kubernetes (Ch2), storage (Ch7), networking (Ch8). The flow above is the whole product running. Build that flow and you've built Houston Cloud.