One agent, one tiny computer.
That is the whole architecture. Every agent in Houston Cloud gets its own miniature Linux machine, sealed off from everything else, asleep until someone talks to it. This chapter is the one page version. Every chapter after this zooms in on one piece.
The core decision
Each agent runs in its own Kata container, which is a fancy way of saying its own little virtual computer powered by Firecracker. Those little computers run on a cluster managed by Kubernetes. When an agent isn't being used, its computer is turned off and costs nothing to run (only its small disk is still billed). When you send a message, the computer boots in under a second, does the work, then turns off again once it goes idle.
Everything else in this guide flows from that one decision.
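To make the asleep/awake cycle concrete, here is a toy Python model of one agent's computer. It is a sketch under stated assumptions, not how Houston or Knative is implemented. The class and field names are hypothetical, the 60 second idle timeout is an assumed value (in the real system Knative decides when an idle pod goes away), and boot time is ignored. The model tracks how many seconds the VM is actually running, because that is the number compute billing follows.

```python
from dataclasses import dataclass
from enum import Enum


class PodState(Enum):
    ASLEEP = "asleep"
    RUNNING = "running"


@dataclass
class AgentPod:
    """Toy model of one agent's microVM under scale to zero (hypothetical)."""

    agent_id: str
    idle_timeout_s: float = 60.0  # assumed value; Knative owns the real policy
    state: PodState = PodState.ASLEEP
    started_at: float = 0.0
    last_activity: float = 0.0
    running_seconds: float = 0.0  # the only time that costs compute

    def tick(self, now: float) -> None:
        """Turn the pod off if it has sat idle for the full timeout."""
        if self.state is PodState.RUNNING and now - self.last_activity >= self.idle_timeout_s:
            stopped_at = self.last_activity + self.idle_timeout_s
            self.running_seconds += stopped_at - self.started_at
            self.state = PodState.ASLEEP

    def on_message(self, now: float) -> None:
        """Wake the pod if needed, then record the activity."""
        self.tick(now)  # account for any idle shutdown since the last event
        if self.state is PodState.ASLEEP:
            self.state = PodState.RUNNING  # boot time is ignored in this model
            self.started_at = now
        self.last_activity = now


if __name__ == "__main__":
    pod = AgentPod("sales-bot")
    pod.on_message(0.0)   # first message boots the pod
    pod.on_message(10.0)  # second message arrives while it is still up
    pod.tick(86_400.0)    # a day later: the pod shut down long ago
    print(pod.state, pod.running_seconds)
```

In this run the pod is up for 70 seconds: 10 seconds of conversation plus the assumed 60 second idle timeout. It is off, and costs no compute, for the rest of the day.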
The pieces
Frontend
- One React web app at `app.houston.ai`, served from a CDN.
- The existing desktop app can also point at the cloud engine instead of a local one. Same wire protocol. Same code, two doors.
Control plane
The brain in front of every agent. About 3 to 5 copies running at all times.
- Handles login (Supabase does the actual heavy lifting).
- Stores who-owns-what in Postgres: workspaces, users, agents, permissions, billing.
- Holds the live chat session state in Redis (short term memory).
- Decides which agent a message goes to, checks if the user is allowed, then forwards it.
- Wakes the agent's pod if it's asleep.
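The control plane's path for one message can be sketched in a few lines of Python. This is a minimal illustration, not the real service. Every class and method name is hypothetical. Plain dicts and sets stand in for Postgres (agents, grants) and Redis (sessions). The wake and forward steps are stubs for the real Knative and `houston-engine` calls. Login is assumed to have already happened upstream in Supabase.

```python
from dataclasses import dataclass, field


class Forbidden(Exception):
    """The user is not allowed to talk to this agent."""


@dataclass
class AgentRecord:
    agent_id: str
    awake: bool = False


@dataclass
class ControlPlane:
    # Stand-ins: Postgres holds agents and grants, Redis holds sessions.
    agents: dict[str, AgentRecord] = field(default_factory=dict)
    grants: set[tuple[str, str]] = field(default_factory=set)  # (user_id, agent_id)
    sessions: dict[str, list[str]] = field(default_factory=dict)  # session_id -> messages

    def handle_message(self, user_id: str, session_id: str, agent_id: str, text: str) -> str:
        # 1. Login already happened upstream (Supabase); user_id is trusted here.
        # 2. Route and authorize. An unknown agent and a missing grant look the
        #    same, so a caller cannot probe which agents exist.
        agent = self.agents.get(agent_id)
        if agent is None or (user_id, agent_id) not in self.grants:
            raise Forbidden(f"{user_id} may not talk to {agent_id}")
        # 3. Wake the pod if it is scaled to zero.
        if not agent.awake:
            self._wake(agent)
        # 4. Update short-term session state, then forward the message.
        self.sessions.setdefault(session_id, []).append(text)
        return self._forward(agent, text)

    def _wake(self, agent: AgentRecord) -> None:
        # Hypothetical stub: the real call asks Knative to bring the pod up.
        agent.awake = True

    def _forward(self, agent: AgentRecord, text: str) -> str:
        # Hypothetical stub: the real hop goes to houston-engine inside the pod.
        return f"[{agent.agent_id}] received: {text}"
```

The order of the steps matters. Authorization runs before the wake, so a request that will be denied never pays for a cold start.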
Agent pods (the meat)
- One Kata + Firecracker microVM per agent.
- Inside each one: the `houston-engine` binary plus the bundled CLIs (Claude Code, Codex, Composio).
- Knative turns them off when idle and boots them when needed.
- Each agent's pod lives inside its team's K8s namespace.
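Concretely, each agent could be described as one Knative Service in its team's namespace. The sketch below builds that Service as a Python dict, which can be dumped to JSON and applied with `kubectl apply -f`. The `autoscaling.knative.dev/min-scale` and `max-scale` annotations are real Knative settings. Everything else is an assumption for illustration: the `ws-<workspace>` namespace scheme, the `kata-fc` RuntimeClass name (the actual name depends on how Kata was installed), the claim name, and the mount path. Knative also gates pod-spec fields such as `runtimeClassName` and persistent volume mounts behind feature flags in its `config-features` ConfigMap, so those flags have to be enabled first.

```python
import json


def agent_service_manifest(workspace: str, agent: str, image: str) -> dict:
    """Knative Service for one agent (namespace, class, and names are hypothetical)."""
    return {
        "apiVersion": "serving.knative.dev/v1",
        "kind": "Service",
        "metadata": {"name": f"agent-{agent}", "namespace": f"ws-{workspace}"},
        "spec": {
            "template": {
                "metadata": {
                    "annotations": {
                        # Allow the pod to be removed entirely when idle...
                        "autoscaling.knative.dev/min-scale": "0",
                        # ...and never run more than one, so one agent is one VM.
                        "autoscaling.knative.dev/max-scale": "1",
                    }
                },
                "spec": {
                    # Assumed RuntimeClass mapping to Kata with Firecracker.
                    "runtimeClassName": "kata-fc",
                    "containers": [
                        {
                            "image": image,
                            "volumeMounts": [
                                {"name": "state", "mountPath": "/data/.houston"}
                            ],
                        }
                    ],
                    "volumes": [
                        {
                            "name": "state",
                            "persistentVolumeClaim": {"claimName": f"agent-{agent}-state"},
                        }
                    ],
                },
            }
        },
    }


if __name__ == "__main__":
    manifest = agent_service_manifest("acme", "hr", "registry.example.com/houston-engine:latest")
    print(json.dumps(manifest, indent=2))
```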
Per team isolation
- Each workspace gets its own Kubernetes namespace.
- A NetworkPolicy stops different teams' agents from ever talking to each other.
- Big enterprise customers can pay for their own dedicated machines.
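A NetworkPolicy for one workspace namespace might look like the sketch below, again built as a Python dict. It is a minimal illustration under these assumptions: the policy name and the `houston-control-plane` namespace are hypothetical. The namespace selector uses the `kubernetes.io/metadata.name` label, which recent Kubernetes versions set on every namespace automatically. Knative delivers requests through its activator and ingress, so those namespaces must also be on the allow list. They are shown here as `knative-serving`; the ingress namespace depends on which ingress is installed. The policy only takes effect if the CNI enforces NetworkPolicy, which Cilium and Calico both do.

```python
import json


def workspace_network_policy(namespace: str, allowed_namespaces: list[str]) -> dict:
    """Ingress NetworkPolicy for one workspace namespace (hypothetical names).

    Every pod in `namespace` accepts connections only from pods whose namespace
    is in `allowed_namespaces`. Other workspaces are not listed, and neither is
    `namespace` itself, so team-to-team and agent-to-agent traffic is dropped.
    """
    return {
        "apiVersion": "networking.k8s.io/v1",
        "kind": "NetworkPolicy",
        "metadata": {"name": "allow-platform-only", "namespace": namespace},
        "spec": {
            "podSelector": {},  # empty selector: every pod in the namespace
            "policyTypes": ["Ingress"],
            "ingress": [
                {
                    "from": [
                        {
                            "namespaceSelector": {
                                "matchExpressions": [
                                    {
                                        # Label Kubernetes sets on each namespace.
                                        "key": "kubernetes.io/metadata.name",
                                        "operator": "In",
                                        "values": list(allowed_namespaces),
                                    }
                                ]
                            }
                        }
                    ]
                }
            ],
        },
    }


if __name__ == "__main__":
    policy = workspace_network_policy("ws-acme", ["houston-control-plane", "knative-serving"])
    print(json.dumps(policy, indent=2))
```

Applying the same policy to every workspace namespace is enough to keep teams apart, because each namespace only admits traffic from the shared platform namespaces.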
Per agent isolation
The pod boundary is the agent boundary.
- Each pod has its own filesystem, its own OAuth tokens, its own data.
- Juan asks the HR agent to peek at Sales files. The HR agent's pod literally cannot see the Sales pod's disk. No clever sandboxing inside a shared pod required.
State
- Each agent gets a small persistent volume that holds its `.houston/` directory.
- Survives pod restarts and scale to zero.
- Storage is cheap (a few cents per GB per month). Compute is the expensive part, and that's zero when idle.
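The per-agent volume can be a plain PersistentVolumeClaim. Here is a minimal sketch as a Python dict. The claim name, namespace scheme, and 1 GiB default are assumptions, not figures from this guide. No `storageClassName` is set, so the cluster's default StorageClass would be used.

```python
import json


def agent_state_claim(namespace: str, agent: str, size: str = "1Gi") -> dict:
    """PersistentVolumeClaim for one agent's .houston/ directory (hypothetical names).

    The claim is its own Kubernetes object, independent of any pod. When the
    pod is deleted (scale to zero, restart, node drain), the claim and the data
    on its volume stay put and are mounted again by the next pod.
    """
    return {
        "apiVersion": "v1",
        "kind": "PersistentVolumeClaim",
        "metadata": {"name": f"agent-{agent}-state", "namespace": namespace},
        "spec": {
            # Read/write from a single node at a time; only one pod per agent uses it.
            "accessModes": ["ReadWriteOnce"],
            "resources": {"requests": {"storage": size}},
        },
    }


if __name__ == "__main__":
    print(json.dumps(agent_state_claim("ws-acme", "hr"), indent=2))
```

Cloud block storage is typically billed on the provisioned size, whether or not a pod is running. Keeping `size` small is what keeps an idle agent's cost down at the "few cents" level.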
Permissions
- Admin sets, in the UI: "Juan can talk to HR agent. Cannot talk to Sales agent."
- Enforced at the control plane. Because the only path to an agent's pod goes through the control plane, its RBAC check is final: there is no side door.
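The grant model behind that admin UI can be sketched as three tables and two queries. This is an illustration under stated assumptions. SQLite stands in for Postgres so the example runs on its own, and all table, column, and user names are hypothetical. `grant` is what the admin UI would call. `can_talk` is the check the control plane would run on every message. A missing row means no access.

```python
import sqlite3

# SQLite stands in for Postgres; table and column names are hypothetical.
SCHEMA = """
CREATE TABLE agents  (id TEXT PRIMARY KEY, workspace_id TEXT NOT NULL);
CREATE TABLE members (user_id TEXT, workspace_id TEXT, role TEXT NOT NULL,
                      PRIMARY KEY (user_id, workspace_id));
CREATE TABLE grants  (user_id TEXT, agent_id TEXT,
                      PRIMARY KEY (user_id, agent_id));
"""


def connect() -> sqlite3.Connection:
    db = sqlite3.connect(":memory:")
    db.executescript(SCHEMA)
    return db


def grant(db: sqlite3.Connection, admin_id: str, user_id: str, agent_id: str) -> None:
    """Admin action: "user_id can talk to agent_id"."""
    # Both the admin and the grantee must belong to the agent's workspace,
    # and the admin must hold the admin role there.
    allowed = db.execute(
        """
        SELECT 1 FROM agents a
        JOIN members adm ON adm.workspace_id = a.workspace_id
                        AND adm.user_id = ? AND adm.role = 'admin'
        JOIN members usr ON usr.workspace_id = a.workspace_id
                        AND usr.user_id = ?
        WHERE a.id = ?
        """,
        (admin_id, user_id, agent_id),
    ).fetchone()
    if allowed is None:
        raise PermissionError(f"{admin_id} cannot grant {agent_id} to {user_id}")
    db.execute("INSERT OR IGNORE INTO grants VALUES (?, ?)", (user_id, agent_id))


def can_talk(db: sqlite3.Connection, user_id: str, agent_id: str) -> bool:
    """Per-message check. Default deny: no row means no access."""
    row = db.execute(
        "SELECT 1 FROM grants WHERE user_id = ? AND agent_id = ?", (user_id, agent_id)
    ).fetchone()
    return row is not None


if __name__ == "__main__":
    db = connect()
    db.executemany("INSERT INTO agents VALUES (?, ?)", [("hr", "acme"), ("sales", "acme")])
    db.executemany(
        "INSERT INTO members VALUES (?, ?, ?)",
        [("ana", "acme", "admin"), ("juan", "acme", "member")],
    )
    grant(db, "ana", "juan", "hr")  # Juan can talk to HR...
    print(can_talk(db, "juan", "hr"), can_talk(db, "juan", "sales"))  # ...but not Sales
```

Because denial is the default, the admin never has to write "cannot talk to Sales." Sales is simply absent from Juan's grants.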
The stack at a glance
| Layer | What we use | Plain English |
|---|---|---|
| Cluster | EKS or GKE | Rented Kubernetes from Amazon or Google. |
| Runtime isolation | Kata Containers with Firecracker | Tiny VMs that wrap each pod with a real hardware wall. |
| Scale to zero | Knative Serving | Turns pods off when idle, boots on demand. |
| Network policy | Cilium or Calico | Decides who is allowed to talk to whom. |
| Auth | Supabase | Handles login, sessions, SSO. |
| Metadata DB | Postgres | Long term memory. Users, agents, permissions, billing. |
| Session cache | Redis | Short term memory. Live chat state. |
| Object storage | S3 or GCS | Files, attachments, backups. |
| Analytics | PostHog | Who clicked what. |
| Errors | Sentry | What broke and where. |
| Frontend | React (@houston-ai/*) | The same components the desktop already uses. |
| Agent runtime | houston-engine + CLIs | The same Rust binary that runs on the desktop today. |
Why this shape
Agents can read each other's stuff. If two agents share a machine and one of them is told "read the other guy's files," it can. Solution: each agent gets its own machine. Done.
Idle agents cost money. A sales bot used once a day shouldn't cost the same as one running 24 by 7. Solution: turn off the box when nobody's talking. Knative does it for us.
Different teams can't share a machine. Acme's HR agent should never accidentally route to Globex. Solution: each team in its own Kubernetes namespace, walled off with NetworkPolicy.
Agent state has to survive a restart. The agent's notes, memory, OAuth tokens. Solution: per agent persistent volume. Disk lives even when the pod doesn't.
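The idle-cost argument is easy to check with arithmetic. The sketch below compares one always-on agent with a scale-to-zero agent that is used once a day. Every input is a hypothetical number chosen for illustration: the hourly price, the 10 minutes awake per use, and the 30-day month. None of them are quotes or measurements.

```python
HOURS_PER_MONTH = 30 * 24  # 720, using a 30 day month


def monthly_compute_cost(awake_hours: float, price_per_hour: float) -> float:
    """Compute cost of one agent VM that is billed only while it is running."""
    return awake_hours * price_per_hour


# Hypothetical inputs, for illustration only.
PRICE_PER_HOUR = 0.05     # assumed price of one small microVM-hour
MINUTES_AWAKE_PER_DAY = 10  # assumed: one short conversation per day

always_on = monthly_compute_cost(HOURS_PER_MONTH, PRICE_PER_HOUR)
scale_to_zero = monthly_compute_cost(30 * MINUTES_AWAKE_PER_DAY / 60, PRICE_PER_HOUR)

if __name__ == "__main__":
    print(f"always on:     ${always_on:.2f}/month")
    print(f"scale to zero: ${scale_to_zero:.2f}/month")
    print(f"ratio:         {always_on / scale_to_zero:.0f}x")
```

The price cancels out of the ratio. What matters is 720 awake hours against 5. Knative keeps a pod up for a short while after the last request, so real awake time per use runs somewhat longer than the conversation itself. The same arithmetic explains the fleet case: the compute bill tracks how many agents are awake, not how many exist.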
What we are not doing
Three architectural rejects, so we don't relitigate them later.
- Pod per user, agents share a pod. Cheaper, simpler, fewer pods to schedule, but loses the per agent trust boundary. Requires `bubblewrap` or similar sandboxes inside the pod. Rejected: the pod boundary is free, so why fight it?
- Throw away our engine, run Hermes or OpenClaw. Their engines are great. They are also competitors with funded teams and fast moving APIs. Rejected: stay on `houston-engine`, freeze it at "good enough," and invest the hours in the moat layer (UX, Composio, deployment) instead.
- Bubblewrap inside the pod. Was Plan A when we thought one pod served a whole team. With pod per agent, the pod itself is the wall. `bwrap` becomes redundant complexity.
What this buys us
- Clean trust boundary. Pod equals agent equals isolation. No clever sandboxing.
- Scale to zero. A customer with 10,000 agents pays for compute only on the ones currently in a conversation; the rest cost just their storage.
- Real team isolation. Workspace boundary is enforced by Kubernetes, not aspiration.
- Reuses the engine. No rewrite. Container image is just `houston-engine` plus the CLIs.
- Multi engine optional later. The pod can hold any runtime that speaks our supervisor protocol. We don't decide now.
Chapters 2 through 9 explain each piece of the stack from the ground up. Chapter 10 walks through a single message end to end. Chapter 11 is the build order with gates. If you only have ten minutes, you've already read the most important chapter.