Build order. Honest sizing. Real gates.
The architecture is settled. The question is sequence. Some pieces unlock others; some are independent. This is the order we build, the gate each milestone has to pass before the next starts, and the rough time it takes.
Every milestone has a goal, a list of deliverables, a gate, and a size. Sizes are honest engineering estimates, not commitments to investors. Gates are the conditions that say "we're done with this, on to the next." If a gate fails, we don't skip, we fix.
Milestone C1 — Container the engine (2 weeks)
Take the existing houston-engine binary and put it in a Docker image. Verify it boots, serves the HTTP API, and can run a Claude Code session inside the container exactly like it does on the desktop.
- Dockerfile based on a slim Linux base.
- Bundles the engine binary + Claude Code CLI + Codex CLI + Composio CLI (already produced by scripts/fetch-cli-deps.sh).
- Mounts /data/.houston for state.
- Tested by running locally with docker run and pointing the desktop app at it.
Gate: A developer can run a full chat session against the containerized engine, locally, with zero functional regressions vs the desktop sidecar.
Milestone C2 — One cluster, one agent (3-4 weeks)
Stand up a Kubernetes cluster on GKE. Deploy the engine container as a regular pod. Connect to it from a dev machine. No multi-tenant. No Knative. Just: cloud-hosted engine working.
- GKE cluster, 2-3 nodes, no special runtime.
- One agent pod with a persistent volume.
- Ingress (load balancer + cert-manager TLS).
- Desktop app updated to point at the cloud engine via the existing remote-engine OAuth flow.
Gate: An engineer logs into the cloud engine from the desktop app and runs an agent, and sessions persist across a pod restart.
Milestone C3 — Kata + Firecracker (3 weeks)
Switch the agent pod from regular runc to Kata Containers with Firecracker. Measure cold start, throughput, and any feature regressions.
- Install Kata operator on the cluster.
- Provision node pool with nested virtualization (or bare metal).
- Add runtimeClassName: kata-fc to the agent pod manifest.
- Benchmark cold start, message latency, memory overhead.
Gate: Cold start ≤ 1 second end-to-end. Message latency within 50 ms of pre-Kata baseline. No feature breaks.
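The two numeric parts of this gate can be checked mechanically from benchmark output. The following is a minimal sketch, not the project's benchmark harness: the function name and input shape are hypothetical, and it assumes the slowest observed cold start and the median message latency are the statistics that matter, which the gate text does not specify.

```python
from statistics import median

COLD_START_LIMIT_S = 1.0   # from the gate: cold start <= 1 second
LATENCY_BUDGET_MS = 50.0   # from the gate: within 50 ms of the pre-Kata baseline


def c3_gate(cold_starts_s, baseline_latency_ms, kata_latency_ms):
    """Return (passed, details) for the numeric half of the C3 gate.

    Assumption: worst-case cold start and median latency are compared.
    Swap in p95 or p99 if that is what the team agrees on.
    """
    worst_cold_start = max(cold_starts_s)
    latency_delta = median(kata_latency_ms) - median(baseline_latency_ms)
    details = {
        "worst_cold_start_s": worst_cold_start,
        "latency_delta_ms": latency_delta,
    }
    passed = (
        worst_cold_start <= COLD_START_LIMIT_S
        and latency_delta <= LATENCY_BUDGET_MS
    )
    return passed, details
```

"No feature breaks" is not something a threshold can express; that part of the gate still needs the regression suite from C1.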
Milestone C4 — Knative scale-to-zero (2 weeks)
Replace the always-on agent Deployment with a Knative Service. Watch it sleep when idle and wake when a message arrives.
- Install Knative Serving.
- Knative Service per agent (initially: one test agent).
- Tune scale-to-zero grace period (default ~30s, probably keep).
- Verify the persistent volume re-attaches correctly on wake.
Gate: Agent pod is gone after idle, persistent volume survives, first message after idle wakes it in ≤ 1 second with no data loss.
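A per-agent Knative Service could look roughly like the manifest built below. This is a sketch under assumptions: the helper name, agent name, namespace, and image are placeholders, and the max-scale of 1 is a guess that follows from each agent owning a single volume. The manifest is built as a Python dict so it can be dumped to JSON and applied with kubectl.

```python
import json


def knative_service(agent_name, namespace, image):
    """Build a Knative Service manifest for one agent as a plain dict."""
    return {
        "apiVersion": "serving.knative.dev/v1",
        "kind": "Service",
        "metadata": {"name": agent_name, "namespace": namespace},
        "spec": {
            "template": {
                "metadata": {
                    "annotations": {
                        # Let the revision scale down to zero pods when idle.
                        "autoscaling.knative.dev/min-scale": "0",
                        # Assumption: never more than one pod per agent.
                        "autoscaling.knative.dev/max-scale": "1",
                    }
                },
                # Volume mounts and runtimeClassName are left out of this sketch.
                "spec": {"containers": [{"image": image}]},
            }
        },
    }


def to_json(manifest):
    """Serialize a manifest so it can be piped to `kubectl apply -f -`."""
    return json.dumps(manifest, indent=2)
```

The grace period itself is not set on the Service. In a stock Knative install it is a cluster-wide value (scale-to-zero-grace-period in the config-autoscaler ConfigMap), so tuning it affects every agent at once.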
Milestone C5 — Control plane v0 (4-6 weeks)
Write the routing service that sits between the frontend and the agent pods. v0 is single tenant (one workspace, one user). Multi-tenant comes in C6.
- Rust axum service. JWT verification (Supabase). Postgres for users + agents + permissions. Redis for routing cache + sessions.
- HTTP routes: list agents, send message, stream response. WebSocket for live chat.
- Routes messages to the right Knative URL.
- Audit log for every request.
Gate: Frontend → control plane → agent pod works end-to-end for one workspace. Sessions survive control plane pod restart.
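The core of the routing step is small: check permission, write an audit entry, resolve the agent to its Knative URL, and cache the result. The sketch below shows that flow in Python for brevity, although the real service is planned in Rust. Everything in it is illustrative: the class and field names are hypothetical, the dicts stand in for Postgres and Redis, and the URL shape assumes Knative's cluster-local address form.

```python
from dataclasses import dataclass, field


@dataclass
class Router:
    # (user_id, agent_id) pairs; stand-in for the Postgres permissions table.
    permissions: set
    # agent_id -> (service name, namespace); stand-in for the agents table.
    agents: dict
    # In-memory stand-in for the Redis routing cache.
    cache: dict = field(default_factory=dict)
    # Stand-in for the audit log; every request is recorded, allowed or not.
    audit_log: list = field(default_factory=list)

    def resolve(self, user_id, agent_id):
        """Return the URL to forward a message to, or raise PermissionError."""
        allowed = (user_id, agent_id) in self.permissions
        self.audit_log.append((user_id, agent_id, "allow" if allowed else "deny"))
        if not allowed:
            raise PermissionError(f"{user_id} may not use {agent_id}")
        if agent_id not in self.cache:
            service, namespace = self.agents[agent_id]
            # Assumption: agents are reached on their cluster-local address.
            self.cache[agent_id] = f"http://{service}.{namespace}.svc.cluster.local"
        return self.cache[agent_id]
```

Logging before the permission check returns means denied requests are audited too, which is what the C6 penetration test will want to inspect.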
Milestone C6 — Multi-tenancy + permissions (3-4 weeks)
Add workspaces, per-workspace namespaces, and per-agent permissions. Make sure Acme can never see Globex.
- Workspace provisioning: when an admin creates a workspace, the control plane creates a K8s namespace + default NetworkPolicy + Postgres rows.
- User invite flow.
- Admin UI: grant / revoke agent access per user.
- Cilium installed and configured. Cross-namespace traffic denied by default.
- Penetration test: try to make Acme reach Globex. Should fail at every layer.
Gate: Two test workspaces can run on the same cluster, share no data, and a deliberate attempt by one to reach the other is blocked by the control plane, by Cilium, and by Firecracker independently.
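The "default NetworkPolicy" created at workspace provisioning could be as simple as a same-namespace-only ingress rule. The sketch below builds one as a dict using the standard networking.k8s.io/v1 schema, which Cilium enforces; the function name, policy name, and namespace naming are assumptions, not the project's actual provisioning code.

```python
def default_network_policy(namespace):
    """Ingress policy that only admits traffic from pods in the same namespace."""
    return {
        "apiVersion": "networking.k8s.io/v1",
        "kind": "NetworkPolicy",
        "metadata": {"name": "same-namespace-only", "namespace": namespace},
        "spec": {
            # Empty selector: the policy applies to every pod in the namespace.
            "podSelector": {},
            "policyTypes": ["Ingress"],
            # A podSelector with no namespaceSelector matches only pods in
            # the policy's own namespace, so other workspaces are excluded.
            "ingress": [{"from": [{"podSelector": {}}]}],
        },
    }
```

On its own this would also block the control plane, which lives outside the workspace namespace. A real policy needs one more ingress rule admitting it, and that rule is the one to review most carefully.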
Milestone C7 — Web app (4-8 weeks, parallel)
Build app.houston.ai as a webapp using the existing @houston-ai/* React packages. Or: defer this and ship "desktop app + cloud engine" for v1.
- Next.js or Vite frontend.
- Reuses chat, board, layout packages.
- Login via Supabase. Workspace switcher. Agent list. Chat UI.
- Admin UI for managing users + permissions.
Gate: Non-technical user can sign up, get invited to a workspace, see their agents, chat with one, all without installing anything.
Milestone C8 — Production hardening (4-6 weeks)
Everything works. Now make it stay working at 3am on a Sunday with paying customers.
- Monitoring: Prometheus + Grafana. Alerts on every SLO.
- Sentry wired into control plane and agent pods.
- Backups: Postgres point-in-time recovery, volume snapshots, S3 exports.
- Runbooks for: pod stuck, control plane down, Postgres failover, region outage.
- Load test: 1,000 concurrent agents, measure cost.
- Security review: external pentest.
Gate: Stack survives a load test of 10x expected v1 traffic. Pentest finds no high-severity issues.
Milestone C9 — First paying customer (gated by previous)
Onboard a friendly design-partner. Get them through real usage. Fix what breaks.
- White-glove onboarding.
- Slack channel with their team.
- Daily check-ins for the first 2 weeks.
Gate: Customer renews / signs annual contract. We've earned the right to scale to ten more like them.
The size totals
| Milestone | Size | Cumulative |
|---|---|---|
| C1 Container the engine | 2 weeks | 2 weeks |
| C2 One cluster, one agent | 3-4 weeks | 5-6 weeks |
| C3 Kata + Firecracker | 3 weeks | 8-9 weeks |
| C4 Knative scale-to-zero | 2 weeks | 10-11 weeks |
| C5 Control plane v0 | 4-6 weeks | 14-17 weeks |
| C6 Multi-tenancy | 3-4 weeks | 17-21 weeks |
| C7 Web app (parallel) | 4-8 weeks | same window |
| C8 Production hardening | 4-6 weeks | 21-27 weeks |
| C9 First paying customer | 2-4 weeks | 23-31 weeks |
Roughly 5 to 7 months (23-31 weeks) from start to first paying enterprise customer, assuming 1-2 engineers dedicated. Faster with more people, but the gates aren't always parallelizable (you can't tune Kata before the engine is containerized).
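The cumulative column can be re-derived from the per-milestone sizes, which is handy when an estimate changes. A minimal sketch, assuming C7 runs fully in parallel and therefore adds nothing to the running total, as the table's "same window" entry implies:

```python
# (milestone, low weeks, high weeks, runs in parallel with other work)
MILESTONES = [
    ("C1", 2, 2, False),
    ("C2", 3, 4, False),
    ("C3", 3, 3, False),
    ("C4", 2, 2, False),
    ("C5", 4, 6, False),
    ("C6", 3, 4, False),
    ("C7", 4, 8, True),
    ("C8", 4, 6, False),
    ("C9", 2, 4, False),
]


def cumulative(milestones):
    """Return (name, cumulative low, cumulative high) for each milestone."""
    low = high = 0
    rows = []
    for name, lo, hi, parallel in milestones:
        if not parallel:  # parallel work does not extend the timeline
            low += lo
            high += hi
        rows.append((name, low, high))
    return rows
```

The last row comes out at 23 to 31 weeks, matching the table. If C7 cannot be staffed in parallel, flipping its flag adds 4 to 8 weeks to everything after it.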
What we don't build in v1
Listed so we don't forget we said no.
- Multi-engine support. Pluggable runtime adapter. Defer until a customer asks for Hermes or OpenClaw specifically.
- Multi-region. Single region, single cluster for v1. Add EU region when EU customers sign.
- Self-hosted on-prem. Separate product story. We ship the SaaS first.
- BYO Kubernetes. "Run Houston on our cluster." Maybe never. Definitely not v1.
- Custom node pools per tenant. Enterprise tier feature. Add once an enterprise demands it and is willing to pay the premium.
- Per-region data residency. Compliance feature, separate workstream.
The biggest risks
Kata cold start is too slow in practice. Mitigation: keep one warm pod per "popular" agent. If still too slow, fall back to gVisor for less paranoid tenants and reserve Kata for compliance-sensitive ones.
Persistent volume attach is slow or flaky. Mitigation: snapshot-based fast restore from object storage. Topology-aware scheduling. Tested as part of C4 gate.
Per-agent pod count explodes. 10,000 customers × 5 agents = 50,000 K8s services. Knative handles a lot but not infinite. Mitigation: shard across multiple clusters by tenant once we hit ~10k active services per cluster.
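The arithmetic behind that mitigation, plus one possible way to pin a tenant to a cluster, sketched below. The hash-based assignment is an assumption for illustration only; the text says to shard by tenant but does not say how, and the function names are hypothetical.

```python
import hashlib
import math

SERVICES_PER_CLUSTER = 10_000  # from the text: shard at ~10k active services


def clusters_needed(customers, agents_per_customer, per_cluster=SERVICES_PER_CLUSTER):
    """Upper bound on clusters if every agent counts as an active service."""
    return math.ceil(customers * agents_per_customer / per_cluster)


def cluster_for_tenant(tenant_id, cluster_count):
    """Map a tenant to a cluster index using a stable hash of its ID."""
    digest = hashlib.sha256(tenant_id.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % cluster_count
```

With the numbers above, clusters_needed(10_000, 5) gives 5. Plain modulo hashing moves most tenants when the cluster count changes, so a stored tenant-to-cluster assignment in Postgres is likely the safer long-term choice.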
Engine bugs in production are hard to debug. Logs from a Firecracker microVM aren't as easy to grab as from a process on a laptop. Mitigation: structured logging streamed to Loki, per-agent log retention, "support session" mode that lets us shell into a pod with consent.
What this guide does not cover (and where to look)
- What the engine actually does inside the pod → see the engine-design guide.
- Marketplace, agent publishing → separate product spec, not written yet.
- Mobile PWA in the cloud world → see docs/mobile-architecture.md and docs/relay-operations.md.
- Billing model + pricing → business doc, not engineering.
Milestone C1. Two weeks to containerize the engine, locally testable end-to-end. Doesn't need a cluster, doesn't need approvals, just a Dockerfile and a developer. Everything else compounds on that working artifact.