Build order. Honest sizing. Real gates.

The architecture is settled. The question is sequence. Some pieces unlock others; some are independent. This is the order we build, the gate each milestone has to pass before the next starts, and the rough time it takes.

How to read this

Every milestone has a goal, a list of deliverables, a gate, and a size. Sizes are honest engineering estimates, not commitments to investors. Gates are the conditions that say "we're done with this, on to the next." If a gate fails, we don't skip it; we fix it.

Milestone C1 — Container the engine (2 weeks)

Take the existing houston-engine binary and put it in a Docker image. Verify it boots, serves the HTTP API, and can run a Claude Code session inside the container exactly as it does on the desktop.

Gate: A developer can run a full chat session against the containerized engine, locally, with zero functional regressions vs the desktop sidecar.
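
One way to automate the first half of this gate (the container boots and serves the HTTP API) is a small readiness poll. This is a minimal sketch, not the project's actual test harness: the address, port, and `/health` path in the usage comment are assumptions, since the engine's real endpoints aren't listed here.

```python
import time
import urllib.error
import urllib.request


def http_probe(url, timeout_s=2.0):
    """Return True if a GET to `url` answers with a 2xx status."""
    try:
        with urllib.request.urlopen(url, timeout=timeout_s) as resp:
            return 200 <= resp.status < 300
    except (urllib.error.URLError, OSError):
        # Connection refused, timeout, or a 4xx/5xx response: not ready yet.
        return False


def wait_until_ready(probe, deadline_s=30.0, interval_s=0.5):
    """Call `probe()` until it returns True; return the seconds it took.

    Raises TimeoutError if the probe never succeeds within `deadline_s`.
    """
    start = time.monotonic()
    while True:
        if probe():
            return time.monotonic() - start
        if time.monotonic() - start >= deadline_s:
            raise TimeoutError(f"not ready after {deadline_s} s")
        time.sleep(interval_s)


# Usage (hypothetical address and path):
# boot_s = wait_until_ready(lambda: http_probe("http://localhost:8080/health"))
```

Passing the probe in as a callable keeps the polling loop testable without a running container. The chat-session half of the gate still needs a real end-to-end run.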

Milestone C2 — One cluster, one agent (3-4 weeks)

Stand up a Kubernetes cluster on GKE. Deploy the engine container as a regular pod. Connect to it from a dev machine. No multi-tenancy. No Knative. Just a cloud-hosted engine that works.

Gate: An engineer logs in to the cloud engine from the desktop app and runs an agent, and sessions persist across a pod restart.

Milestone C3 — Kata + Firecracker (3 weeks)

Switch the agent pod from regular runc to Kata Containers with Firecracker. Measure cold start, throughput, and any feature regressions.

Gate: Cold start ≤ 1 second end-to-end. Message latency within 50 ms of pre-Kata baseline. No feature breaks.
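
The two numeric conditions in this gate can be written down as a check over measured samples. The sketch below is illustrative only: the gate doesn't say which statistic to use, so it assumes the worst observed cold start and the median message latency, and it treats "within 50 ms" as "no more than 50 ms slower". The "no feature breaks" condition is not covered.

```python
from statistics import median

COLD_START_BUDGET_S = 1.0     # from the gate: cold start <= 1 second
LATENCY_TOLERANCE_MS = 50.0   # from the gate: within 50 ms of baseline


def c3_gate(cold_starts_s, baseline_latency_ms, kata_latency_ms):
    """Evaluate the numeric C3 conditions from raw measurements.

    Assumptions (not from the roadmap): worst-case cold start,
    median-to-median latency comparison.
    """
    worst_cold_start_s = max(cold_starts_s)
    latency_delta_ms = median(kata_latency_ms) - median(baseline_latency_ms)
    passed = (
        worst_cold_start_s <= COLD_START_BUDGET_S
        and latency_delta_ms <= LATENCY_TOLERANCE_MS
    )
    return {
        "worst_cold_start_s": worst_cold_start_s,
        "latency_delta_ms": latency_delta_ms,
        "passed": passed,
    }


# Usage with made-up sample data:
# c3_gate([0.6, 0.8, 0.95], [100, 110, 120], [130, 140, 150])["passed"]
```

Swapping the median for a high percentile makes the check stricter; which one is right depends on how the baseline was measured.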

Milestone C4 — Knative scale-to-zero (2 weeks)

Replace the always-on agent Deployment with a Knative Service. Watch it sleep when idle and wake when a message arrives.

Gate: After going idle, the agent pod is gone and its persistent volume survives. The first message after idle wakes it in ≤ 1 second with no data loss.

Milestone C5 — Control plane v0 (4-6 weeks)

Write the routing service that sits between the frontend and the agent pods. v0 is single-tenant (one workspace, one user). Multi-tenancy comes in C6.

Gate: Frontend → control plane → agent pod works end-to-end for one workspace. Sessions survive control plane pod restart.

Milestone C6 — Multi-tenancy + permissions (3-4 weeks)

Add workspaces, per-workspace namespaces, and per-agent permissions. Make sure Acme can never see Globex.

Gate: Two test workspaces can run on the same cluster, share no data, and a deliberate attempt by one to reach the other is blocked by the control plane, by Cilium, and by Firecracker independently.
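
The control-plane layer of that check can be sketched as a routing function that refuses cross-workspace requests before it resolves an address. Everything named here is hypothetical: the `ws-` and `agent-` prefixes, the function name, and the idea that each agent is reachable as a Service in its workspace's namespace. The `svc.cluster.local` suffix is the Kubernetes default cluster domain and may differ per cluster. The Cilium and Firecracker layers have to be tested separately, from inside a pod.

```python
import re

# Kubernetes names used as DNS labels: lowercase alphanumerics and '-',
# at most 63 characters, starting and ending with an alphanumeric.
_DNS_LABEL = re.compile(r"[a-z0-9]([a-z0-9-]{0,61}[a-z0-9])?")


def _check_label(name):
    if not _DNS_LABEL.fullmatch(name):
        raise ValueError(f"not a valid DNS label: {name!r}")
    return name


def upstream_for(caller_workspace, target_workspace, agent_id):
    """Return the in-cluster URL for an agent, or refuse the request.

    A caller may only reach agents in its own workspace.
    """
    if caller_workspace != target_workspace:
        raise PermissionError(
            f"workspace {caller_workspace!r} may not reach {target_workspace!r}"
        )
    namespace = _check_label(f"ws-{target_workspace}")  # hypothetical naming
    service = _check_label(f"agent-{agent_id}")         # hypothetical naming
    return f"http://{service}.{namespace}.svc.cluster.local"


# upstream_for("acme", "acme", "billing")
#   -> "http://agent-billing.ws-acme.svc.cluster.local"
# upstream_for("acme", "globex", "billing") raises PermissionError
```

Deriving the address from the identifiers, rather than keeping a routing table in memory, means this step has no state to lose when the control plane pod restarts.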

Milestone C7 — Web app (4-8 weeks, parallel)

Build app.houston.ai as a web app using the existing @houston-ai/* React packages. Or: defer this and ship "desktop app + cloud engine" for v1.

Gate: A non-technical user can sign up, get invited to a workspace, see their agents, and chat with one, all without installing anything.

Milestone C8 — Production hardening (4-6 weeks)

Everything works. Now make it stay working at 3am on a Sunday with paying customers.

Gate: Stack survives a load test of 10x expected v1 traffic. Pentest finds no high-severity issues.

Milestone C9 — First paying customer (gated by previous)

Onboard a friendly design partner. Get them through real usage. Fix what breaks.

Gate: The customer renews or signs an annual contract. We've earned the right to scale to ten more like them.

The size totals

| Milestone | Size | Cumulative |
| --- | --- | --- |
| C1 Container the engine | 2 weeks | 2 weeks |
| C2 One cluster, one agent | 3-4 weeks | 5-6 weeks |
| C3 Kata + Firecracker | 3 weeks | 8-9 weeks |
| C4 Knative scale-to-zero | 2 weeks | 10-11 weeks |
| C5 Control plane v0 | 4-6 weeks | 14-17 weeks |
| C6 Multi-tenancy | 3-4 weeks | 17-21 weeks |
| C7 Web app (parallel) | 4-8 weeks | same window |
| C8 Production hardening | 4-6 weeks | 21-27 weeks |
| C9 First paying customer | 2-4 weeks | 23-31 weeks |

Roughly 6 to 8 months from start to first paying enterprise customer, assuming 1-2 dedicated engineers. More people make it faster, but not every milestone can run in parallel: you can't tune Kata before the engine is containerized.
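
The cumulative column is a running sum of the low and high ends of each size, with C7 left out because it runs in parallel. A short script makes that arithmetic easy to redo when an estimate changes. The sizes are copied from the table; the variable and function names are just for this sketch.

```python
# (milestone, low estimate in weeks, high estimate in weeks)
# C7 is omitted: it runs in parallel and adds nothing to the critical path.
SIZES_WEEKS = [
    ("C1", 2, 2),
    ("C2", 3, 4),
    ("C3", 3, 3),
    ("C4", 2, 2),
    ("C5", 4, 6),
    ("C6", 3, 4),
    ("C8", 4, 6),
    ("C9", 2, 4),
]


def cumulative_weeks(sizes):
    """Return (milestone, cumulative_low, cumulative_high) for each row."""
    low_total = high_total = 0
    rows = []
    for name, low, high in sizes:
        low_total += low
        high_total += high
        rows.append((name, low_total, high_total))
    return rows


for name, low, high in cumulative_weeks(SIZES_WEEKS):
    span = f"{low}" if low == high else f"{low}-{high}"
    print(f"{name}: {span} weeks")
# The last line printed is "C9: 23-31 weeks", matching the table.
```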

What we don't build in v1

Listed so we don't forget we said no.

The biggest risks

Risk 1

Kata cold start is too slow in practice. Mitigation: keep one warm pod per "popular" agent. If still too slow, fall back to gVisor for less paranoid tenants and reserve Kata for compliance-sensitive ones.
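
The fallback logic reads as a small decision rule per agent. The sketch below is one hedged interpretation: how "popular" and "compliance-sensitive" get decided is not specified here, so both arrive as plain booleans, and the runtime strings are labels for this example, not real RuntimeClass names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AgentPlan:
    runtime: str     # "kata" or "gvisor" (labels for this sketch only)
    warm_pods: int   # pods kept running while the agent is idle


def plan_agent(compliance_sensitive, popular, kata_meets_cold_start_gate):
    """Pick a sandbox runtime and warm-pod count for one agent.

    - Popular agents keep one warm pod (the first mitigation).
    - gVisor is used only if Kata is still too slow AND the tenant is not
      compliance-sensitive (the second mitigation).
    """
    if compliance_sensitive or kata_meets_cold_start_gate:
        runtime = "kata"
    else:
        runtime = "gvisor"
    return AgentPlan(runtime=runtime, warm_pods=1 if popular else 0)


# plan_agent(compliance_sensitive=True, popular=True,
#            kata_meets_cold_start_gate=False)
#   -> AgentPlan(runtime="kata", warm_pods=1)
```

In a Knative deployment the warm-pod count would presumably map to a minimum-scale setting on the Service, which trades idle cost for latency.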

Risk 2

Persistent volume attach is slow or flaky. Mitigation: snapshot-based fast restore from object storage, plus topology-aware scheduling. Both are tested as part of the C4 gate.

Risk 3

Per-agent pod count explodes. 10,000 customers × 5 agents = 50,000 K8s services. Knative handles a lot, but not an unlimited number. Mitigation: shard across multiple clusters by tenant once we hit ~10k active services per cluster.
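
To make the sharding arithmetic concrete: at ~10k services per cluster, 50,000 services need five clusters if all of them count as active. The sketch below computes that and shows one simple way to pin a tenant to a cluster. The hash-modulo assignment is an assumption for illustration, not a decided design.

```python
import hashlib


def clusters_needed(customers, agents_per_customer, services_per_cluster=10_000):
    """Ceiling of total services divided by the per-cluster limit."""
    total_services = customers * agents_per_customer
    return -(-total_services // services_per_cluster)  # integer ceiling


def cluster_for_tenant(tenant_id, num_clusters):
    """Map a tenant to a cluster index in [0, num_clusters).

    Uses SHA-256 rather than the built-in hash(), which is randomized per
    process for strings and so would not give a stable assignment.
    """
    digest = hashlib.sha256(tenant_id.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_clusters


print(clusters_needed(10_000, 5))  # prints 5
```

Sharding by tenant keeps all of one tenant's agents on one cluster. The catch with plain modulo is that changing the cluster count reassigns most tenants, so a real rollout would more likely store the assignment in a lookup table once it is made.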

Risk 4

Engine bugs in production are hard to debug. Logs from a Firecracker microVM aren't as easy to grab as from a process on a laptop. Mitigation: structured logging streamed to Loki, per-agent log retention, "support session" mode that lets us shell into a pod with consent.
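
Structured logging here means one machine-parseable record per line, tagged with the workspace and agent so logs can be filtered per agent later. The sketch below uses Python's standard `logging` module purely as an illustration. The engine's own language and logging library aren't stated here, the field names are assumptions, and shipping the lines to Loki is left to whatever log collector runs on the cluster.

```python
import json
import logging
import sys


class JsonFormatter(logging.Formatter):
    """Render each log record as a single JSON object on one line."""

    def format(self, record):
        entry = {
            "ts": self.formatTime(record),
            "level": record.levelname,
            "msg": record.getMessage(),
            # Set via `extra=`; None when the caller did not supply them.
            "workspace": getattr(record, "workspace", None),
            "agent": getattr(record, "agent", None),
        }
        return json.dumps(entry)


def build_logger(stream=sys.stdout, name="engine"):
    """Return a logger that writes JSON lines to `stream`."""
    logger = logging.getLogger(name)
    logger.setLevel(logging.INFO)
    logger.handlers.clear()   # avoid duplicate handlers on repeated calls
    logger.propagate = False
    handler = logging.StreamHandler(stream)
    handler.setFormatter(JsonFormatter())
    logger.addHandler(handler)
    return logger


# Usage (workspace and agent names are made up):
# log = build_logger()
# log.info("session started", extra={"workspace": "acme", "agent": "billing"})
```

Writing to stdout keeps the process unaware of where logs end up, which matters more inside a microVM where attaching a debugger or reading files is harder.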

What this guide does not cover (and where to look)

Where to start tomorrow

Milestone C1. Two weeks to containerize the engine, locally testable end-to-end. Doesn't need a cluster, doesn't need approvals, just a Dockerfile and a developer. Everything else compounds on that working artifact.