Deployment: one engine contract, three homes.
Desktop, self-hosted, and Cloud should share the same engine protocol and core behavior. They will not be literally byte-for-byte identical: desktop runtime packaging, self-host Docker/systemd, and Cloud control plane all have different operational edges.
Desktop mode
Today desktop runs a native sidecar. If Chapter 3 passes its gates, desktop can move the engine into the host's Linux runtime. The desktop shell talks over local loopback. No control plane. No Cloud dependency.
Self-host mode (Always On)
The Linux engine binary, native on a Linux VPS. Wrapped in Docker (always-on/Dockerfile) or systemd (always-on/houston-engine.service). The user puts a reverse proxy in front and sets HOUSTON_BIND_ALL=1. Shipped as scaffolding for power users; see always-on/README.md.
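A minimal sketch of the bind decision, for illustration only. It assumes HOUSTON_BIND_ALL=1 means "listen on all interfaces" and that any other value keeps the engine on loopback, as in desktop mode. The function name is hypothetical and this is not the engine's actual code.

```python
import os


def resolve_bind_host(env=None):
    """Return the interface address to listen on.

    Assumption: HOUSTON_BIND_ALL=1 opts in to all interfaces;
    anything else (including unset) stays on loopback only.
    """
    env = os.environ if env is None else env
    if env.get("HOUSTON_BIND_ALL") == "1":
        return "0.0.0.0"  # reachable from the reverse proxy
    return "127.0.0.1"  # loopback only, as in desktop mode
```

Defaulting to loopback keeps the exposed configuration an explicit opt-in, which suits a deployment where the reverse proxy is the only intended public entry point.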
Cloud mode (Houston Cloud)
Cloud is not implemented. cloud/ is placeholder/TBD today. The likely shape is one isolated Linux runtime per customer or workspace, with a control plane in front for signup, billing, admin, provisioning, wake, backup, and support.
Fly Machines are a candidate, not a decision
Fly Machines fit the shape: Linux microVMs, start/stop APIs, per-region placement, and wake-on-demand. But Cloud needs an RFC before committing to a vendor. The decision must include volume behavior, backup restore drills, region guarantees, cost under idle/active load, support burden, and exit path.
The control plane
Cloud is not just an engine on a VM. We need:
- Signup and billing. Supabase Auth (already wired for desktop SSO) + Stripe for subscriptions.
- Provisioning. A small service that, on first login, creates the customer's runtime, mints their bearer token, and writes initial workspace state.
- Admin. Support tools: see a customer's recent errors, restart their machine, rotate their token.
- Observability. Sentry (already used in desktop) gets a Cloud-scoped project. Per-machine log retention via Fly's built-in log shipping or BetterStack.
- Backups and restore. Automated snapshots are not enough. Restore must be rehearsed and timed.
- Abuse and quota controls. Public triggers, model spend, and long-running agents need rate limits from day one.
- Incident operations. A person must be able to answer: which customer is down, why, since when, and what changed?
This belongs behind a separate Cloud design RFC. Do not hide it inside engine runtime work.
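To make the provisioning and token-rotation items concrete, here is a rough sketch under stated assumptions. The ControlPlane class, the record fields, and the choice to store only a hash of the bearer token are all hypothetical. The in-memory dict stands in for a real database, and the generated runtime_id stands in for a call to whichever platform the RFC selects.

```python
import hashlib
import secrets
from dataclasses import dataclass, field


@dataclass
class ControlPlane:
    # customer_id -> record; a real service would use a database
    runtimes: dict = field(default_factory=dict)

    def provision(self, customer_id, region):
        """Idempotent first-login provisioning.

        Returns (record, token). The token is returned only on the
        first call; later calls return the existing record and None.
        """
        existing = self.runtimes.get(customer_id)
        if existing is not None:
            return existing, None
        token = secrets.token_urlsafe(32)
        record = {
            "customer_id": customer_id,
            "region": region,
            # placeholder for the platform "create runtime" call
            "runtime_id": f"rt-{secrets.token_hex(4)}",
            # assumption: keep only a hash, never the raw token
            "token_sha256": hashlib.sha256(token.encode()).hexdigest(),
            "workspace_initialized": True,
        }
        self.runtimes[customer_id] = record
        return record, token

    def rotate_token(self, customer_id):
        """Admin action: mint a new token and replace the stored hash."""
        token = secrets.token_urlsafe(32)
        digest = hashlib.sha256(token.encode()).hexdigest()
        self.runtimes[customer_id]["token_sha256"] = digest
        return token
```

Making provision idempotent means a retried or duplicated first login cannot create a second runtime for the same customer.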
Storage when machines sleep across hosts
Any candidate platform has storage locality rules. On Fly, volumes are pinned to a host. If a customer runtime sleeps and wakes elsewhere, the volume needs to follow. Two paths:
- Pinned host (default). Each customer's machine has one home host. Wake is fast, no copy needed. Cost: that host becomes a single point of failure for that customer.
- Cold storage in R2/S3 (failover). A nightly snapshot of the volume into object storage. If the home host is unavailable, we restore on a new host. Cost: ~30 seconds of cold start when failover triggers, and any writes made since the last snapshot are lost.
Do not take paying Cloud customers until backup and restore are proven. Pinned host can be the first private-beta path, but failover limits must be explicit.
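One way to make the failover limits explicit is to compute them at wake time. The sketch below is illustrative only: plan_wake and its return shape are hypothetical, and it assumes the only recovery source is the most recent snapshot.

```python
from datetime import datetime, timedelta


def plan_wake(home_host_up, last_snapshot_at, now):
    """Pick a wake path and report the data-loss window on failover.

    home_host_up: whether the pinned home host is reachable.
    last_snapshot_at: datetime of the newest snapshot, or None.
    """
    if home_host_up:
        # Volume is local to the home host: no copy, no data loss.
        return {"path": "pinned_host", "data_loss_window": timedelta(0)}
    if last_snapshot_at is None:
        # No snapshot exists, so there is nothing to restore from.
        return {"path": "unrecoverable", "data_loss_window": None}
    # Everything written after the snapshot is lost on restore.
    return {
        "path": "restore_from_snapshot",
        "data_loss_window": now - last_snapshot_at,
    }
```

With nightly snapshots the reported window can approach 24 hours, which is the number a private-beta customer would need to see stated up front.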
Data residency
Each customer picks a region at signup. Their runtime and snapshots live there. The control plane may be global; only metadata such as account and billing should live outside the customer's region.
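The residency rule can be expressed as a check that a control plane might run against its inventory. This is a sketch: the resource kinds and region names are made up, and the allow-list simply mirrors the "account and billing" metadata named above.

```python
# Metadata kinds allowed to live outside the customer's region.
GLOBAL_OK = {"account", "billing"}


def residency_violations(customer_region, resources):
    """Return resources stored outside the customer's region.

    resources: iterable of (kind, region) pairs, for example
    ("runtime", "fra") or ("snapshot", "iad").
    """
    return [
        (kind, region)
        for kind, region in resources
        if region != customer_region and kind not in GLOBAL_OK
    ]
```

For example, a customer in "fra" with a snapshot stored in "iad" would produce one violation, while a billing record in "iad" would not.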
Backups
Nightly snapshot of the volume. Restore points: last 7 daily, last 4 weekly. Self-service restore from the customer dashboard.
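The retention rule can be sketched as a pruning function. This assumes one interpretation: "last 7 daily" means the seven newest snapshots, and "last 4 weekly" means the newest snapshot in each of the four most recent ISO weeks, so the current week's entry overlaps with the dailies. The function name and parameters are hypothetical.

```python
from datetime import date, timedelta


def snapshots_to_keep(snapshot_dates, daily=7, weekly=4):
    """Return the set of snapshot dates to retain."""
    ordered = sorted(set(snapshot_dates), reverse=True)  # newest first
    keep = set(ordered[:daily])  # the most recent daily snapshots
    weeks_seen = []
    for d in ordered:
        week = tuple(d.isocalendar()[:2])  # (ISO year, ISO week)
        if week not in weeks_seen:
            weeks_seen.append(week)
            keep.add(d)  # newest snapshot within this ISO week
        if len(weeks_seen) == weekly:
            break
    return keep
```

Anything not in the returned set is eligible for deletion. Whether weekly restore points should instead be counted in addition to the dailies is a product decision this sketch does not settle.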
Sleeping engines
For Cloud mode, engines may sleep after N idle minutes. The relay can hold the next inbound request, call the platform start API, poll /v1/health, then proxy through. This is only safe for active chats after M1 durable replay and M1b detached workers exist.
Planned trigger sources from Chapter 9 should share the same durable wake-and-accept path. Routines use a central scheduler service to wake a machine, instead of the relay reacting to inbound traffic.
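The hold, start, poll, proxy sequence can be sketched as below. The callables are hypothetical stand-ins: start_machine for the platform start API, is_healthy for a GET on /v1/health, and forward for the proxy step. The timeout and poll interval are illustrative defaults. The sketch covers only the wake path; it does not provide the durable replay that active chats require.

```python
import time


def wake_and_proxy(request, start_machine, is_healthy, forward,
                   timeout_s=30.0, poll_interval_s=0.5,
                   clock=time.monotonic, sleep=time.sleep):
    """Hold one request while the engine wakes, then forward it."""
    if not is_healthy():
        start_machine()  # ask the platform to start the runtime
        deadline = clock() + timeout_s
        while not is_healthy():  # poll until the engine reports healthy
            if clock() >= deadline:
                raise TimeoutError("engine did not become healthy in time")
            sleep(poll_interval_s)
    return forward(request)
```

A scheduler-driven wake for routines could call the same function with a synthetic request, which is one way to keep a single wake-and-accept path for all trigger sources.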
Pricing
Per VM-hour, or per agent-month. Either aligns with cost. The Cloud product is "rent a managed Houston runtime plus the control plane around it." Self-hosters pay nothing to Houston and run the same engine themselves. Desktop users pay nothing and run the engine on their own laptop. All three modes use the same engine protocol and core code, with deployment-specific packaging.
The useful property is narrower and still valuable: fewer engine behavior differences across deployment targets. Runtime, packaging, networking, and operations still need platform-specific tests.
- Always-on Docker recipe. always-on/Dockerfile and docker-compose.yml; builds the engine from source.
- Relay. houston-relay/, a Cloudflare Worker + Durable Object for the mobile tunnel.
- Tunnel client crate. engine/houston-tunnel.
- Cloud. cloud/ is placeholder today.
- Missing. Desktop Linux runtime supervisor, control-plane RFC and implementation, idle-exit logic, wake API, Cloud provisioning, backups, restore drills, quota controls, and operations runbooks.