The roadmap.
This roadmap separates reliability, isolation, runtime, triggers, and Cloud. It does not hide risky work inside optimistic dates. Every big promise has a gate and a test contract.
The shape
| Track | Gate / estimate |
| --- | --- |
| M0 Plan cleanup | 1 week |
| M1a Durable turns | 4-6 weeks after schema lock |
| M1b Detached turn worker | 3-4 weeks if "never die" is a product promise |
| M2 Scopes + credentials | 3-5 weeks, can overlap M1 with separate owner |
| M3a Runtime spike | 1-2 weeks before any M3 date is promised |
| M3b Runtime build | 3-4 months only if spike passes |
| M4 Inbound triggers | after M1a, before Cloud |
| M5 Cloud private beta | 3-5 months with 2 engineers, separate RFC |
| Apple entitlement | start immediately, async weeks to months |
M0 — Make the plan true (1 week)
- Update design docs to match current repo facts.
- Name shipped, partial, and proposed work separately.
- Add acceptance tests for current weak spots before refactors: session start, stream persistence, WS reconnect, routine fire, auth token use.
- Decide exact schema for `turns`, `turn_stream`, `events_outbox`, and `trigger_runs`.
- Decide whether `chat_feed` migrates hard or gets one-release read fallback. Recommendation: canonical `turn_stream` writes, one-release read fallback only.
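The recommended fallback could look roughly like the sketch below. The schema is still undecided at this point, so the column names (`turn_id`, `seq`, `payload`) and the `read_chunks` helper are assumptions for illustration only. Python is used here purely for brevity; it is not a claim about the engine's implementation language.

```python
import sqlite3

# Hypothetical minimal columns; the real schema is still to be decided in M0.
SCHEMA = """
CREATE TABLE turn_stream (
    turn_id TEXT NOT NULL,
    seq     INTEGER NOT NULL,
    payload TEXT NOT NULL,
    PRIMARY KEY (turn_id, seq)
);
CREATE TABLE chat_feed (
    turn_id TEXT NOT NULL,
    seq     INTEGER NOT NULL,
    payload TEXT NOT NULL
);
"""

def read_chunks(conn, turn_id):
    """Read from canonical turn_stream; fall back to legacy chat_feed only if empty."""
    rows = conn.execute(
        "SELECT seq, payload FROM turn_stream WHERE turn_id = ? ORDER BY seq",
        (turn_id,),
    ).fetchall()
    if rows:
        return rows
    # One-release read fallback: legacy rows are read here, never written.
    return conn.execute(
        "SELECT seq, payload FROM chat_feed WHERE turn_id = ? ORDER BY seq",
        (turn_id,),
    ).fetchall()

conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
conn.execute("INSERT INTO chat_feed VALUES ('old-turn', 1, 'legacy chunk')")
conn.execute("INSERT INTO turn_stream VALUES ('new-turn', 1, 'canonical chunk')")
```

Because all writes go to `turn_stream`, removing the fallback after one release means deleting a single branch rather than unwinding a dual-write path.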
M1a — Durable turn acceptance and replay (4-6 weeks)
Promise after M1a: accepted turns are recorded before side effects. Committed stream chunks replay after refresh or network reconnect. Stale work surfaces retry/cancel instead of disappearing.
- Add checked SQLite migrations. Stop swallowing reliability migration failures with `.ok()`.
- Add `turns`, `turn_stream`, `events_outbox`, and `trigger_runs`.
- Add client-supplied `turnId` to session start request and idempotent server handling.
- Insert the turn row before spawning Claude, Codex, Gemini, or any provider.
- Write stream chunks before broadcasting them.
- Add WS replay with `sinceSeq` and authorization checks on replay.
- Add boot sweep: queued turns resume, stale running turns become interrupted unless a worker heartbeat owns them.
- Add routine missed-fire policy through `trigger_runs`.
- Add bounded intake for real trigger paths, not only the old unbounded event queue.
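The ordering rules in this list (record the turn first, persist each chunk before broadcasting it, replay only to an authorized caller) can be sketched as follows. This is a minimal illustration under assumed column names and a deliberately simple ownership check. The helpers `accept_turn`, `append_chunk`, and `replay` are hypothetical, not existing engine functions.

```python
import sqlite3

# Hypothetical columns; only the table names come from the roadmap.
SCHEMA = """
CREATE TABLE turns (
    turn_id TEXT PRIMARY KEY,
    owner   TEXT NOT NULL,
    status  TEXT NOT NULL
);
CREATE TABLE turn_stream (
    turn_id TEXT NOT NULL,
    seq     INTEGER NOT NULL,
    payload TEXT NOT NULL,
    PRIMARY KEY (turn_id, seq)
);
"""

def accept_turn(conn, turn_id, owner):
    """Record the turn before any provider is spawned. True if newly accepted."""
    with conn:  # commits on success, rolls back on error
        cur = conn.execute(
            "INSERT OR IGNORE INTO turns (turn_id, owner, status) VALUES (?, ?, 'queued')",
            (turn_id, owner),
        )
    return cur.rowcount == 1  # 0 means the client retried with a known turnId

def append_chunk(conn, turn_id, payload, broadcast):
    """Persist the chunk first; broadcast only after the commit succeeds."""
    with conn:
        (next_seq,) = conn.execute(
            "SELECT COALESCE(MAX(seq), 0) + 1 FROM turn_stream WHERE turn_id = ?",
            (turn_id,),
        ).fetchone()
        conn.execute(
            "INSERT INTO turn_stream (turn_id, seq, payload) VALUES (?, ?, ?)",
            (turn_id, next_seq, payload),
        )
    broadcast(turn_id, next_seq, payload)
    return next_seq

def replay(conn, turn_id, since_seq, requester):
    """Return committed chunks with seq > since_seq, after an ownership check."""
    row = conn.execute("SELECT owner FROM turns WHERE turn_id = ?", (turn_id,)).fetchone()
    if row is None or row[0] != requester:
        raise PermissionError("not allowed to replay this turn")
    return conn.execute(
        "SELECT seq, payload FROM turn_stream WHERE turn_id = ? AND seq > ? ORDER BY seq",
        (turn_id, since_seq),
    ).fetchall()

conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
sent = []  # stands in for the WebSocket broadcast
first = accept_turn(conn, "t-1", "local:local")
duplicate = accept_turn(conn, "t-1", "local:local")  # client retry, same turnId
for text in ("hello", "world"):
    append_chunk(conn, "t-1", text, lambda *chunk: sent.append(chunk))
```

A client that reconnects after seeing sequence 1 would call `replay(conn, "t-1", 1, "local:local")` and receive only the second chunk. The sketch assumes a single writer per turn; a real implementation would need to decide how sequence numbers are allocated under concurrency.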
This is not "conversations never die." It is "Houston never lies about accepted work, committed chunks, or retry state."
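One way to keep retry state honest is the boot sweep. The sketch below assumes a `status` column, a numeric `last_heartbeat_at`, and a 30-second staleness window; none of these are fixed by the plan, and `boot_sweep` is a hypothetical helper.

```python
import sqlite3

HEARTBEAT_TIMEOUT_S = 30  # assumed value; the plan does not fix a timeout

def boot_sweep(conn, now):
    """On engine start: interrupt stale running turns, return queued turn ids to resume."""
    with conn:
        conn.execute(
            """
            UPDATE turns SET status = 'interrupted'
            WHERE status = 'running'
              AND (last_heartbeat_at IS NULL OR last_heartbeat_at < ?)
            """,
            (now - HEARTBEAT_TIMEOUT_S,),
        )
    queued = conn.execute(
        "SELECT turn_id FROM turns WHERE status = 'queued' ORDER BY turn_id"
    ).fetchall()
    return [turn_id for (turn_id,) in queued]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE turns (turn_id TEXT PRIMARY KEY, status TEXT NOT NULL, last_heartbeat_at REAL)"
)
conn.executemany(
    "INSERT INTO turns VALUES (?, ?, ?)",
    [
        ("a", "queued", None),    # never started: resume
        ("b", "running", 990.0),  # fresh heartbeat: a worker still owns it
        ("c", "running", 900.0),  # stale heartbeat: interrupted
        ("d", "running", None),   # no heartbeat at all: interrupted
    ],
)
to_resume = boot_sweep(conn, now=1000.0)
statuses = dict(conn.execute("SELECT turn_id, status FROM turns"))
```

Interrupted turns stay in the table, so the UI can offer retry or cancel instead of silently dropping them.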
M1b — Detached turn worker (3-4 weeks)
Promise after M1b: app close, engine restart, and engine bounce do not kill an in-flight provider turn.
- Add `houston-turn-worker` as a small detached binary.
- Worker owns provider subprocess and writes stream chunks directly to SQLite.
- Engine attaches to worker by turn id after restart.
- Worker heartbeat updates `last_heartbeat_at`.
- Cancel is durable: set `cancel_requested_at`, worker cooperatively stops provider process.
- Crash tests kill app, engine, worker, and provider process separately and assert visible state.
If the product says "conversations never die," M1b is mandatory. If not, make the promise weaker and ship M1a first.
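The heartbeat and durable-cancel mechanics might be sketched like this. The loop structure and the helpers `worker_tick` and `request_cancel` are assumptions; only the column names `last_heartbeat_at` and `cancel_requested_at` come from the plan.

```python
import sqlite3

def worker_tick(conn, turn_id, now):
    """One iteration of a hypothetical worker loop: heartbeat, then check for cancel."""
    with conn:
        conn.execute(
            "UPDATE turns SET last_heartbeat_at = ? WHERE turn_id = ?",
            (now, turn_id),
        )
    (cancel_at,) = conn.execute(
        "SELECT cancel_requested_at FROM turns WHERE turn_id = ?", (turn_id,)
    ).fetchone()
    # The worker would stop the provider process cooperatively on "stop".
    return "stop" if cancel_at is not None else "continue"

def request_cancel(conn, turn_id, now):
    """Record the cancel in SQLite so it survives an app or engine restart."""
    with conn:
        conn.execute(
            "UPDATE turns SET cancel_requested_at = ? "
            "WHERE turn_id = ? AND cancel_requested_at IS NULL",
            (now, turn_id),
        )

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE turns (turn_id TEXT PRIMARY KEY, "
    "last_heartbeat_at REAL, cancel_requested_at REAL)"
)
conn.execute("INSERT INTO turns (turn_id) VALUES ('t-1')")
before = worker_tick(conn, "t-1", now=100.0)
request_cancel(conn, "t-1", now=101.0)
after = worker_tick(conn, "t-1", now=102.0)
heartbeat = conn.execute(
    "SELECT last_heartbeat_at FROM turns WHERE turn_id = 't-1'"
).fetchone()[0]
```

Because the cancel request lives in the database rather than in engine memory, a worker that outlives the engine still sees it on its next tick.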
M2 — Scopes and per-agent credentials (3-5 weeks)
Promise after M2: engine API calls are scoped, and provider credentials are separated per agent. This still is not kernel filesystem isolation.
- Add `principals` and `principal_tokens`.
- Migrate today's device bearer into `local:local` with full local scope.
- Add `require_scope` middleware to every route touching agent data.
- Add route-audit tests so new path-touching routes cannot skip scope checks.
- Override provider homes per agent on native hosts: Claude, Codex, Gemini, Composio.
- Add strict per-agent login mode for desktop workspaces that represent separate clients.
- Keep brokered convenience as a solo-desktop default, but label the weaker isolation honestly.
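The scope check and the route audit could work together roughly as sketched below. The route paths, the scope string `agent:read`, the registry, and the allowlist are all hypothetical; the point is only that the audit fails when a registered route carries no scope and is not explicitly public.

```python
from functools import wraps

ROUTES = {}  # path -> handler; stands in for the engine's router

def route(path):
    """Register a handler under a path (hypothetical router API)."""
    def register(handler):
        ROUTES[path] = handler
        return handler
    return register

def require_scope(scope):
    """Reject the call unless the principal holds the given scope."""
    def decorate(handler):
        @wraps(handler)
        def guarded(principal, *args, **kwargs):
            if scope not in principal["scopes"]:
                raise PermissionError(f"missing scope: {scope}")
            return handler(principal, *args, **kwargs)
        guarded.required_scope = scope  # marker the audit looks for
        return guarded
    return decorate

@route("/v1/agents/{id}/files")
@require_scope("agent:read")
def list_files(principal, agent_id):
    return f"files for {agent_id}"

@route("/v1/health")
def health(principal):
    return "ok"

PUBLIC_ROUTES = {"/v1/health"}  # explicit allowlist, reviewed by hand

def unscoped_routes():
    """Routes that are neither scoped nor explicitly public; should be empty."""
    return sorted(
        path
        for path, handler in ROUTES.items()
        if path not in PUBLIC_ROUTES and not hasattr(handler, "required_scope")
    )
```

A route-audit test then reduces to asserting that `unscoped_routes()` returns an empty list, so a new route added without a scope fails CI instead of shipping.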
M3a — Runtime spike (1-2 weeks)
Gate: do this before promising Linux runtime dates.
- Bundle a minimal Linux guest image with `houston-engine`.
- Boot through macOS `Virtualization.framework` on Apple Silicon and Intel if supported.
- Measure cold start target: 2s Apple Silicon, 3s Intel, or write down why target changes.
- Measure resident memory target: 300MB or less idle.
- Measure battery: 4-hour idle session, target 10% or less extra drain versus native.
- Test file import/download latency with realistic workspace size.
- Test entitlement path on signed Developer ID build.
- Test WSL2 install/start on clean Windows and corp-locked Windows.
Fail gate means do not build M3b yet. Keep native engine and ship M1/M2 wins.
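The numeric targets above can be turned into a mechanical pass/fail check, as in the sketch below. The thresholds are copied from the checklist; the measurements passed in are placeholders for illustration, not spike results, and `gate_failures` is a hypothetical helper.

```python
# Targets copied from the spike checklist.
COLD_START_TARGET_S = {"apple_silicon": 2.0, "intel": 3.0}
IDLE_MEMORY_TARGET_MB = 300
EXTRA_DRAIN_TARGET_PCT = 10

def gate_failures(arch, cold_start_s, idle_memory_mb, extra_drain_pct):
    """Return the names of the numeric targets that were missed."""
    failures = []
    if cold_start_s > COLD_START_TARGET_S[arch]:
        failures.append("cold_start")
    if idle_memory_mb > IDLE_MEMORY_TARGET_MB:
        failures.append("idle_memory")
    if extra_drain_pct > EXTRA_DRAIN_TARGET_PCT:
        failures.append("battery")
    return failures

# Placeholder measurements, not real data.
passing = gate_failures("apple_silicon", cold_start_s=1.8, idle_memory_mb=280, extra_drain_pct=7)
failing = gate_failures("intel", cold_start_s=3.4, idle_memory_mb=310, extra_drain_pct=9)
```

The non-numeric items (entitlement path, WSL2 on locked-down Windows, import latency) still need a written verdict; this only covers the three measured targets.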
M3b — Runtime build (3-4 months, only if spike passes)
Promise after M3b: per-agent Linux users provide real filesystem isolation on supported machines.
- Runtime supervisor under Tauri adapter code.
- macOS VZ wrapper, guest image build, update path, logs, recovery.
- Windows WSL2 detect/install/register/start, with clear unsupported state for blocked machines.
- Linux passthrough path.
- Workspace migration into runtime storage with checksum, rollback marker, and host copy preserved until verified.
- Import, download, export, reveal, and logs flows.
- Per-agent `useradd`, folder ownership, 0700 modes.
- Provider CLI spawn as `hou_<agent>`.
- Full permission-denied test: Sales agent cannot read HR folder.
M4 — Inbound triggers (after M1a)
Promise after M4: external systems can safely start agent turns without opening a memory DoS or bypassing scopes.
- Add `POST /v1/hooks/<route_token>`.
- Add hook token table, rotation, revoke, last-used, and scope.
- Add provider signature verification for each supported webhook source.
- Insert `trigger_runs` before dispatch.
- Use bounded intake and per-source rate limits.
- Add Slack-style inbound adapter only when route, auth, and dispatch are done.
- Add replay and audit UI for accepted, rejected, skipped, and failed triggers.
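The intake path described above might be sketched like this: verify the signature, record the run, then enqueue into a bounded queue. The HMAC-SHA256 hex scheme is a generic stand-in, since each real webhook source defines its own signing format. The `trigger_runs` columns, the status names, and the queue size of 2 are assumptions for the demo.

```python
import hashlib
import hmac
import queue
import sqlite3

INTAKE = queue.Queue(maxsize=2)  # tiny bound so the demo hits the limit

def verify_signature(secret, body, signature_hex):
    """Constant-time comparison of an HMAC-SHA256 hex signature (generic scheme)."""
    expected = hmac.new(secret, body, hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected.encode(), signature_hex.encode())

def accept_hook(conn, run_id, secret, body, signature_hex):
    """Record the trigger run first, then dispatch only if there is room."""
    status = "accepted" if verify_signature(secret, body, signature_hex) else "rejected"
    with conn:
        conn.execute(
            "INSERT INTO trigger_runs (run_id, status) VALUES (?, ?)",
            (run_id, status),
        )
    if status == "accepted":
        try:
            INTAKE.put_nowait(run_id)  # row is already committed at this point
        except queue.Full:
            status = "skipped"  # bounded intake: shed load instead of growing memory
            with conn:
                conn.execute(
                    "UPDATE trigger_runs SET status = ? WHERE run_id = ?",
                    (status, run_id),
                )
    return status

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE trigger_runs (run_id TEXT PRIMARY KEY, status TEXT NOT NULL)")
secret = b"demo-secret"
body = b'{"event": "ping"}'
good_sig = hmac.new(secret, body, hashlib.sha256).hexdigest()
results = [accept_hook(conn, f"r{i}", secret, body, good_sig) for i in range(3)]
bad = accept_hook(conn, "r3", secret, body, "00")
audit = dict(conn.execute("SELECT run_id, status FROM trigger_runs"))
```

Every request leaves a `trigger_runs` row regardless of outcome, which is what the replay and audit UI would read from.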
M5 — Cloud private beta (3-5 months, 2 engineers)
Promise after M5: Cloud can host private-beta customers with provisioning, wake, backup, restore, billing, support, and observability.
- Write Cloud RFC before platform choice. Fly Machines are a candidate, not a done decision.
- Build control plane: signup, billing, provisioning, admin, support, observability.
- Add per-customer or per-workspace runtime provisioning.
- Add idle exit and wake path after M1b exists.
- Add hosted scheduler for routines.
- Add quota controls for provider spend, trigger volume, storage, and runtime hours.
- Backup every workspace. Run restore drills before charging users.
- Define region, data residency, legal, and vendor-exit story.
Parallel track: Apple entitlement paperwork
Start immediately. Do not wait for M3. Using `Virtualization.framework` in a Developer ID app requires the `com.apple.security.virtualization` entitlement, which Apple grants by request. Timeline: a few weeks if clean, longer if Apple has questions. Filing late blocks the entire M3 release.
Action: open a ticket at developer.apple.com (Code Signing & Entitlements). Describe Houston as a desktop app that runs user-owned agent workloads inside a Linux guest for isolation. Reference VZ.framework. Wait. Then add the entitlement to `app/src-tauri/entitlements.plist` the moment it's granted.
Open questions
M1: canonical turn_stream writes with one-release read fallback, or hard migration only?
M1b: do we make "never die" a product promise now, or delay it and say "retryable" after engine crash?
M2: brokered desktop default plus strict mode, or strict per-agent by default everywhere?
M3: what happens when Windows virtualization is disabled by BIOS or corporate policy?
M5: which Cloud platform, backup objective, region promise, support SLA, and vendor-exit plan?
What "shipped" means by milestone
- After M1a: accepted turns, committed chunks, reconnect replay, and retry state are durable.
- After M1b: app close and engine bounce do not kill in-flight provider turns.
- After M2: engine API scopes and per-agent provider homes exist.
- After M3b: supported desktops get kernel-backed per-agent filesystem isolation.
- After M4: public hooks and Slack-style inbound channels use bounded durable trigger dispatch.
- After M5: Cloud is private-beta ready, with backups and restore already tested.
Minimum serious schedule: M1a and M2 can overlap with two engineers. M1b, M3b, and M5 each need focused ownership. One engineer stretches the plan linearly and should not promise Cloud plus runtime in the same half-year.