Where the engine lives.
Today the engine runs natively on Mac, Windows, and Linux. Three builds. Three sandboxing stories. The proposed next step is a Linux-runtime engine on Mac and Windows, native Linux everywhere else. This is not a committed migration yet. It is a high-leverage isolation bet that must pass hard measurements before we move user data into it.
Do not schedule this as build work until the spike proves startup, memory, battery, entitlement, WSL install, migration, import/download, and support-story risks. If any one fails, redesign before building.
Why we'd move. Three concrete reasons.
1. Per-client isolation is a real product need
Imagine you're a consultant. You build one Houston agent per client. ClientA agent, ClientB agent. You don't want ClientA's agent to read ClientB's contracts, files, or chat history. Today, nothing stops it. The Claude or Codex subprocess runs as you, with your full home directory, your SSH keys, your everything.
A clever (or jailbroken) prompt asking ClientA's agent to "go read
everything in ~/.houston/workspaces/MyConsulting/ClientB/"
will work, because there is no wall. The agent has a shell. It can
cat, cd, ls, find.
Read Chapter 7 for the full threat model. The short version is that
the only way to actually stop this is a kernel-enforced wall, and
the only practical way to get that on every OS is per-agent Linux
users inside a Linux runtime.
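A minimal sketch of what "per-agent Linux users" could look like from the engine's side, using only the Rust standard library. The UID base, the `agent-N` directory naming, and the function names are assumptions made for illustration, not Houston's actual scheme. The wall itself comes from the kernel: each agent's directory is owned by its own UID with mode 0700, so a shell running as ClientA's UID is refused when it tries to read ClientB's files.

```rust
use std::os::unix::process::CommandExt;
use std::path::{Path, PathBuf};
use std::process::Command;

/// Hypothetical base UID for agent accounts; not a value from the Houston codebase.
const AGENT_UID_BASE: u32 = 20_000;

/// Map an agent's index to a dedicated Linux UID (assumed scheme).
fn agent_uid(agent_index: u32) -> u32 {
    AGENT_UID_BASE + agent_index
}

/// Each agent gets its own working directory under a runtime-side root.
fn agent_home(root: &Path, agent_index: u32) -> PathBuf {
    root.join(format!("agent-{agent_index}"))
}

/// Build (but do not spawn) a command that will run as the agent's UID/GID.
/// Spawning it requires a parent privileged enough to switch users, and the
/// agent's home must already be owned by that UID with mode 0700.
fn agent_command(program: &str, root: &Path, agent_index: u32) -> Command {
    let uid = agent_uid(agent_index);
    let mut cmd = Command::new(program);
    cmd.uid(uid)
        .gid(uid)
        .current_dir(agent_home(root, agent_index));
    cmd.env_clear(); // do not leak the supervisor's environment into the agent
    cmd
}
```

Calling `agent_command("claude", Path::new("/srv/agents"), 3)` would yield a command that starts in `/srv/agents/agent-3` as UID 20003. Creating the users and setting directory ownership is left out of the sketch.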
2. Real users are cloud-first, not Finder-first
Houston's target user is non-technical. Their data is in Drive,
Slack, Notion, Gmail. They don't keep a tidy
~/Documents/Clients/. They authenticate to a SaaS via
OAuth and the agent does the work there.
"The agent can browse my whole laptop" is not a feature for these users. It's an attack surface. Moving the agent into a Linux runtime with no access to the host filesystem flips this from a risk to a property: "your agent cannot see anything outside the runtime, by design."
For users who DO have local files, the desktop app exposes Import (one-time copy in) and Download (export out). Both first-class affordances. See "File flow" below.
3. The Windows tooling tax is real and growing
Composio doesn't run on Windows without our patches. We maintain
gethouston/composio as a fork because upstream closed
the Windows issue with "use WSL." Every vanguard CLI tool we'll want
to bundle next (Stripe Link CLI, future MCP servers, whatever) ships
Linux and Mac first, Windows months later if at all.
A Linux runtime on Windows means we can ship the Linux build of most tools. That reduces fork pressure and Windows-specific patching. It does not remove the need to test every bundled CLI on Windows hosts.
The target: one Linux operating profile
Why not a custom hypervisor
We are NOT writing our own VM stack. Apple and Microsoft have already shipped the right primitives. Using them means:
- macOS: Apple Virtualization.framework. Ships with the OS since macOS 11. Boots a Linux guest in 1-3 seconds on Apple Silicon with a slim image, if our measurements confirm it. Apple owns the kernel-side work. We own the wrapper, image, persistence, updates, logs, and failure UX.
- Windows: WSL2. Usually present on Windows 11. One-command install on many Windows 10 machines (wsl --install), but corporate laptops may block virtualization, Store installs, or WSL distro registration. File Explorer can browse the guest filesystem at \\wsl$\Houston\ without us doing anything. Microsoft owns the kernel-side work. We own the registration.
- Linux: no runtime needed. The engine runs natively. Per-agent UIDs work directly.
- Cloud: candidate platforms such as Fly Machines already provide Linux microVMs. The engine runs natively inside them.
Houston does not ship a hypervisor. Houston ships a thin platform layer that asks the host's existing virtualization to run a Linux guest containing the engine binary.
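The thin platform layer can be pictured as a single dispatch on the host OS. The sketch below is illustrative only: the `RuntimeKind` enum and function names are hypothetical, and the only real API used is `std::env::consts::OS`, which reports "macos", "windows", or "linux" on the three desktop targets.

```rust
/// Which mechanism hosts the engine on a given OS (names are illustrative).
#[derive(Debug, PartialEq, Eq)]
enum RuntimeKind {
    /// Linux guest booted through Apple's Virtualization.framework.
    AppleVirtualization,
    /// Linux guest registered as a WSL2 distro.
    Wsl2,
    /// No guest at all: the engine runs directly on the host.
    NativeLinux,
}

/// Pure mapping from an OS name to a runtime, kept separate so it can be
/// tested on any machine.
fn runtime_for_os(os: &str) -> Option<RuntimeKind> {
    match os {
        "macos" => Some(RuntimeKind::AppleVirtualization),
        "windows" => Some(RuntimeKind::Wsl2),
        "linux" => Some(RuntimeKind::NativeLinux),
        _ => None, // unsupported host: surface an error, do not guess
    }
}

/// Runtime for the machine this binary was compiled for.
fn runtime_for_this_host() -> Option<RuntimeKind> {
    runtime_for_os(std::env::consts::OS)
}
```

Everything platform-specific would then sit behind one of the three variants, which is what keeps the rest of the app unaware of which host it runs on.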
File flow, migration, and rollback
If this ships, the agent folder lives inside the Linux runtime. That changes a user-data root, so it needs an idempotent migration and an escape hatch. Bytes get in and out through four paths.
- OAuth-connected SaaS (Drive, Slack, Notion, etc). The most common path by far. The agent uses its credentials to read and write directly to the SaaS. Nothing touches the host.
- Import. User picks a file or folder via the host OS file picker. Tauri reads it, posts the bytes to the engine over HTTP, and the engine writes them inside the agent's working directory. Already implemented for attachments (POST /v1/attachments/uploads). Extends to project file import.
- Download. User clicks Download in the app. The engine reads the file inside the runtime and streams it to Tauri, which writes it to the host's Downloads folder and reveals it.
- Migration/export. First launch after upgrade copies existing ~/.houston/workspaces into the runtime, verifies checksums, leaves the host copy untouched, and records a reversible migration marker. Export produces a normal folder backup.
On Windows specifically, WSL2 also exposes the guest filesystem at
\\wsl$\Houston\ in Explorer. Users can browse it
natively if they want. That is a bonus, not a contract: the canonical
path on every OS is Import, Download, and Export. Houston must not
depend on host filesystem visibility for correctness.
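The migration path described above (copy, verify checksums, leave the host copy alone, record a marker) can be sketched with the standard library alone. This is a simplified illustration under stated assumptions: the marker file name is hypothetical, both roots are treated as ordinary directories reachable from one process, and FNV-1a stands in for a real cryptographic checksum such as SHA-256 only to keep the example dependency-free.

```rust
use std::fs;
use std::io;
use std::path::Path;

/// FNV-1a 64-bit. A dependency-free stand-in for a real checksum (e.g. SHA-256).
fn fnv1a64(bytes: &[u8]) -> u64 {
    let mut hash: u64 = 0xcbf29ce484222325;
    for b in bytes {
        hash ^= u64::from(*b);
        hash = hash.wrapping_mul(0x100000001b3);
    }
    hash
}

/// Recursively copy `src` into `dst`, re-reading every copied file to confirm
/// its checksum matches the source. Returns the number of files copied.
fn copy_verified(src: &Path, dst: &Path) -> io::Result<u64> {
    fs::create_dir_all(dst)?;
    let mut copied = 0;
    for entry in fs::read_dir(src)? {
        let entry = entry?;
        let target = dst.join(entry.file_name());
        if entry.file_type()?.is_dir() {
            copied += copy_verified(&entry.path(), &target)?;
        } else {
            let original = fs::read(entry.path())?;
            fs::write(&target, &original)?;
            if fnv1a64(&fs::read(&target)?) != fnv1a64(&original) {
                return Err(io::Error::new(io::ErrorKind::InvalidData, "checksum mismatch"));
            }
            copied += 1;
        }
    }
    Ok(copied)
}

/// Idempotent migration: a marker file (hypothetical name) records success, so
/// a second launch does nothing. The host copy is only ever read.
fn migrate_once(host_root: &Path, runtime_root: &Path) -> io::Result<bool> {
    let marker = runtime_root.join(".migrated-from-host");
    if marker.exists() {
        return Ok(false); // already migrated on an earlier launch
    }
    let files = copy_verified(host_root, runtime_root)?;
    fs::write(&marker, format!("files={files}\n"))?;
    Ok(true)
}
```

Because the marker is written only after every file verifies, a crash mid-copy leaves no marker and the next launch retries from the untouched host copy. Rolling back amounts to deleting the runtime copy and the marker.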
What must stay native-host owned
- OS pickers and reveal-in-file-manager. Tauri host shell still owns them.
- Crash reporting and logs. Host must collect both app logs and runtime engine logs for Report bug.
- Updates. App update must update runtime image, engine binary, and bundled CLIs as one signed unit.
- Recovery. If runtime fails to boot, user sees a visible recovery screen with export/repair options.
The new pieces of code
| Piece | Where | What it does |
|---|---|---|
| Runtime supervisor | app/houston-tauri/runtime/ (new) | Boots the platform's Linux runtime, waits for the engine banner, exposes a localhost port to the rest of the app. |
| Mac VZ wrapper | app/houston-tauri/runtime/mac.rs (new) | Calls Virtualization.framework via Objective-C bindings (objc2 + objc2-virtualization). Loads bundled kernel + initrd + ext4 rootfs. |
| Windows WSL wrapper | app/houston-tauri/runtime/win.rs (new) | Detects WSL2, runs wsl --install on demand, registers a "Houston" distro, starts the engine inside it. |
| Linux passthrough | app/houston-tauri/runtime/linux.rs (new, thin) | No runtime. Spawns the engine like today. |
| Linux guest image | runtime-image/ (new) | Buildroot or Alpine-based minimal Linux. Engine binary, sqlite, busybox, OpenSSH disabled, no shell login. Built in CI, bundled in the Mac .app and the Windows MSI. |
| Download / Import routes | engine/houston-engine-server/src/routes/agent_files.rs | Import-bytes exists. Project read is text-only today. Runtime mode needs binary download, folder export, checksum verification, and host-side save/reveal. |
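The runtime supervisor's "waits for the engine banner" step might look roughly like the sketch below. The banner text is an assumption made for illustration, since the real engine's startup line is not specified here, and the `Startup` type is hypothetical. The function takes any `BufRead`, so the same logic applies whether the output comes from a child process pipe, a VM console, or a test buffer.

```rust
use std::io::BufRead;

/// Outcome of watching the engine's startup output.
#[derive(Debug, PartialEq, Eq)]
enum Startup {
    /// Banner seen; the engine is listening on this localhost port.
    Ready { port: u16 },
    /// Output ended first. The last line is kept so the failure can be shown
    /// to the user and attached to a bug report instead of failing silently.
    ExitedWithoutBanner { last_line: String },
}

/// Hypothetical banner prefix; the real engine's banner text may differ.
const BANNER_PREFIX: &str = "houston-engine listening on 127.0.0.1:";

/// Read lines until the banner appears or the stream ends.
fn wait_for_banner<R: BufRead>(output: R) -> std::io::Result<Startup> {
    let mut last_line = String::new();
    for line in output.lines() {
        let line = line?;
        if let Some(rest) = line.trim().strip_prefix(BANNER_PREFIX) {
            if let Ok(port) = rest.parse::<u16>() {
                return Ok(Startup::Ready { port });
            }
        }
        last_line = line;
    }
    Ok(Startup::ExitedWithoutBanner { last_line })
}
```

A production supervisor would also need a timeout around this loop so that a guest that hangs without exiting still ends up on the recovery screen.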
Apple entitlement paperwork (start now, parallel track)
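Using Virtualization.framework in a Developer ID app requires the com.apple.security.virtualization entitlement. This plan treats it as restricted: you declare it in the entitlements plist, then file a request with Apple Developer support describing the use case (running a Linux guest for sandboxed agent workloads), and Apple grants a provisioning profile that carries the entitlement or approves your existing signing cert to include it. Confirm that assumption against Apple's current entitlement documentation before filing, because approval requirements differ per entitlement. If this one turns out to be self-service, the track shrinks to the plist change and a signing test.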
Timeline is usually a few weeks if the request is clean, longer if Apple has follow-up questions. Open the ticket early, while M3 is still being validated, not when the code is written. Without it, the runtime startup call returns an error. That error must be visible and reportable, never a silent boot failure.
Numbers we need to confirm before committing
The whole milestone is gated on three measurements. Spike all three before locking M3 dates.
- Cold start budget: target ≤ 2 seconds on Apple Silicon, ≤ 3 seconds on Intel Mac, ≤ 1 second on Windows with WSL2 warm. If the real number is 5+ seconds, the UX changes and we add a "Houston is starting" screen.
- Memory: target ≤ 300MB resident for the Linux guest including the engine. Docker Desktop is the cautionary tale (1-4GB). A slim image with no daemons should land in 200-400MB.
- Battery: target ≤ 10% extra drain on Mac vs today's native engine, measured over a 4-hour idle session. Apple Silicon makes this plausible. Worth measuring before users complain.
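To keep the spike honest, the three targets can be encoded as a pass/fail check that a measurement script feeds. The thresholds below are the ones listed above; the `Host` and `Measurement` types and their field names are illustrative assumptions, not an existing schema.

```rust
/// Host classes named in the cold-start targets.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Host {
    AppleSilicon,
    IntelMac,
    WindowsWslWarm,
}

/// One spike result. Field names are illustrative.
struct Measurement {
    host: Host,
    cold_start_secs: f64,
    resident_mb: f64,
    /// Extra battery drain vs the native engine over a 4-hour idle session.
    /// `None` where it was not measured (the target is stated for Mac).
    extra_drain_pct: Option<f64>,
}

fn cold_start_budget_secs(host: Host) -> f64 {
    match host {
        Host::AppleSilicon => 2.0,
        Host::IntelMac => 3.0,
        Host::WindowsWslWarm => 1.0,
    }
}

/// Names of the budgets this measurement misses; empty means it passes.
fn missed_budgets(m: &Measurement) -> Vec<&'static str> {
    let mut missed = Vec::new();
    if m.cold_start_secs > cold_start_budget_secs(m.host) {
        missed.push("cold start");
    }
    if m.resident_mb > 300.0 {
        missed.push("memory");
    }
    if m.extra_drain_pct.map_or(false, |pct| pct > 10.0) {
        missed.push("battery");
    }
    missed
}

/// At 5+ seconds the plan calls for a "Houston is starting" screen.
fn needs_starting_screen(m: &Measurement) -> bool {
    m.cold_start_secs >= 5.0
}
```

A run that measures 1.8 s, 280 MB, and 7.5% on Apple Silicon passes all three. A 5.4 s cold start on an Intel Mac both misses its budget and triggers the starting-screen requirement.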
Open question: Windows machines without virtualization
Some corporate laptops have virtualization disabled in BIOS. Some older CPUs lack the required extensions. Some IT policies block WSL registration. Houston needs an install-time check and an upgrade-time check before moving data. Recommended default for beta: keep native engine available as a visible fallback until runtime success rates are measured. Remove fallback only after data says support cost is lower than divergence cost.
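One piece of the install-time check could be asking WSL whether the "Houston" distro is registered. The sketch below shells out to `wsl.exe --list --quiet` and decodes the result as UTF-16LE, which is how wsl.exe writes its output by default. The `WslCheck` type and function names are hypothetical. Treating every non-zero exit as "unavailable" is a simplification, and a passing check does not prove virtualization is enabled; only actually starting the distro shows that.

```rust
use std::process::Command;

/// Decode wsl.exe output, which is UTF-16LE by default.
fn decode_utf16le(bytes: &[u8]) -> String {
    let units: Vec<u16> = bytes
        .chunks_exact(2)
        .map(|pair| u16::from_le_bytes([pair[0], pair[1]]))
        .collect();
    String::from_utf16_lossy(&units)
}

/// True if `distro` appears in the output of `wsl.exe --list --quiet`.
fn distro_registered(list_output: &[u8], distro: &str) -> bool {
    decode_utf16le(list_output)
        .lines()
        // Strip whitespace, a possible byte-order mark, and stray NULs.
        .map(|l| l.trim_matches(|c: char| c.is_whitespace() || c == '\u{feff}' || c == '\0'))
        .any(|l| l.eq_ignore_ascii_case(distro))
}

/// Coarse result for the install-time and upgrade-time checks.
#[derive(Debug, PartialEq, Eq)]
enum WslCheck {
    Ready,
    DistroMissing,
    /// wsl.exe could not be run or reported failure. Simplification: a WSL
    /// install with no distros at all may also land here.
    WslUnavailable,
}

fn check_wsl(distro: &str) -> WslCheck {
    match Command::new("wsl.exe").args(["--list", "--quiet"]).output() {
        Ok(out) if out.status.success() => {
            if distro_registered(&out.stdout, distro) {
                WslCheck::Ready
            } else {
                WslCheck::DistroMissing
            }
        }
        _ => WslCheck::WslUnavailable,
    }
}
```

Anything other than `Ready` before a data move should route the user to the visible native-engine fallback instead of starting the migration.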
Desktop and Cloud move toward one Linux operating profile. Per-agent kernel isolation becomes possible on supported machines. Windows-specific tool patches shrink. The cost is a runtime supervisor we do not fully own, an entitlement we have to win, and Import/Download/Export flows we have to build first-class.
Today's engine spawn: app/src-tauri/src/engine_supervisor.rs. New runtime supervisor will live in app/houston-tauri/runtime/. Mac VZ bindings: objc2-virtualization. WSL2 bindings: wslapi crate or direct wsl.exe invocations. Linux guest image build: new runtime-image/ dir using Buildroot or Alpine. Engine bin target: unchanged (cargo build --release -p houston-engine-server --target x86_64-unknown-linux-musl or aarch64-unknown-linux-musl).