The agent's brain lives on a disk that outlives the pod.

When Knative kills an idle agent pod, the agent's notes, memory, and OAuth tokens can't die with it. The pod gets thrown away. The disk doesn't. This chapter is about that disk and the handful of decisions around it.

The problem

A Houston agent stores everything in a folder called .houston/. Inside: chat history, memory, learned skills, OAuth tokens for connected tools, files the user has shared. On the desktop, that folder lives in ~/.houston/workspaces/... on the user's hard drive. It survives reboots, app crashes, anything short of the user deleting it.

In the cloud, the agent's pod is ephemeral: Knative kills it after a few minutes of idleness. If the .houston/ folder lived inside the pod, the agent would lose its memory every time it went to sleep. So the folder has to live somewhere outside the pod.

The fix: a persistent volume per agent

Kubernetes calls disks "persistent volumes." A persistent volume is a chunk of cloud storage that gets attached to a pod when the pod boots and detached when the pod dies. The data on it survives.

Every agent gets one: about 1 GB to start, growing as needed. It is mounted inside the pod at /data/.houston/. The engine doesn't know it's running in the cloud; it sees a normal folder.

When the agent is awake:

    Agent pod (Knative)
      ↓ mounted at
    /data/.houston/
      ↓ stored on
    PersistentVolume (cloud disk)

When the agent is asleep:

    Agent pod (deleted)
    /data/.houston/
      ↓ still here
    PersistentVolume (cloud disk)

Pod dies, disk lives. Wake up, mount the disk, and the agent is exactly where it left off.
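
To make the mount concrete, here is a minimal sketch of the pod-spec fragment that attaches an existing claim at /data/.houston/. The field names (volumes, persistentVolumeClaim, claimName, volumeMounts, mountPath) are standard Kubernetes pod-spec fields. The volume name "agent-state" and the container name "engine" are made up for illustration, and how the fragment is wired into the Knative service is not shown.

```python
MOUNT_PATH = "/data/.houston/"  # mount point named in this chapter


def agent_pod_volume_spec(claim_name: str, volume_name: str = "agent-state") -> dict:
    """Return a pod-spec fragment that mounts an existing PVC at MOUNT_PATH.

    volume_name and the container name "engine" are illustrative assumptions.
    """
    return {
        "volumes": [
            {
                "name": volume_name,
                "persistentVolumeClaim": {"claimName": claim_name},
            }
        ],
        "containers": [
            {
                "name": "engine",  # hypothetical container name
                "volumeMounts": [
                    # Must reference the volume declared above by name.
                    {"name": volume_name, "mountPath": MOUNT_PATH}
                ],
            }
        ],
    }
```

Calling agent_pod_volume_spec("pvc-agent-1234") returns a dict that can be merged into a pod template. The engine container only ever sees the mount path.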

How big is the disk?

Start at 1 GB per agent, which costs about 10 cents per month on GCP. Auto-grow when the volume is 80% full. Storage cost isn't worth optimizing: even 10,000 agents at 1 GB each come to about $1,000 a month in total. The real money is compute, not disk.
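
The arithmetic and the 80% rule fit in a few lines. This is a rough sketch: the price is the chapter's "about 10 cents" figure, not a quoted GCP rate, and should_grow only decides when to grow. It does not resize anything.

```python
CENTS_PER_GB_MONTH = 10  # rough figure from the text, not a quoted price
GROW_THRESHOLD_PERCENT = 80  # auto-grow trigger from the text


def monthly_cost_dollars(agents: int, gb_per_agent: int) -> float:
    """Total monthly disk cost in dollars for a fleet of agents."""
    return agents * gb_per_agent * CENTS_PER_GB_MONTH / 100


def should_grow(used_bytes: int, capacity_bytes: int) -> bool:
    """True once the volume is at least 80% full (integer math, no rounding)."""
    return used_bytes * 100 >= GROW_THRESHOLD_PERCENT * capacity_bytes
```

For example, monthly_cost_dollars(10_000, 1) gives 1000.0, matching the estimate above.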

What about big files?

If an agent handles huge files (videos, datasets), the persistent volume is the wrong place. Two reasons: it's attached only while the pod is alive, so nothing else can reach the files while the agent sleeps, and it's a more expensive kind of storage than necessary.

For big files, the agent writes to object storage (S3 on AWS, GCS on Google Cloud). Object storage is dirt cheap, designed for huge blobs, and accessible from anywhere. The agent keeps only a reference (a URL) on its persistent volume.

It's the same pattern as a desktop shortcut to a video on Dropbox: the shortcut is small, and the video lives somewhere cheap and shared.
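
A minimal sketch of that offload rule, under stated assumptions: the 50 MB cutoff, the ".ref.json" suffix, and the upload callable are all hypothetical. The chapter names no threshold and no file format, and upload stands in for whatever GCS or S3 client the real code would use.

```python
import json
import os
import shutil
from typing import Callable

BIG_FILE_BYTES = 50 * 1024 * 1024  # hypothetical cutoff; the chapter names no number


def store_file(src_path: str, volume_dir: str,
               upload: Callable[[str], str],
               threshold: int = BIG_FILE_BYTES) -> str:
    """Keep small files on the volume; offload big ones and keep only a reference.

    upload is a caller-supplied function that copies the file to object
    storage and returns its URL. Returns the path written on the volume.
    """
    name = os.path.basename(src_path)
    if os.path.getsize(src_path) < threshold:
        dest = os.path.join(volume_dir, name)
        shutil.copyfile(src_path, dest)
        return dest
    # Big file: the blob goes to object storage, only the pointer stays local.
    url = upload(src_path)
    ref_path = os.path.join(volume_dir, name + ".ref.json")
    with open(ref_path, "w", encoding="utf-8") as f:
        json.dump({"name": name, "url": url}, f)
    return ref_path
```

Passing the uploader in as a function keeps the sketch independent of any cloud SDK and makes the rule easy to test with a fake.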

Backups

Two layers:

Restore process: if an agent's volume gets corrupted, we mount the last snapshot and copy in the latest .houston/ from object storage. The agent's next message then wakes the pod with restored state, and the user notices nothing.

Sticky scheduling

There's a subtle issue: a persistent volume is usually tied to one cloud region or even one zone. If we have nodes in us-east-1a and us-east-1b, an agent's volume that was created in 1a can only mount to pods in 1a.

Kubernetes handles this automatically with topology-aware scheduling. When the pod boots, K8s asks "where does your volume live?" and only schedules the pod onto matching nodes. We don't have to think about it once it's set up.
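
The matching rule itself is simple. The sketch below only illustrates the idea and is not how the Kubernetes scheduler is implemented. It assumes each node carries the well-known topology.kubernetes.io/zone label; the node dict shape and the function name are invented for the example.

```python
ZONE_LABEL = "topology.kubernetes.io/zone"  # well-known Kubernetes node label


def schedulable_nodes(nodes: list[dict], volume_zone: str) -> list[str]:
    """Names of nodes whose zone label matches the zone the volume lives in."""
    return [
        node["name"]
        for node in nodes
        if node.get("labels", {}).get(ZONE_LABEL) == volume_zone
    ]


# Two nodes in different zones; the volume was created in us-east-1a.
nodes = [
    {"name": "node-1", "labels": {ZONE_LABEL: "us-east-1a"}},
    {"name": "node-2", "labels": {ZONE_LABEL: "us-east-1b"}},
]
eligible = schedulable_nodes(nodes, "us-east-1a")
```

Here eligible contains only node-1, which is why a pod whose volume lives in 1a never lands in 1b.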

The cold start optimization (for later)

Attaching a persistent volume on cold start adds a few hundred milliseconds. For 99% of agents this is fine. If we ever need sub-100ms cold start for some always-on agent, we'd switch to an image preloaded with state, or to syncing state from object storage instead of mounting a volume. Not v1.

What this gives us

The one footgun

Persistent volumes don't shrink. If an agent writes 5 GB of crap once and then cleans up, we keep paying for 5 GB until we migrate the volume. For Houston this probably doesn't matter: agent data should stay small if we're disciplined about offloading big files to object storage. Worth a periodic "agent volume size audit" cron once we're live.
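
One possible shape for that audit, as a sketch: flag volumes that grew past the starting size but are now mostly empty. The 25% utilization cutoff and the VolumeUsage record are assumptions for illustration. Where the usage numbers come from (kubelet metrics, a df inside the pod) is left open.

```python
from dataclasses import dataclass


@dataclass
class VolumeUsage:
    agent_id: str
    provisioned_gb: float
    used_gb: float


def oversized_volumes(volumes: list[VolumeUsage],
                      initial_gb: float = 1.0,
                      max_utilization: float = 0.25) -> list[str]:
    """Agent ids whose volume grew past initial_gb but is now mostly empty."""
    return [
        v.agent_id
        for v in volumes
        if v.provisioned_gb > initial_gb
        and v.used_gb / v.provisioned_gb < max_utilization
    ]
```

Volumes still at the starting size are skipped on purpose, since there is nothing to reclaim by migrating them.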

Concrete setup

StorageClass: standard-ssd-zonal on GKE (cheap SSD per AZ).
PersistentVolumeClaim: one per agent, name pattern pvc-agent-{agent-id}, created by the control plane when the agent is provisioned.
Snapshots: scheduled via Velero or native VolumeSnapshot controllers.
Object storage: one GCS bucket per workspace for tenant data, with lifecycle rules archiving old files to colder, cheaper tiers.
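
As a sketch of what the control plane might generate at provisioning time, the function below builds a PersistentVolumeClaim manifest as a plain dict. The name pattern, StorageClass name, and 1 GB starting size come from this chapter. The ReadWriteOnce access mode is an assumption (one pod per agent), and the function name is hypothetical.

```python
def agent_pvc_manifest(agent_id: str, size: str = "1Gi") -> dict:
    """Build a PersistentVolumeClaim manifest for one agent."""
    return {
        "apiVersion": "v1",
        "kind": "PersistentVolumeClaim",
        "metadata": {"name": f"pvc-agent-{agent_id}"},
        "spec": {
            "storageClassName": "standard-ssd-zonal",
            # Assumption: a single pod mounts the volume at a time.
            "accessModes": ["ReadWriteOnce"],
            "resources": {"requests": {"storage": size}},
        },
    }
```

The resulting dict can be serialized to YAML or JSON and submitted with whatever Kubernetes client the control plane uses.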