Idle agents pay zero.
The economics of Houston Cloud only work if a sleeping agent costs nothing. Most users have a sales bot they touch twice a day and a recruiter bot they touch once a week. If those pods run 24 hours a day, our bill scales linearly with the number of agents and we lose money on every customer. Knative is the trick that makes idle agents free.
The economic problem
Imagine an enterprise with 100 employees and 10 agents per employee. That's 1,000 agents. If we charge $20 per seat per month, we take in $2,000 a month. If each agent's pod costs us $5 a month to run idle, we spend $5,000 a month on pods alone. Our gross margin is negative before any agent does any work.
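The arithmetic above can be sketched in a few lines of Python. The seat price, pod cost, and head counts come from the example; the awake_percent parameter and the idea that pod cost is proportional to awake time are assumptions for illustration, not measured numbers.

```python
def monthly_margin(employees, agents_per_employee, price_per_seat,
                   cost_per_pod_month, awake_percent=100):
    """Return (revenue, pod_cost, margin) in dollars per month."""
    agents = employees * agents_per_employee
    revenue = employees * price_per_seat
    # Assumption: pod cost is proportional to the share of the month
    # the pod is actually awake (100 means always on).
    pod_cost = agents * cost_per_pod_month * awake_percent / 100
    return revenue, pod_cost, revenue - pod_cost


# Always-on pods: the scenario from the text.
always_on = monthly_margin(100, 10, 20, 5)

# Hypothetical scale-to-zero scenario: pods awake 5% of the time.
scaled = monthly_margin(100, 10, 20, 5, awake_percent=5)

print("always on:    ", always_on)
print("scale to zero:", scaled)
```

With always-on pods the margin comes out at minus $3,000 a month. The 5% figure is only a placeholder; the point is that margin turns positive once pod cost tracks usage instead of agent count.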
We need: pods that exist only when a human is actually talking to them. Pods that take less than a second to wake up. Pods that automatically die when nobody's around.
That is exactly what Knative does.
Knative, in one paragraph
Knative is a layer on top of Kubernetes that adds three superpowers: "scale to zero," "scale on demand," and "scale based on traffic." You hand Knative a container image and say "this is a service." Knative figures out the rest. When traffic arrives, it boots pods. When traffic stops, it kills them. When traffic spikes, it boots more. You never run kubectl scale by hand.
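As a rough mental model of those three behaviors, the scaling decision can be sketched as a single function. This is a deliberately simplified toy, not Knative's actual autoscaler, which averages metrics over time windows and handles bursts separately. The function name, the target_per_pod default, and the bounds are assumptions for illustration.

```python
import math


def desired_pods(in_flight_requests, target_per_pod=10,
                 min_scale=0, max_scale=None):
    """Toy model: how many pods should exist for the current traffic."""
    if in_flight_requests <= 0:
        wanted = 0  # scale to zero: no traffic, no pods
    else:
        # scale on demand / based on traffic: enough pods so each one
        # handles at most target_per_pod concurrent requests
        wanted = math.ceil(in_flight_requests / target_per_pod)
    wanted = max(wanted, min_scale)      # optional warm floor
    if max_scale is not None:
        wanted = min(wanted, max_scale)  # optional ceiling
    return wanted


for load in (0, 1, 25, 400):
    print(load, "requests ->", desired_pods(load, max_scale=20), "pods")
```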
Started at Google, with early contributions from IBM, Red Hat, Pivotal, and SAP, and now a CNCF project. Used in production by many teams. Google Cloud Run is built around the same Knative Serving API.
How it works for one agent
agent-hr-acme."
Cold start, the only real downside
The first message after an idle period takes longer because the pod has to boot. With Firecracker we're talking ~500 ms total, which is shorter than the pause a user already expects while the agent is "thinking." Subsequent messages carry no cold-start overhead.
For agents the user expects to be instant (a daily-use agent), we can configure Knative to keep one pod always warm. Costs a tiny bit, hides the cold start. The default for any new agent is "scale to zero" because most agents are touched rarely.
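A minimal sketch of how that per-agent choice could be expressed. Knative reads the minimum pod count from the autoscaling.knative.dev/min-scale annotation on a Service's revision template; the helper name and the daily_use flag are hypothetical.

```python
import json


def autoscaling_patch(daily_use: bool) -> dict:
    """Build a merge patch for a Knative Service's revision template."""
    # "0" lets the agent scale to zero (the default for new agents);
    # "1" keeps one pod warm so a daily-use agent never cold-starts.
    min_scale = "1" if daily_use else "0"
    return {
        "spec": {
            "template": {
                "metadata": {
                    "annotations": {
                        "autoscaling.knative.dev/min-scale": min_scale
                    }
                }
            }
        }
    }


print(json.dumps(autoscaling_patch(daily_use=True), indent=2))
```

The printed JSON could be applied with something like kubectl patch --type merge against the agent's Service. Annotation values must be strings, which is why the counts are quoted.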
What about KEDA?
KEDA is the other big "scale based on events" project. Knative scales based on HTTP traffic. KEDA scales based on anything (queue depth, message count, custom metrics). For Houston, HTTP/WebSocket traffic is the signal that matters, so Knative fits more naturally. We can add KEDA later if we need to scale based on, say, "number of inbound Slack messages waiting for an agent."
Why this matters for the pitch
"A customer with 10,000 agents pays only for the ones actively in a conversation" is a real cost story, not marketing. It's what makes per-agent isolation affordable. Without scale to zero, pod-per-agent is expensive theater. With scale to zero, it's a structural cost advantage we can't easily lose.
Kubernetes (the platform) + Kata Containers (the runtime) + Firecracker (the VM) + Knative (the autoscaler) is the full hand. Each one fixes a problem the others can't. Together they give us "pod per agent, isolated by VM, billed only when active." Nothing else gets us all three at once.
Install Knative Serving from the official YAML. Configure it to use Kata's RuntimeClass for our agent services. Each agent becomes a Knative Service resource with a unique URL. With default settings, a service scales to zero after roughly 60 seconds without traffic (the autoscaler's stable window), plus a grace period of up to 30 seconds; the window can be tuned per agent. The control plane points at the agent's Knative URL when routing messages.
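A sketch of what one agent's Service could look like, built as a Python dict and printed as JSON (kubectl apply accepts JSON as well as YAML). The RuntimeClass name kata-fc, the agents namespace, and the image URL are placeholders, not confirmed names from our setup. The service name agent-hr-acme is borrowed from the earlier example. Knative Serving gates runtimeClassName behind a feature flag (kubernetes.podspec-runtimeclassname in the config-features ConfigMap), so that would likely need enabling first; worth checking against the installed version.

```python
import json


def agent_service(name, image, runtime_class="kata-fc",
                  window="60s", namespace="agents"):
    """Build a Knative Service manifest for one agent."""
    return {
        "apiVersion": "serving.knative.dev/v1",
        "kind": "Service",
        "metadata": {"name": name, "namespace": namespace},
        "spec": {
            "template": {
                "metadata": {
                    "annotations": {
                        # Per-agent stable window: how long traffic is
                        # averaged before deciding to scale down.
                        "autoscaling.knative.dev/window": window,
                    }
                },
                "spec": {
                    # Run the pod under the Kata RuntimeClass
                    # (placeholder name, depends on the cluster).
                    "runtimeClassName": runtime_class,
                    "containers": [{"image": image}],
                },
            }
        },
    }


manifest = agent_service(
    "agent-hr-acme",
    "registry.example.com/houston/agent:latest",  # hypothetical image
)
print(json.dumps(manifest, indent=2))
```

Saving the output to a file and running kubectl apply -f on it would create the Service; Knative then assigns the URL the control plane routes to.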