The control plane is the doorman.

Every message in Houston Cloud passes through one small service called the control plane. It checks your ID, decides which agent you're allowed to talk to, wakes that agent up if it's asleep, and forwards your message. That's it. It has no opinions about what the agent does next.

The job, in three lines

  1. Authenticate the user (is this really Juan?).
  2. Authorize the request (is Juan allowed to talk to the HR agent?).
  3. Route the message (forward it to the HR agent's pod and stream the reply back).

Nothing else. The agent does the actual work. The control plane is a doorman, not a chef.
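
The three steps can be sketched as one function that either returns an address to forward to or says why the request stops at the door. This is a minimal sketch, not the real service: `Directory`, `Rejection`, and `admit` are hypothetical names, and the in-memory maps stand in for Supabase, Postgres, and Redis.

```rust
use std::collections::{HashMap, HashSet};

/// Why a request was turned away (hypothetical type, for illustration).
#[derive(Debug, PartialEq)]
enum Rejection {
    Unauthenticated, // step 1 failed: the token is not known
    Forbidden,       // step 2 failed: this user may not talk to this agent
    UnknownAgent,    // step 3 failed: no pod address on record
}

/// In-memory stand-in for the external systems the control plane asks.
struct Directory {
    tokens: HashMap<String, String>,     // token -> user id
    grants: HashSet<(String, String)>,   // (user id, agent id) pairs
    agent_urls: HashMap<String, String>, // agent id -> pod URL
}

/// Runs authenticate, authorize, route in order and stops at the first failure.
fn admit(dir: &Directory, token: &str, agent: &str) -> Result<String, Rejection> {
    // 1. Authenticate: is this really Juan?
    let user = dir.tokens.get(token).ok_or(Rejection::Unauthenticated)?;
    // 2. Authorize: is Juan allowed to talk to this agent?
    if !dir.grants.contains(&(user.clone(), agent.to_string())) {
        return Err(Rejection::Forbidden);
    }
    // 3. Route: where does the message go?
    dir.agent_urls.get(agent).cloned().ok_or(Rejection::UnknownAgent)
}
```

Note that `admit` never looks at the message body. Whatever the agent does with the message is outside this function, which is the doorman-not-chef split in code form.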

Why it has to exist

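Without a control plane, the frontend would have to do the doorman's job itself: verify who the user is, know which agents that user may reach, find the right agent's pod, and wake it if it's asleep.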

That's a lot of trust to put in a React app. So we put a small stateless service between the frontend and the cluster. The frontend talks to one address. The control plane does the thinking.

Stateless? What does that mean?

"Stateless" means the control plane remembers nothing between requests. Every request is fresh. Every answer comes from asking the database. The control plane itself can be killed and restarted whenever, and nobody notices.

This is huge for reliability. If one copy crashes, the others keep running. If we deploy a new version, we roll it out pod by pod with zero downtime. The data lives in Postgres and Redis (Chapter 6), not in the control plane.
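
A small sketch of what "remembers nothing" looks like in code. The `Store` type is a hypothetical in-memory stand-in for Postgres and Redis, and `Replica` is an illustrative name. The point is only that a replica holds a handle to the store and no data of its own.

```rust
use std::collections::HashMap;
use std::sync::{Arc, Mutex};

/// Hypothetical stand-in for the external database: the only place data lives.
type Store = Arc<Mutex<HashMap<String, String>>>;

/// One copy of the control plane. It owns a handle to the store, nothing else.
struct Replica {
    store: Store,
}

impl Replica {
    fn new(store: Store) -> Self {
        Replica { store }
    }

    /// Records which agent a user is talking to, in the store, not in the replica.
    fn set_session(&self, user: &str, agent: &str) {
        self.store
            .lock()
            .unwrap()
            .insert(user.to_string(), agent.to_string());
    }

    /// Answers by asking the store every time.
    fn session(&self, user: &str) -> Option<String> {
        self.store.lock().unwrap().get(user).cloned()
    }
}
```

If one `Replica` writes a session and is then dropped, a freshly created `Replica` pointing at the same store returns the same answer. That is the property that makes crashes and rolling deploys invisible to users.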

What's inside

HTTP routes:

  POST /login
  GET /workspaces
  GET /agents
  POST /chat
  WS /v1/ws

Talks to:

  Supabase (auth)
  Postgres (metadata)
  Redis (sessions)
  Kubernetes API
  Agent pods (via Knative)

Does not talk to:

  Claude / OpenAI
  Composio
  User's tools
  Anything LLM

The control plane routes. It doesn't think. The agent's pod does that.
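
The route list above is small enough to fit in a single match. The sketch below is illustrative only: the `Route` enum and `route` function are hypothetical names, and a real service would register these paths with its web framework's router rather than match strings by hand.

```rust
/// The five entry points listed above (hypothetical enum, for illustration).
#[derive(Debug, PartialEq)]
enum Route {
    Login,
    Workspaces,
    Agents,
    Chat,
    WebSocket,
}

/// Maps a method and path to a route, or None for anything not on the list.
fn route(method: &str, path: &str) -> Option<Route> {
    match (method, path) {
        ("POST", "/login") => Some(Route::Login),
        ("GET", "/workspaces") => Some(Route::Workspaces),
        ("GET", "/agents") => Some(Route::Agents),
        ("POST", "/chat") => Some(Route::Chat),
        // A WebSocket connection begins as an HTTP GET carrying an Upgrade header.
        ("GET", "/v1/ws") => Some(Route::WebSocket),
        _ => None,
    }
}
```

Everything else falls through to `None`, so there is no route through which an LLM call or a tool call could be requested from the control plane.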

The handoff to an agent

When a message arrives:

  1. Check the user's token (Supabase verifies it).
  2. Look up: "is this user allowed to talk to this agent?" (one Postgres query, cached).
  3. Get the agent's Knative URL (one Redis lookup, or fall back to Postgres).
  4. Open a WebSocket to that URL. Forward the message. Stream the response back to the user.

If the agent's pod is asleep, step 4 takes ~500 ms because Knative boots it. If the pod is warm, step 4 is instant. The user sees a tiny lag the first time, then nothing.
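
Step 3, the Redis lookup with a Postgres fallback, can be sketched as follows. The two maps are in-memory stand-ins for Redis and Postgres, and `agent_url` and `Source` are hypothetical names. Writing the URL back into the cache after a miss is an assumption of this sketch; the text above only says the lookup falls back.

```rust
use std::collections::HashMap;

/// Where the URL was found, so callers can see which path ran.
#[derive(Debug, PartialEq)]
enum Source {
    Cache,
    Database,
}

/// Looks up an agent's URL: cache first, database on a miss.
fn agent_url(
    cache: &mut HashMap<String, String>, // stand-in for Redis
    db: &HashMap<String, String>,        // stand-in for Postgres
    agent: &str,
) -> Option<(String, Source)> {
    // Fast path: the cache already knows the address.
    if let Some(url) = cache.get(agent) {
        return Some((url.clone(), Source::Cache));
    }
    // Slow path: ask the database. An unknown agent returns None here.
    let url = db.get(agent)?.clone();
    // Assumption: repopulate the cache so the next lookup takes the fast path.
    cache.insert(agent.to_string(), url.clone());
    Some((url, Source::Database))
}
```

With an empty cache, the first lookup for an agent reports `Source::Database` and the second reports `Source::Cache`. A real cache would also need an expiry or invalidation rule for when an agent's URL changes, which this sketch leaves out.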

Multi-tenancy in one place

Every multi-tenant decision lives in the control plane, starting with the check in step 2 above: is this user allowed to talk to this agent?

What language is it written in?

Honest answer: probably Rust. We already write Rust well, the houston-engine is Rust, and a low-latency router with lots of concurrent WebSockets is exactly what Rust is good at. Could also be Go (designed for this) or Node (familiar). The choice doesn't change the architecture.

Size estimate: 3 to 5 thousand lines of code. Not a huge project. Most of it is HTTP routes, auth glue, and forwarding logic.

How many copies do we run?

Three to five replicas at all times. Load balanced. Add more automatically when CPU climbs. Kubernetes handles the autoscale. Because it's stateless, adding capacity is just running more pods.
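
For a feel of how CPU-based autoscaling picks a replica count, here is a simplified version of the rule documented for the Kubernetes Horizontal Pod Autoscaler: desired = ceil(current × currentMetric / targetMetric), clamped to a minimum and maximum. The function name is hypothetical, and the sketch ignores the tolerance band and stabilization windows that the real autoscaler applies.

```rust
/// Simplified HPA-style calculation (no tolerance band, no stabilization window).
/// `cpu_now` and `cpu_target` are average CPU utilization percentages.
/// Panics if `min > max`, because `clamp` requires an ordered range.
fn desired_replicas(current: u32, cpu_now: f64, cpu_target: f64, min: u32, max: u32) -> u32 {
    // Scale the replica count by how far utilization is from the target.
    let raw = (current as f64 * cpu_now / cpu_target).ceil() as u32;
    // Never go below the always-on floor or above the configured ceiling.
    raw.clamp(min, max)
}
```

As an example with assumed numbers (a 60% CPU target and a ceiling of 10 are not from this chapter): three replicas running at 90% CPU give ceil(3 × 90 / 60) = 5, and three replicas idling at 20% would compute to 1 but are held at the floor of 3.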

The mental model

Frontend says "I want to chat with HR." Control plane says "let me check… ok, you can. Here's HR." Control plane connects the user's WebSocket to HR's pod and then stays out of the way. Bytes flow agent ↔ user, with the control plane only passing them along and watching for billing and abuse. It adds no processing of its own to the conversation.
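
"Passing bytes along while watching for billing" can be sketched as a copy loop that counts what it forwards. This is an assumption-heavy illustration: it uses blocking `std::io` streams instead of async WebSockets, `relay_metered` is a hypothetical name, and counting bytes is only one guess at what billing would measure.

```rust
use std::io::{self, Read, Write};

/// Copies everything from `from` to `to` unchanged and returns the byte count.
fn relay_metered<R: Read, W: Write>(from: &mut R, to: &mut W) -> io::Result<u64> {
    let mut buf = [0u8; 4096];
    let mut total = 0u64;
    loop {
        // A read of zero bytes means the sender closed its side.
        let n = from.read(&mut buf)?;
        if n == 0 {
            break;
        }
        // Forward exactly what was read; the relay never alters the payload.
        to.write_all(&buf[..n])?;
        total += n as u64;
    }
    to.flush()?;
    Ok(total)
}
```

The relay's only output besides the forwarded bytes is a number, which is about as little involvement as a component can have while still sitting in the path.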

Concrete tech stack

Rust (axum or actix), Tokio for async, sqlx for Postgres, fred for Redis, kube-rs for K8s API calls, jsonwebtoken for verifying Supabase JWTs. Built as a Docker image, deployed as a regular K8s Deployment (no Kata, no Knative: the control plane is an always-on service, not one scaled per agent).