Build the walls before someone needs them.
By default, every pod in a Kubernetes cluster can talk to every other pod. That's an open building with no doors. For multi-tenant Houston, we add doors. Lots of them. This chapter is about who is allowed to talk to whom.
The default is dangerous
Out of the box, K8s networking is one big party. Every pod can reach every other pod by IP. The thinking is "you decide your rules; we give you the primitives." Which is fine for one team, one product. Dangerous for "Acme's agents and Globex's agents on the same cluster."
Acme should not be able to reach Globex's agents even by accident. Even with a bug in the control plane that sent a request to the wrong place. The network itself should refuse.
NetworkPolicy, the K8s firewall
Kubernetes has a resource called NetworkPolicy. It's a tiny YAML file that says "pods labeled X can talk to pods labeled Y on these ports. Nothing else allowed."
Our shape:
- Default deny in every per-workspace namespace. Agents can't initiate connections to anything.
- Allow: agent → control plane (so the agent can stream responses back).
- Allow: agent → public internet on port 443 (so the agent can call Claude, OpenAI, Composio).
- Deny everything else. Especially: agent → other tenants' anything.
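The shape above can be sketched with two plain NetworkPolicy objects. This is a minimal sketch, not the actual Houston manifests: the `app: agent` label, the `houston-control-plane` namespace, and port 8443 are all assumptions made for illustration. It also adds a DNS rule the list does not mention, since agents cannot resolve hostnames once egress is default-deny, and it assumes cluster and pod addresses fall inside the private ranges excluded from the port 443 rule.

```yaml
# 1. Default deny: selects every pod in the namespace and allows nothing.
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: default-deny-all
  namespace: ws-acme-corp
spec:
  podSelector: {}
  policyTypes:
    - Ingress
    - Egress
---
# 2. Allowed egress for agent pods. Anything not listed stays denied.
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-agent-egress
  namespace: ws-acme-corp
spec:
  podSelector:
    matchLabels:
      app: agent                      # hypothetical label on agent pods
  policyTypes:
    - Egress
  egress:
    # agent -> control plane
    - to:
        - namespaceSelector:
            matchLabels:
              kubernetes.io/metadata.name: houston-control-plane  # hypothetical namespace
      ports:
        - protocol: TCP
          port: 8443                  # hypothetical control plane port
    # DNS lookups (assumes cluster DNS runs in kube-system)
    - to:
        - namespaceSelector:
            matchLabels:
              kubernetes.io/metadata.name: kube-system
      ports:
        - protocol: UDP
          port: 53
        - protocol: TCP
          port: 53
    # agent -> public internet on 443, excluding private and link-local ranges
    - to:
        - ipBlock:
            cidr: 0.0.0.0/0
            except:
              - 10.0.0.0/8
              - 172.16.0.0/12
              - 192.168.0.0/16
              - 169.254.0.0/16
      ports:
        - protocol: TCP
          port: 443
```

The `except` list matters. Without it, "anything on port 443" would also cover other tenants' pod IPs, which is exactly what the last bullet is meant to rule out.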
Cilium, the upgraded firewall
Raw Kubernetes NetworkPolicy is fine but limited. You can't say "allow this agent to call api.slack.com but not the rest of the internet."
Cilium is a CNI plugin that replaces the cluster's default networking plugin. It uses eBPF (a Linux feature that lets sandboxed programs run safely inside the kernel) to enforce richer rules:
- Per-pod allowlists by hostname (not just IP).
- Per-pod observability: which pod called which API, when.
- Transparent encryption of traffic between pods, once enabled.
- Typically faster than iptables-based plugins because packets are handled in eBPF instead of walking long iptables rule chains.
Cilium is the boring industry standard for serious multi-tenant K8s. GKE has a managed Cilium option (Dataplane V2). We'd turn it on at cluster creation.
The alternative is Calico. Older, also good, simpler. If Cilium feels like overkill in year one, Calico is the fallback.
Per workspace namespaces
The unit of tenant isolation is a Kubernetes namespace.
- One namespace per workspace. Naming pattern: ws-acme-corp, ws-globex, etc.
- All of that workspace's agent pods, persistent volume claims, secrets, and configs live in their namespace.
- NetworkPolicy says: pods in ws-acme-corp cannot reach anything in ws-globex and vice versa. Period.
- RBAC says: only the control plane service account can read across namespaces. Tenant admins can't even look at other namespaces.
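A rough sketch of what the namespace and RBAC pieces could look like. The label key, the `acme-corp-admins` group, and the `control-plane` service account in a `houston-control-plane` namespace are hypothetical names. The built-in read-only `view` ClusterRole is used only to keep the example short; the real permissions would likely be narrower.

```yaml
# One namespace per workspace.
apiVersion: v1
kind: Namespace
metadata:
  name: ws-acme-corp
  labels:
    houston.ai/workspace: acme-corp   # hypothetical label
---
# Tenant admins can read inside their own namespace and nowhere else,
# because a RoleBinding only grants access within its namespace.
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: workspace-admins-view
  namespace: ws-acme-corp
subjects:
  - kind: Group
    name: acme-corp-admins            # hypothetical group
    apiGroup: rbac.authorization.k8s.io
roleRef:
  kind: ClusterRole
  name: view                          # built-in read-only ClusterRole
  apiGroup: rbac.authorization.k8s.io
---
# Only the control plane service account gets a cluster-wide binding.
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: control-plane-view
subjects:
  - kind: ServiceAccount
    name: control-plane               # hypothetical service account
    namespace: houston-control-plane  # hypothetical namespace
roleRef:
  kind: ClusterRole
  name: view
  apiGroup: rbac.authorization.k8s.io
```

The difference between the two bindings is the point: RoleBinding scopes a grant to one namespace, ClusterRoleBinding applies it everywhere.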
[Figure: two namespaces side by side. ws-acme-corp holds the HR, Sales, and Recruiter agent pods plus agent volumes and agent secrets. ws-globex holds the Marketing and Engineering agent pods plus agent volumes and agent secrets. A POLICY boundary separates them.]
Inbound traffic (how users reach us)
Users connect to app.houston.ai over HTTPS. That hits a load balancer (managed by GKE), which routes to the control plane. The control plane is the only door from the outside world into the cluster.
Agent pods are never reachable directly from the internet. They live inside the cluster, behind Knative, behind the control plane. There is no public URL like agent-hr-acme.houston.ai. Even if there were, NetworkPolicy would block external traffic to it.
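The "only door" idea can be sketched as an ingress rule on the agent pods. As before, the labels, the `houston-control-plane` namespace, and port 8080 are assumptions for illustration. If requests are proxied through Knative's own components, the namespace those run in would need a similar allow rule.

```yaml
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-ingress-from-control-plane
  namespace: ws-acme-corp
spec:
  podSelector:
    matchLabels:
      app: agent                        # hypothetical label on agent pods
  policyTypes:
    - Ingress
  ingress:
    - from:
        # namespaceSelector and podSelector in the same list item are ANDed:
        # the source must be a control plane pod in the control plane namespace.
        - namespaceSelector:
            matchLabels:
              kubernetes.io/metadata.name: houston-control-plane  # hypothetical
          podSelector:
            matchLabels:
              app: control-plane        # hypothetical label
      ports:
        - protocol: TCP
          port: 8080                    # hypothetical agent port
```

Because no other source is listed, traffic from the internet, from other workspaces, and from anything else in the cluster is dropped before it reaches an agent.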
Outbound traffic (how agents reach tools)
Agents need to call out: Claude API, OpenAI API, Composio (which calls Slack, Gmail, etc. on the agent's behalf). This is a hole we have to leave open.
Default: allow outbound on port 443 (HTTPS) only, with an allowlist of hostnames. That way an agent can call api.anthropic.com but not, say, exfiltrate to random-attacker.com.
An enterprise customer can tighten this further. "Our agents may only call Anthropic, Slack, and Gmail. Block everything else." That is a Cilium policy change of a few lines of YAML.
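A hostname allowlist could look roughly like the CiliumNetworkPolicy below. It assumes a Cilium installation where this resource and Cilium's DNS proxy are available; managed offerings may expose hostname rules through a different resource. The `app: agent` label is hypothetical, and the two hostnames are the ones named in this chapter; the real list would be longer.

```yaml
apiVersion: cilium.io/v2
kind: CiliumNetworkPolicy
metadata:
  name: agent-egress-allowlist
  namespace: ws-acme-corp
spec:
  endpointSelector:
    matchLabels:
      app: agent                      # hypothetical label on agent pods
  egress:
    # Allow DNS lookups via kube-dns and let Cilium inspect them,
    # so it can map the allowed hostnames to the IPs they resolve to.
    - toEndpoints:
        - matchLabels:
            "k8s:io.kubernetes.pod.namespace": kube-system
            "k8s:k8s-app": kube-dns
      toPorts:
        - ports:
            - port: "53"
              protocol: ANY
          rules:
            dns:
              - matchPattern: "*"
    # HTTPS only, and only to the listed hostnames.
    - toFQDNs:
        - matchName: api.anthropic.com
        - matchName: api.slack.com
      toPorts:
        - ports:
            - port: "443"
              protocol: TCP
```

Tightening the policy for an enterprise customer then means editing the `toFQDNs` list for that workspace's namespace.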
The dedicated node pool option
Most workspaces share nodes. Acme's HR agent pod might run on the same physical machine as Globex's marketing agent pod. The Firecracker wall keeps them apart (Chapter 3).
For an enterprise customer that doesn't like that even with Firecracker, we can offer a dedicated node pool: "Acme's agents only schedule onto these specific machines. No one else's agents will ever land there." Costs more, sells easier to security teams. Trivial to set up with K8s node selectors and taints.
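A sketch of the scheduling side, assuming the dedicated nodes carry a hypothetical label and taint, both `houston.ai/dedicated-to=acme-corp` (the taint with effect `NoSchedule`). A plain Pod is used for brevity; in practice the same fields would sit in whatever pod template launches the agents, and the image name is a placeholder.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: hr-agent
  namespace: ws-acme-corp
spec:
  # Keeps Acme's pods ON the dedicated nodes.
  nodeSelector:
    houston.ai/dedicated-to: acme-corp     # hypothetical node label
  # Lets Acme's pods past the taint that keeps everyone else OFF those nodes.
  tolerations:
    - key: houston.ai/dedicated-to         # hypothetical taint key
      operator: Equal
      value: acme-corp
      effect: NoSchedule
  containers:
    - name: agent
      image: registry.example.com/houston/agent:latest   # placeholder image
```

Both halves are needed. The taint alone keeps other tenants off the pool but lets Acme's pods land elsewhere; the node selector alone pins Acme's pods but lets other pods share the nodes.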
The mental model
- NetworkPolicy = walls between workspaces.
- Cilium = better walls, with hostname-aware rules and visibility.
- Namespaces = labeled rooms inside the building.
- Load balancer = the front door.
- Control plane = the only path from front door to any agent.
- Dedicated node pool = own floor of the building (enterprise tier).
Network + Firecracker + namespaces handle the platform isolation. Who can talk to which agent is enforced at the control plane (auth + RBAC). Network policy stops cross-tenant bugs from becoming cross-tenant breaches. The two layers fail safely in different ways.
GKE with the Cilium dataplane enabled at cluster creation. NetworkPolicies are templated by the control plane when it provisions a new workspace (the namespace, default-deny policy, and allow-control-plane policy are created together). A service mesh (Istio/Linkerd) is not required for v1; Cilium does enough. Add a mesh only if we need fancy traffic shaping or zero-trust mTLS between every pod, which is overkill at our scale.