Build the walls before someone needs them.

By default, every pod in a Kubernetes cluster can talk to every other pod. That's an open building with no doors. For multi-tenant Houston, we add doors. Lots of them. This chapter is about who is allowed to talk to whom.

The default is dangerous

Out of the box, K8s networking is one big party. Every pod can reach every other pod by IP. The thinking is "you decide your rules; we give you the primitives." Which is fine for one team, one product. Dangerous for "Acme's agents and Globex's agents on the same cluster."

Acme should not be able to reach Globex's agents even by accident. Even with a bug in the control plane that sent a request to the wrong place. The network itself should refuse.

NetworkPolicy, the K8s firewall

Kubernetes has a resource called NetworkPolicy. It's a tiny YAML file that says "pods labeled X can talk to pods labeled Y on these ports. Nothing else allowed."

Our shape:
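A minimal sketch of that shape, built from the two policies named later in this chapter (default-deny plus allow-control-plane). The workspace namespace ws-acme-corp comes from the chapter; the control plane namespace name and the port are assumptions made up for illustration, not Houston's real values.

```yaml
# Sketch only. "houston-control-plane" and port 8080 are hypothetical.

# 1) Default deny: select every pod in the workspace and allow nothing.
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: default-deny
  namespace: ws-acme-corp
spec:
  podSelector: {}        # empty selector = all pods in this namespace
  policyTypes:
    - Ingress
    - Egress
---
# 2) Allow inbound traffic from the control plane namespace only.
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-control-plane
  namespace: ws-acme-corp
spec:
  podSelector: {}
  policyTypes:
    - Ingress
  ingress:
    - from:
        - namespaceSelector:
            matchLabels:
              # Label Kubernetes sets automatically on every namespace.
              kubernetes.io/metadata.name: houston-control-plane  # hypothetical name
      ports:
        - protocol: TCP
          port: 8080     # hypothetical agent port
```

NetworkPolicies are additive: once a pod is selected by any policy, only traffic that some policy explicitly allows gets through. With egress denied by default, outbound access (including DNS) has to be opened separately. If requests reach agent pods through Knative's own components rather than straight from the control plane, the `from` selector would need to name those instead.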

Cilium, the upgraded firewall

Raw Kubernetes NetworkPolicy is fine but limited. It matches on pod labels, namespaces, and IP ranges, not hostnames. You can't say "allow this agent to call api.slack.com but not the rest of the internet."

Cilium is a replacement for the default K8s networking plugin. It uses eBPF (a Linux feature that lets programs run inside the kernel safely) to enforce richer rules, such as allowing outbound traffic by hostname rather than by IP.

Cilium is the boring industry standard for serious multi-tenant K8s. GKE has a managed Cilium option (Dataplane V2, which is built on Cilium). We'd turn it on at cluster creation.

The alternative is Calico. Older, also good, simpler. If Cilium feels like overkill in year one, Calico is the fallback.

Per-workspace namespaces

The unit of tenant isolation is a Kubernetes namespace.

ws-acme-corp

  • HR agent pod
  • Sales agent pod
  • Recruiter agent pod
  • Agent volumes
  • Agent secrets

[NetworkPolicy wall between the two namespaces]

ws-globex

  • Marketing agent pod
  • Engineering agent pod
  • Agent volumes
  • Agent secrets
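As a sketch, each workspace in the picture above is just a Namespace object, and everything listed under it (pods, volume claims, secrets) is created inside that namespace. The namespace names come from the chapter; the label key is an assumption for illustration.

```yaml
# Sketch only. The "houston.ai/workspace" label key is hypothetical.
apiVersion: v1
kind: Namespace
metadata:
  name: ws-acme-corp
  labels:
    houston.ai/workspace: acme-corp
---
apiVersion: v1
kind: Namespace
metadata:
  name: ws-globex
  labels:
    houston.ai/workspace: globex
```

A namespace on its own is only a naming boundary. It is the network policy attached to it that makes the wall real.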

Inbound traffic (how users reach us)

Users connect to app.houston.ai over HTTPS. That hits a load balancer (managed by GKE), which routes to the control plane. The control plane is the only door from the outside world into the cluster.

Agent pods are never reachable directly from the internet. They live inside the cluster, behind Knative, behind the control plane. There is no public URL like agent-hr-acme.houston.ai. Even if there were, NetworkPolicy would block external traffic to it.

Outbound traffic (how agents reach tools)

Agents need to call out: Claude API, OpenAI API, Composio (which calls Slack, Gmail, etc. on the agent's behalf). This is a hole we have to leave open.

Default: allow outbound on port 443 (HTTPS) only, with an allowlist of hostnames. That way an agent can call api.anthropic.com but not, say, exfiltrate to random-attacker.com.
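A hedged sketch of that default as a Cilium policy. It assumes the cluster's Cilium installation exposes the CiliumNetworkPolicy resource (worth verifying on a managed dataplane) and that cluster DNS runs as kube-dns in kube-system. The policy name is made up; api.anthropic.com is from the text, and api.openai.com stands in for the OpenAI API mentioned above.

```yaml
# Sketch only. Policy name is hypothetical; extend the hostname list as needed
# (for example with Composio's API host, which is not spelled out here).
apiVersion: cilium.io/v2
kind: CiliumNetworkPolicy
metadata:
  name: agent-egress-allowlist
  namespace: ws-acme-corp
spec:
  endpointSelector: {}   # all pods in this namespace
  egress:
    # Let pods resolve names, and let Cilium observe the DNS answers.
    # Hostname rules rely on this to learn which IPs belong to which name.
    - toEndpoints:
        - matchLabels:
            k8s:io.kubernetes.pod.namespace: kube-system
            k8s:k8s-app: kube-dns
      toPorts:
        - ports:
            - port: "53"
              protocol: ANY
          rules:
            dns:
              - matchPattern: "*"
    # HTTPS to allowlisted hostnames only.
    - toFQDNs:
        - matchName: api.anthropic.com
        - matchName: api.openai.com
      toPorts:
        - ports:
            - port: "443"
              protocol: TCP
```

Anything not matched here is dropped, so a request to random-attacker.com never leaves the pod's network namespace. The DNS rule uses a wildcard pattern for simplicity; it could be narrowed to the same hostnames as the allowlist.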

An enterprise customer can tighten this further. "Our agents may only call Anthropic, Slack, and Gmail. Block everything else." Cilium policy, a few lines of YAML.

The dedicated node pool option

Most workspaces share nodes. Acme's HR agent pod might run on the same physical machine as Globex's marketing agent pod. The Firecracker wall keeps them apart (Chapter 3).

For an enterprise customer that doesn't like that even with Firecracker, we can offer a dedicated node pool: "Acme's agents only schedule onto these specific machines. No one else's agents will ever land there." Costs more, sells easier to security teams. Trivial to set up with K8s node selectors and taints.
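Roughly what that looks like on the pod side, as a sketch. It assumes the dedicated nodes carry a label and a matching NoSchedule taint, both using a made-up key (houston.ai/tenant), and it uses a plain Pod with a placeholder image rather than whatever Knative actually generates.

```yaml
# Sketch only. Label/taint key, pod name, and image are hypothetical.
# Assumes the dedicated nodes were created with:
#   label: houston.ai/tenant=acme-corp
#   taint: houston.ai/tenant=acme-corp:NoSchedule
apiVersion: v1
kind: Pod
metadata:
  name: hr-agent
  namespace: ws-acme-corp
spec:
  # nodeSelector keeps Acme's pods ON the dedicated machines.
  nodeSelector:
    houston.ai/tenant: acme-corp
  # The toleration lets Acme's pods past the taint that keeps everyone else OFF.
  tolerations:
    - key: houston.ai/tenant
      operator: Equal
      value: acme-corp
      effect: NoSchedule
  containers:
    - name: agent
      image: registry.example.com/houston/agent:latest  # placeholder image
```

Both halves matter. The taint alone keeps other tenants off the nodes but does not stop Acme's pods from landing elsewhere; the node selector alone pins Acme's pods but does not keep anyone else out.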

The mental model

Where the agents' permission story ends

Network + Firecracker + namespaces handle the platform isolation. Who can talk to which agent is enforced at the control plane (auth + RBAC). Network policy stops cross-tenant bugs from becoming cross-tenant breaches: if the control plane sends a request to the wrong place, the network still refuses it. The two layers fail independently, so a mistake in one does not open the other.

Concrete setup

  • GKE with the Cilium dataplane enabled at cluster creation.
  • NetworkPolicies templated by the control plane when provisioning a new workspace (the namespace, default-deny policy, and allow-control-plane policy are created together).
  • Service mesh (Istio/Linkerd) not required for v1; Cilium does enough. Add a mesh only if we need fancy traffic shaping or zero-trust mTLS between every pod, which is overkill at our scale.