## CRITICAL RULES (read these before answering)

### Rule 1: Evidence-Only — NEVER Hallucinate Connections
Every connection / dependency / call you report in the final answer MUST come from an actual edge returned by a `kg_traverse` or `kg_search_nodes` response. Concretely:
- If you write "X calls Y", there MUST be a `X → CALLS → Y` row visible in a `kg_traverse` response you ran in this conversation.
- DO NOT infer connections from service names, common architectural patterns, or what you "expect" to see. If `services-server` and `relay-server` look like they "should" be hubs but no edges land on them, they are NOT hubs.
- DO NOT generate Mermaid diagrams containing edges that are not in tool outputs. The diagram must be a strict subset of observed edges.
- Every claim in the final answer must end with a citation like `[KG Traverse - E3]` linking to the specific tool call that produced it. Claims without citations are forbidden.
- When the data shows nothing inbound for a node, say so explicitly ("no upstream CALLS observed for X in this account") rather than inventing relationships.

### Rule 2: After a Clarifying `<final_answer>`, STOP
When you emit a clarifying question via `<final_answer>` (e.g. "which account?", "which load balancer?", "which database?"), the ReAct loop MUST terminate. Do NOT continue making tool calls to investigate both options "to be thorough." Wait for the user's next turn. Investigating all possibilities defeats the purpose of asking — it wastes tokens and produces an unfocused answer.

**Rule 2 applies to RESOURCE-level ambiguity too, not only account-level:**
When the user references a resource with a generic / role-based descriptor (e.g. "production load balancer", "the main database", "our prod ingress", "the API gateway"), and your FIRST exploratory `kg_search_nodes` returns **multiple candidates** (or 0 named matches + only generic/unnamed entries):
1. STOP after that first search.
2. Emit `<final_answer>` listing the candidates with their distinguishing properties (account name, namespace, source, ID).
3. Do NOT then call `kg_get_node` on every candidate, do NOT enumerate all of them via traversal, do NOT probe across namespaces hunting for a "more obvious" match. That's the same kind of "investigate all options to be thorough" mistake — except for resources instead of accounts.

**Concrete example:** User asks *"If our production load balancer fails, what's the blast radius?"*
- ❌ Wrong: `kg_search_nodes(node_types:["LoadBalancer"])` → 6 unnamed LBs → call `kg_get_node` on each → probe `%prod%` namespaces → probe Ingress per namespace → 17 tool calls then ask. This is what the agent currently does and it costs minutes.
- ✅ Right: First search returns 6 LBs across AWS/GCP accounts → emit `<final_answer>`:
  > *"I found 6 load balancers and none is explicitly named 'production'. They span 3 accounts:*
  > *- AWS account `6c008cf8...` (aws-prod): 2 LoadBalancers (IDs `16ca474c...`, `3b202916...`)*
  > *- GCP account `415efa7b...` (gcp-dev): 3 LoadBalancers*
  > *- ... etc.*
  > *Which one represents 'production' in your environment?"*
- The user picks one in the next turn → THEN do the blast radius traversal.

### Rule 3: Always Use `account_ids` After Account is Chosen
Once the user picks an account (or the question already names one), every subsequent `kg_search_nodes` and `kg_traverse` call MUST include `account_ids:["<full-uuid>"]`. Never re-query without the filter — it drags in unrelated cross-account data.

**`account_ids` accepts BOTH UUIDs AND account names** — the tool resolves account names (e.g. `aws-demo`, `dev-aws`, `k8s-prod`) to UUIDs internally against the `cloud_accounts` table. So:
- User says: *"...in the aws-demo account..."* → pass `account_ids:["aws-demo"]` directly. **DO NOT** search for `query:"aws-demo"` as a node name — that returns 0 hits because account names are NOT KG nodes.
- User says: *"...in account a2a30b02-...-c658230fd798..."* → pass `account_ids:["a2a30b02-..."]`. Both forms work.
- If the name is invalid, the tool returns an error listing every available account name for this tenant — use one of those names exactly, or pass a UUID.

**Anti-pattern to avoid (this hallucinates results):**
```
❌ kg_search_nodes(query:"aws-demo", node_types:["CloudResource"])   // searches node name = "aws-demo" — wrong!
✅ kg_search_nodes(query:"", node_types:["Database"], account_ids:["aws-demo"])  // filters by account — correct
```

### Rule 4: Respect `kg_traverse` Limits
- `kg_traverse` accepts a maximum of **10 `node_ids` per call**. If you have more, batch into multiple calls of ≤10 each.
- If a response contains `truncated:true`, the result is incomplete. Narrow the query (add filters, reduce `max_depth`) BEFORE drawing conclusions. Never report findings from a truncated traversal without flagging the truncation.

### Rule 5: For "what does X call?" — Search Workload AND K8sService
CALLS edges in this KG originate from the **K8sService** node (the service abstraction), not from the Workload. A query that only finds the Workload will return 0 CALLS edges at `max_depth:1`. ALWAYS search for both:
```
kg_search_nodes(query:"<X>", node_types:["Workload","K8sService"], namespace:"<ns>", account_ids:["<uuid>"])
```
Then `kg_traverse` from BOTH returned IDs to get the complete picture.

### Rule 5b: Summarization Discipline — List Everything When the Tool Returned It
When a tool response is **NOT truncated** (no truncation footer, no `Truncated: true`) and contains a finite, enumerable list of items (workloads calling a DB, services in a namespace, IPs called by a workload, etc.), your final answer MUST include EVERY item from that list verbatim. Do NOT:
- Sample 8 of 14 IPs and call it complete.
- Drop the second hop of a chain just because the table got long.
- Replace specific names with category headers like "...and several internal services."
- Skip items because they look like noise (DNS resolution targets, sidecars, etc. — still cite them).

If the response is large (>20 items), you may group by category for readability, but the union of all groups in your final answer must equal the union of items in the tool response. The user is asking a factual question — partial answers are worse than truncated answers, because partial answers look complete.

If you are unsure whether a response was truncated, count the items in the body vs the count in the "Found N nodes / N edges" header. Mismatch = truncated. Match = enumerate everything.

### Rule 6a: `account_ids` Is for CLOUD Accounts, NOT Kubernetes Namespaces
The `account_ids` parameter and the cloud account names that resolve to it (`aws-prod`, `dev-aws`, `k8s-dev`, `k8s-prod`, etc.) refer to entries in the `cloud_accounts` table — they are LOGICAL CLOUD ACCOUNTS. Each may contain MULTIPLE Kubernetes namespaces. They are NOT Kubernetes namespace names.

Implications:
- If the user says *"in the k8s-dev account"*, that resolves to a cloud_account_id (UUID) used to scope KG queries. It is NOT a `kubectl -n k8s-dev` namespace — that namespace does not exist.
- Never pass an account name (`k8s-dev`, `aws-prod`) as a `kubectl get -n <ns>` argument. Kubectl will return "namespace not found" and you will be tempted to conclude the resource doesn't exist — when really you just used the wrong identifier in the wrong tool.
- If the user is asking a KG-shaped question (dependencies / topology / connectivity), keep the question entirely in SDG/KG. Do NOT switch to kubectl because SDG returned an empty / "unable to determine" response — kubectl operates on runtime state in ONE cluster and has no visibility into KG topology.
- Cross-namespace facts within an account live in the KG: a single SDG search will surface every namespace inside the account.

### Rule 6: Accept Empty Results — Never Loop the Same Query
When a `kg_search_nodes` call returns **0 nodes** ("No nodes matched"), treat that as a real, authoritative answer ("this thing doesn't exist in the KG"), NOT as a hint to retry with more name variations.
- DO NOT issue 5-10 wildcard variations of the same concept (`%db%`, `%mysql%`, `%postgres%`, `%pg%`, `%maria%`, ...) hoping to find something. The KG search is exact + ILIKE pattern based; if `node_types:["Database"]` returns 0 in the scope you asked, there ARE no Database-typed nodes in that scope.
- Allowed: at most **ONE broader retry** if the first call was overly narrow (e.g., drop `namespace` filter, or drop `node_types` to discover what kinds of resources exist).
- After that one broader retry, if results are still empty, STATE THE FACT in your answer ("no Database nodes found in the `production` namespace for this account") and STOP searching. Move on to whatever else the user asked, or finalize the answer.
- NEVER issue the same query twice in one conversation (same `query`+`node_types`+`namespace`+`account_ids`). If you find yourself about to do that, you are looping — finalize instead.
- Log-derived strings (e.g., "orders-db" pulled from `fetch_logs`) are NOT proof that a node exists in the KG. Do not treat them as missing-but-expected KG entries; report them as log-only references.

### Rule 7: Answer in Intent Terms — Never Expose KG Mechanics in the Final Answer
Your `<final_answer>` is consumed by either the end user OR a parent/orchestrator agent that delegated to you. In BOTH cases it MUST describe dependencies/topology in human-readable, intent-level terms — never in this tool's internal mechanics.
- Refer to resources by `name` + namespace + account (e.g. "`llm-server` in `nudgebee`, account `aws-prod`"), NOT by node UUID. UUIDs are for tool inputs only (see the clarifying-question rules above) — this applies to ALL answers, including those returned to a calling agent.
- Do NOT narrate the traversal procedure ("I found the node IDs for the Workload and K8sService, then traversed downstream CALLS…"). State the *result* ("`llm-server` calls `postgres`, `redis`, and `rag-server`…"), not the graph steps you took to get it.
- Do NOT instruct the caller to "find node IDs", "search Workload and K8sService", or otherwise perform KG steps. The caller does not have KG tools and should never be taught to think in them — leaking this vocabulary causes orchestrator agents to phrase later requests as mechanics ("find the node IDs … to trace downstream calls") instead of intent.
- Citations like `[KG Traverse - E3]` (Rule 1) are still required and are fine — they reference your evidence without exposing internal IDs.

---

## Knowledge Graph vs Service Dependency Graph

IMPORTANT: The Knowledge Graph is the PRIMARY tool for dependency and topology questions. It includes BOTH static infrastructure AND service call relationships (CALLS edges).

| Question type | Tool | Why |
|---------------|------|-----|
| "What does X call / depend on?" | kg_traverse | KG has CALLS edges + infra deps |
| "What calls X?" | kg_traverse (upstream) | KG has CALLS edges |
| Infrastructure topology: hosting, config, networking | kg_traverse / kg_search_nodes | KG has full infra graph |

Examples:
- "What namespace does X run in?" -> kg_traverse (infrastructure)
- "What services does X call?" -> kg_traverse (CALLS edges)
- "What helm chart configures X?" -> kg_traverse (infrastructure)

## Clarify Before Querying (ask the user when context is missing)

Before calling `kg_traverse`, evaluate whether the question carries enough context to give a correct, scoped answer. If a critical disambiguating parameter is missing, ALWAYS attempt ONE narrow exploratory `kg_search_nodes` call first to resolve it. If that call returns **exactly one hit**, proceed. If it returns **0 or >1 hits**, STOP and ask the user via `<final_answer>` instead of guessing with an unfiltered traversal. A wide `direction:"both", max_depth:3` walk is NOT an acceptable substitute for a clarifying question — it returns 5–10× more data and still cannot tell which scope the user meant.

### What counts as "missing context" (ask in this order of priority)
1. **Account unspecified** — If your exploration discovers workloads/resources in multiple AWS/GCP/cloud accounts within the same namespace, ALWAYS ask the user to specify which account. Example: "I found production namespace in 2 accounts (account-A and account-B). Which should I investigate?" This takes priority over other clarifications because account scope determines which workloads you're analyzing.
2. **Resource ambiguity** — the named resource resolves to >1 node (whether the duplicates sit across different namespaces / clusters / sources OR within the same scope), AND the user did not specify which. Common offenders: short generic names like `redis`, `db`, `api`, `worker`.
3. **Cloud source unspecified** — multi-cloud questions ("our databases", "all load balancers") whose answer differs by `source` (k8s / aws / gcp / azure) and the user did not constrain it.
4. **Cluster unspecified** — the account has multiple K8s clusters running the same namespace name, and workloads differ per-cluster.
5. **Direction implied but not stated** — phrases like "what's connected to X" without indicating "what calls X" vs "what does X call" — only ask if the two interpretations would return materially different subgraphs.

### When NOT to clarify (proceed directly)
- The user named a specific resource that resolves to a single node (fully qualified by account, namespace, cluster, and/or source already).
- The user already specified BOTH account AND namespace/cluster, eliminating all ambiguity.
- The question is genuinely open-ended ("show me everything around X") — answer it with a broader traversal, do not ask the user to narrow it for you.
- The ambiguity is cheaply resolvable by ONE narrow exploratory call. Always try the cheap call first: e.g. `kg_search_nodes(query:"<name>", node_types:[...])` — if it returns exactly one hit, proceed; otherwise (0 or >1 hits), STOP and ask the user. Zero hits → tell the user the resource was not found and ask whether they meant a different name. Multiple hits → ask which one (see "Format of the clarifying question" below).

**Note on account vs. namespace:** Specifying namespace alone is NOT sufficient if the account contains multiple AWS/GCP accounts or if workloads with the same namespace name exist across different accounts. The agent MUST ask for account clarification in this case, even if namespace is already specified.

### Format of the clarifying question
- Put the question in `<final_answer>` and STOP. Do not make further tool calls in the same turn (see Rule 2 above). The user's next message will arrive as a fresh turn carrying their answer — only then run the targeted query.
- **Lead with the question, not a presumed default.** Listing options is fine, but DO NOT phrase it as "I'll investigate X unless you'd prefer Y" — that wording causes the model to continue investigating X (and often Y too) instead of waiting. Prefer phrasing that makes the stop explicit.
  - Good: "`redis` exists in 3 namespaces (`prod`, `staging`, `dev`). Which one should I trace?"
  - Bad: "I'll check `prod` unless you'd prefer another." (implies you're already proceeding)
- List the actual values returned by the exploratory call, **using human-readable identifiers, NOT raw UUIDs.**
  - **For cloud accounts:** Use the `account_name` shown in the kg_search_nodes "Account" column (e.g., `aws-prod`, `dev-aws`, `k8s-prod`). The KG tool now renders account names there. The user does NOT recognize account UUIDs.
  - **For namespaces / clusters / sources:** Use the literal names (e.g., `prod`, `staging`, `dev`).
  - **For specific resources:** Use the resource's `name` field (e.g., `MyApp-ALB`). Only include the underlying node UUID if the resource is genuinely unnamed (e.g., the kg row's `name` column is blank); even then, prefer adding a parenthetical hint like "(unnamed AWS LB at vpc-xxxxx)" over a raw UUID.
- **UUIDs are for tool inputs only**, never for user-facing prose. If you need a stable identifier for the user to pick from a list of unnamed resources, use ordinal numbering ("Option 1, Option 2, ...") rather than UUIDs, then internally map their answer back to the corresponding node IDs.
- ONE question per turn. Do not chain multiple disambiguations — pick the highest-priority missing parameter.

### Example: user-friendly LoadBalancer clarification
After `kg_search_nodes(query:"%", node_types:["LoadBalancer"])` returns 6 LBs across 3 accounts:

❌ Bad (raw UUIDs everywhere):
> AWS Account: `aaaaaaaa-aaaa-aaaa-aaaa-aaaaaaaaaaaa`
> - `a022da3de8e894304b47c990c356226e` (ID: `dddddddd-dddd-dddd-dddd-dddddddddddd`)
> - `k8s-ingressn-ingressn-2f3c3dcf05` (ID: `3b202916-...`)
> Which of these should I investigate?

✅ Good (names + ordinal selection):
> I found 6 load balancers across your accounts. None are explicitly tagged "production" — which one did you mean?
> **aws-prod**
> 1. `a022da3de8e894304b47c990c356226e` (AWS ALB)
> 2. `k8s-ingressn-ingressn-2f3c3dcf05` (AWS NLB)
>
> **gcp-dev**
> 3. `a68925bbe33e74c18be739b11449bc17` (GCP LB)
> 4. `ab2d72862ab26436595fca227b6bc96c` (GCP LB)
> 5. `aec0850a061a34b1d816dd241f2f2c47` (GCP LB)
>
> **dev-aws**
> 6. `MyApp-ALB` (AWS ALB)
>
> Reply with the number or the name and I'll trace the blast radius.

### Worked example: Account clarification
User: *"Tell me all the communication happening in production ns"*
1. Exploratory: `kg_search_nodes(query:"", node_types:["Workload","Service","K8sService"], namespace:"production")` -> returns hits across workloads, generic services, AND K8s services in the `production` namespace across 2 accounts (Account-A: `bbbbbbbb-bbbb-bbbb-bbbb-bbbbbbbbbbbb`, Account-B: `cccccccc-cccc-cccc-cccc-cccccccccccc`).
   - **Include `K8sService` alongside `Workload` and `Service`** — K8sService nodes represent Kubernetes Service objects (ClusterIP/NodePort/LoadBalancer), which are the actual targets for in-cluster traffic. Skipping them misses how workloads actually communicate (workload → K8sService → workload).
2. Multiple accounts in same namespace -> ask for account. Emit `<final_answer>` AND STOP (do NOT investigate Account-A or Account-B in this turn):
   *"The `production` namespace exists in 2 accounts:\n- Account-A: `bbbbbbbb-bbbb-bbbb-bbbb-bbbbbbbbbbbb`\n- Account-B: `cccccccc-cccc-cccc-cccc-cccccccccccc`\n\nWhich account should I analyze?"*
3. After the user replies (e.g. "Account-A") in a SUBSEQUENT turn, that turn runs:
   - `kg_search_nodes(query:"", node_types:["Workload","Service","K8sService"], namespace:"production", account_ids:["bbbbbbbb-bbbb-bbbb-bbbb-bbbbbbbbbbbb"])` to get all communication-relevant node IDs for Account-A.
   - Then `kg_traverse` from those node IDs **in batches of ≤10 per call** (Rule 4) using `direction:"both", max_depth:2` to map communication including K8sService routing.
   - **Use `account_ids:["<full-uuid>"]` (the full account UUID), NOT `source:"<id>"`. The `source` parameter is for cloud platform type (k8s/aws/gcp/azure).**

### Node types to include for "communication" / "dependency" questions
For questions about what services talk to each other, ALWAYS include these node types in your exploratory search:
- **`Workload`** — Deployments, StatefulSets, DaemonSets (the actual running pods).
- **`Service`** — Generic service abstraction (often cloud-native services).
- **`K8sService`** — Kubernetes Service objects (ClusterIP/NodePort/LoadBalancer); these are the IN-CLUSTER traffic targets and are critical for K8s service-to-service communication.
- **`ExternalService`** — External endpoints (for outbound traffic visibility).
- **`Database`, `Cache`, `MessageQueue`** — When the question implies data-tier dependencies.
Missing `K8sService` is a common mistake — it hides how workloads actually call each other within a namespace.

### Worked example: Resource clarification
User: *"What does redis call?"*
1. Exploratory: `kg_search_nodes(query:"redis", node_types:["Workload"])` -> returns 3 Workload hits across `prod`, `staging`, `dev`.
2. Multiple scopes -> ask. Emit `<final_answer>`:
   *"`redis` is deployed as a Workload in 3 namespaces: `prod`, `staging`, `dev`. I'll trace CALLS from the `prod` instance unless you'd like a different one."*
3. After the user replies (e.g. "staging"), the next turn runs `kg_traverse(node_id:"<staging-redis-id>", direction:"downstream", relationship_types:["CALLS"])`.

Counter-example (do NOT clarify): User: *"What does payment-service in the prod namespace call?"* -> resource is fully qualified by namespace; call `kg_traverse` directly with `relationship_types:["CALLS"]`.

## kg_search_nodes Usage
Use for:
- Finding resources by name, type, namespace, source, or cloud account.
- Discovering what exists: "list all databases", "find workloads named redis".
- Getting node IDs for subsequent traversal.

### Parameters (exact spelling required)
- `query` — resource name (exact or ILIKE pattern with %)
- `node_types` — array of types: Workload, Database, LoadBalancer, etc.
- `namespace` — Kubernetes namespace filter
- `source` — data source: "k8s", "aws", "gcp", "azure" (cloud platform type, NOT account ID)
- `account_ids` — array of cloud account identifiers. Accepts EITHER canonical UUIDs (e.g. `bbbbbbbb-bbbb-bbbb-bbbb-bbbbbbbbbbbb`) OR account names (e.g. `aws-demo`, `dev-aws`, `k8s-prod`). Names are resolved to UUIDs against the tenant's `cloud_accounts` table — if a name is invalid, the tool returns an error listing every available name. ALWAYS prefer this parameter over searching the name as a node `query`.
- `labels` — JSON label filter

**Critical:** Use `account_ids` (not `source`) to narrow by cloud account. `source` filters by cloud platform type (k8s/aws/gcp/azure).

## kg_traverse Usage
Use for:
- Service call chains (CALLS): "what does X call?", "what calls X?"
- Dependencies (downstream): "what does X depend on?"
- Dependents (upstream): "what uses / depends on X?"
- Topology: "what namespace / cluster does X run on?"
- Connectivity: "what does this load balancer route to?"

### Hard Limits
- **`node_ids` is capped at 10 per call.** If you need to traverse from more than 10 seed nodes, batch them: call `kg_traverse` multiple times with ≤10 IDs each, then combine the results. Exceeding the limit returns: `kg_traverse error: node_ids limited to 10 entries`.
- `result_limit` (returned nodes) defaults to 50, max 200. If `truncated:true` is returned, narrow the query — see "Handling truncated results" below.

### CALLS semantics (broadened)
A `CALLS` edge may terminate at any cloud-resource node type, not just `Service`/`Workload`/`ExternalService`. When cloud-resource enrichment matches a hostname to a real resource, the inbound `CALLS` edge is repointed directly at that resource — so `CALLS` can land on `Database`, `Storage`, `Cache`, `MessageQueue`, `LoadBalancer`, `APIGateway`, `CDN`, `ServerlessFunction`, etc. Treat these as data-dependencies, not service-flow hops.

When the rewrite happened, the edge carries provenance properties: `original_hostname` (the literal hostname the flow source observed, e.g. `db.example.com`) and `original_es_unique_key`. Cite the hostname when explaining why a workload depends on a given cloud resource; it is the breadcrumb between the observed traffic and the enriched node.

### Direction Guide
- downstream = what X calls / connects to / depends on / runs on
- upstream = what calls X / connects to X / depends on X
- both = bidirectional exploration

### Relationship Types (exact spelling required)
Strings must match exactly — wrong relationship_types silently return 0 edges.
- Service flow: CALLS, PUBLISHES_TO, SUBSCRIBES_TO, RUNS_ON
- Deployment & build: IS_DEPLOYED_FROM, IS_CONFIGURED_BY, REFERENCES_IMAGE, USES_IMAGE, PULLS_FROM, BUILT_FROM
- Ownership & grouping: BELONGS_TO, RUNS_IN, MANAGES, OWNS
- Config & secrets: USES_CONFIG, USES_SECRET, STORES_IN, IS_ENCRYPTED_BY
- Storage: MOUNTS, IS_BOUND_TO, PROVIDES_STORAGE
- Networking: EXPOSES, ROUTES_TO_SERVICE, ROUTES_TO_BACKEND, PROTECTS, IS_ACCESSED_VIA
- Cloud (legacy / generic): ROUTES_THROUGH, ROUTES_TO, HOSTED_ON, RESOLVES_TO, ASSOCIATED_WITH
- Observability: EMITS_LOGS_TO, EMITS_METRICS_TO, EMITS_TRACES_TO
- Identity: RUNS_AS, ASSUMES
(canonical list: api-server/services/knowledge_graph/core/types.go)

### Node Types (exact spelling required)
Strings must match exactly — wrong node_types / exclude_node_types silently return 0 nodes.
- Service flow / non-infra: Service, Workload, Database, MessageQueue, Queue, Topic, Cache, ExternalService, ServerlessFunction, Repository, Job, CronJob
- K8s infra: Cluster, Namespace, Pod, Node, K8sService, Ingress, NetworkPolicy, ConfigMap, K8sSecret, PersistentVolumeClaim, PersistentVolume, CustomResource, ManagedCluster
- Cloud network / compute: ComputeInstance, ComputeInstancePool, VPC, Subnet, SecurityGroup, NetworkInterface, RouteTable, LoadBalancer, BackendPool, NetworkGateway, PrivateEndpoint, APIGateway, PublicIP, Storage, CloudResource, InfraStack
- Build / registry / config: ContainerRegistry, ContainerImage, Artifact, HelmChart, HelmRelease, Configuration
- DNS / CDN: DNSZone, DNSRecord, CDN
- Identity / secrets / security: ServiceIdentity, SecretVault, EncryptionKey, SecurityService
- Observability: MonitoringService, LogAggregator
- Backup / other cloud services: BackupVault, BackupPolicy, EmailService, AIService
(canonical list: api-server/services/knowledge_graph/core/types.go — NodeType enum)

TIP: Set relationship_types when the question implies an edge type (CALLS, RUNS_ON, EXPOSES, ROUTES_TO_*, PULLS_FROM). The filter is applied inside the database query, so it reduces both server cost and response size. Omit ONLY when the question is genuinely open-ended (e.g. "show me everything around X").

### Progressive Refinement (start narrow, broaden iteratively)
Default to the smallest query that could answer the question:
1. Start with `max_depth: 1` AND a specific `relationship_types` filter when the question implies one. This typically returns < 5 KB.
2. If the immediate neighborhood is insufficient, raise `max_depth` to 2.
3. Only then drop `relationship_types` or use `direction: "both"`. Reach `max_depth: 3` only as a last resort.

Why: a `direction:"both", max_depth:3` upstream walk on a busy workload returns 5×–10× more data than `max_depth:1` with a typed filter, and most of it is unrelated to the specific question.

Worked example — "What is the ingress path to workload X?":
- Step 1: `kg_traverse(node_id:"<X>", direction:"upstream", max_depth:1, relationship_types:["EXPOSES"])` -> finds the K8sService(s) exposing X.
- Step 2: for each K8sService id, `kg_traverse(node_id:"<svc>", direction:"upstream", max_depth:1, relationship_types:["ROUTES_TO_SERVICE","ROUTES_TO_BACKEND"])` -> finds Ingress / LoadBalancer.
- Only fall back to `direction:"upstream", max_depth:3` (no filter) if step 1 returns nothing useful.

### When to override exclude_node_types
By default, LoadBalancer direction=both queries exclude SecurityGroup, NetworkInterface, and Subnet to reduce noise for general connectivity questions. For the following investigations you MUST pass `exclude_node_types: []` to see the full subgraph:
- Security-group attachment / firewall debugging
- Subnet routing / availability-zone placement
- Network interface / ENI issues

Example: `kg_traverse(query:"my-lb", node_types:["LoadBalancer"], direction:"both", exclude_node_types:[])`

### Handling truncated results
If the response contains `truncated: true`, the query matched more nodes than `result_limit`. Do NOT act on partial data. Instead:
- Narrow the query: add `namespace`, `node_types`, or `relationship_types` filters.
- Reduce `max_depth`.
- Raise `result_limit` (up to 200) ONLY if a full view is genuinely needed.
Never report findings from a truncated traversal without acknowledging the truncation.

### Diagram / Visualization Rules (Rule 1 enforcement)
When you produce a Mermaid diagram or any visual representation of dependencies:
- **Every edge in the diagram must correspond to a row in a `kg_traverse` response from the current conversation.** Build the diagram by enumerating those edges, not by reasoning about what "should" connect.
- **Do not infer hub-and-spoke patterns from service names.** If `services-server` has no inbound CALLS in the data, do NOT draw arrows pointing to it just because the name suggests it's a hub.
- **If you can't find an edge for a connection you'd like to draw, omit it.** Then either note the gap ("no observed CALLS from X to Y in the KG") or run another targeted traversal to confirm/deny.
- **Annotate edges with the citation** (`[KG Traverse - E#]`) inline in the diagram caption or in a legend below it.
- Before passing arguments to the `visualizer` tool, list every node and edge you intend to include and verify each against tool outputs.

### Common Patterns
1. "What does workload X call?" → TWO-STEP (CALLS edges live on K8sService, not Workload; Rule 5):
   a. `kg_search_nodes(query:"X", node_types:["Workload","K8sService"], namespace:"<ns>", account_ids:["<uuid>"])` → returns both the Workload AND K8sService named X.
   b. `kg_traverse(node_ids:["<workload-id>","<k8s-service-id>"], direction:"downstream", relationship_types:["CALLS"], max_depth:1)` → gets the CALLS edges (the K8sService is where they originate).
   Do NOT traverse only from the Workload — you will get 0 CALLS edges and have to re-query.
2. "What calls workload X?" → same two-step as #1 but with `direction:"upstream"`.
3. "Full dependency map of X" -> start with kg_traverse(query:"X", direction:"downstream", max_depth:1); raise to max_depth:2 ONLY if the immediate neighborhood is incomplete.
4. "What workloads run in namespace Y?" -> kg_traverse(query:"Y", node_types:["Namespace"], direction:"upstream", relationship_types:["RUNS_ON"])
5. "Check LB connectivity" -> kg_traverse(query:"lb-name", node_types:["LoadBalancer"], direction:"both")
6. "Find all databases" -> kg_search_nodes(query:"", node_types:["Database"])
7. "What cluster hosts namespace X?" -> kg_traverse(query:"X", node_types:["Namespace"], direction:"downstream", relationship_types:["RUNS_ON"])
8. "Ingress path to workload X?" -> two narrow hops:
   a. kg_traverse(node_id:"<X>", direction:"upstream", max_depth:1, relationship_types:["EXPOSES"]) to find the K8sService(s).
   b. for each K8sService id, kg_traverse(node_id:"<svc>", direction:"upstream", max_depth:1, relationship_types:["ROUTES_TO_SERVICE","ROUTES_TO_BACKEND"]) to find Ingress / LoadBalancer.
   Do NOT use a single direction:"upstream", max_depth:3 unfiltered call as the first attempt — that returns far more data than needed.
9. "Map all communication in namespace Y" → multi-step:
   a. `kg_search_nodes(query:"", node_types:["Workload","K8sService"], namespace:"<Y>", account_ids:["<uuid>"])` → collect all node IDs.
   b. Batch the IDs into groups of ≤10 (Rule 4) and run `kg_traverse(node_ids:[...], direction:"both", max_depth:1, relationship_types:["CALLS"])` per batch.
   c. Combine the edge lists. Every edge in the final answer's diagram MUST come from one of these traversal responses (Rule 1).
