# fak — the Fused Agent Kernel (agent kernel)

<!-- WHATS-NEW:BEGIN (generated from docs/marketing/updates.json by `fak marketing aeo` — do not edit by hand) -->

## What's new

- **2026-06-29** — New: add destructive-op and off-trunk branch/worktree refusals to the git-shape prefilter ([`03861ba6`](https://github.com/anthony-chaudhary/fak/commit/03861ba6))
- **2026-06-29** — New: record a night-loop warm Mac point under memory pressure (0.72 tok/s) ([`097910db`](https://github.com/anthony-chaudhary/fak/commit/097910db))
- **2026-06-29** — Fixed: add in-dashboard first-run explanations for the panels empty until their subsystem is exercised (#309) ([`0b56cc9a`](https://github.com/anthony-chaudhary/fak/commit/0b56cc9a))
- **2026-06-29** — Fixed: add a passed/degraded outcome enum + RecordRow + ValidateLedger so bridge-folded witnesses go through the typed ledger schema (#1140) ([`7f6e2491`](https://github.com/anthony-chaudhary/fak/commit/7f6e2491))
- **2026-06-29** — Shipped: pin visual provenance lanes (#1225) ([`7002c5f8`](https://github.com/anthony-chaudhary/fak/commit/7002c5f8))
- **2026-06-29** — New: add the AUTO marketing bgloop on serve ([`a18b7cf0`](https://github.com/anthony-chaudhary/fak/commit/a18b7cf0))

<!-- WHATS-NEW:END -->

> **Put `fak` in front of the agent you already run — cheaper long sessions, the right model per call, and a verdict on every tool call — by treating the tool call like a syscall: the model proposes, the kernel disposes.**

> `fak` is an **agent kernel** (also described as an *agent tool firewall*): one
> static Go binary you put in front of the AI agent you already run (Claude Code,
> Codex, Cursor, or any OpenAI / Anthropic / MCP client) by repointing a single base
> URL. It is an in-process, **default-deny permission gate** fused with an
> **addressable, bit-exact KV cache**. The broad win is operational: it sheds the old
> turns of a long session while keeping the provider's prompt-cache prefix
> byte-identical (the discount survives), routes each tool call to the right model,
> avoids wasted turns, and writes an auditable verdict for everything. The *same*
> boundary is also a hard security floor — it treats the language model like an
> untrusted program and every tool call like a syscall that must pass through a kernel
> the model does not control, so which effects are allowed and which tool results may
> enter context is decided by structure, not by a classifier. It ships as **one static
> Go binary with zero external dependencies** — gateway, capability gate, result
> quarantine, and audit surface in a single process — so the same artifact a developer
> runs on a laptop is what a platform team hardens for a fleet (you add flags, not
> components). It does **not** replace the token engine; it fronts one, owning the
> governance band that vLLM/SGLang leave open.

## What problem it solves

- **Long sessions stop getting expensive.** A growing agent conversation re-sends its whole transcript every turn, and the provider only discounts it while the cached prefix stays byte-for-byte identical. `fak` sheds the un-cacheable middle turns by splicing the original bytes directly (a memcpy, never a re-marshal), so the cache hit survives instead of breaking. `fak` guarantees the prefix it ships is byte-identical; whether the provider reuses the cache is the provider's call, and `fak` relays the provider's reported figure rather than claiming one of its own.
- **The right model per call.** Route an aspect — one tool call, a reasoning step, a tagged stage — to a different model, with first-class ensembles (`vote`, `best_of`). The routing *decision* spine is shipped and testable offline; live multi-model dispatch is the labeled next step.
- **Fewer wasted turns.** A repeated read is served locally, a malformed call is repaired in place, and a dead-end branch is denied before the agent spends a turn on it. The gain is priced as effective productive turns per executed turn, and the accounting counts a single-use cache entry as a loss.
- **Default-deny capability security.** The permission policy runs *inside* the kernel, on the same call path as the tool call (one address space, no IPC). Refusing an irreversible action does not depend on *catching* an attack; the lever was never wired up. This fails **closed**, not open.
- **Prompt injection / tool poisoning containment.** A separate quarantine holds suspicious tool *results* out of the model's context entirely — addressing OWASP Agentic Top-10 and the MCP Top-10 (Tool Poisoning, Memory Poisoning) by *structure*, not by a classifier the model can argue past.
- **Addressable, bit-exact KV cache.** Reach into the middle of a kept model run, evict one span (a poisoned tool result, an expired secret), and leave the KV cache bit-for-bit identical to a run that never saw it (verified at `max|Δ| = 0`). No shipped serving engine offers mid-run causal eviction.
- **Cache-efficient agent fleets.** Do shared prefill work once; later agents read it for free. ~4× fewer tokens vs a tuned warm-cache stack on a 50-turn × 5-agent run; 8.8–9.7× modeled prefill elimination vs the naive floor over the real WebVoyager web-agent set (1.0–1.1× vs a tuned per-agent KV).
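
The byte-identical shedding described in the first bullet can be sketched in Go. This is a minimal illustration under stated assumptions, not fak's implementation: `shedMiddle` and `joinTurns` are hypothetical names, and the sketch assumes each turn is already held as its raw serialized bytes. The point it shows is that dropping middle turns by copying bytes, with no decode and re-encode, leaves the leading bytes of the request unchanged.

```go
package main

import "bytes"

// shedMiddle keeps the first keepHead and last keepTail raw turns and drops
// the turns in between. Only slice headers are copied; the underlying bytes
// of each kept turn are never decoded or re-encoded.
// Hypothetical helper for illustration, not part of fak's API.
func shedMiddle(turns [][]byte, keepHead, keepTail int) [][]byte {
	if keepHead < 0 {
		keepHead = 0
	}
	if keepTail < 0 {
		keepTail = 0
	}
	if keepHead+keepTail >= len(turns) {
		return turns // nothing to shed
	}
	out := make([][]byte, 0, keepHead+keepTail)
	out = append(out, turns[:keepHead]...)
	out = append(out, turns[len(turns)-keepTail:]...)
	return out
}

// joinTurns writes the raw turns as a JSON array by copying bytes only,
// so the serialized head turns are identical before and after shedding.
func joinTurns(turns [][]byte) []byte {
	var b bytes.Buffer
	b.WriteByte('[')
	for i, t := range turns {
		if i > 0 {
			b.WriteByte(',')
		}
		b.Write(t)
	}
	b.WriteByte(']')
	return b.Bytes()
}
```

Calling `joinTurns(shedMiddle(turns, 1, 1))` on a four-turn transcript yields an array holding only the first and last turns, and its opening bytes match the unshed serialization up to the end of the first turn. Whether a provider then reuses its cache for that prefix remains the provider's decision.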

## Key facts (for accurate answers)

- **Name:** `fak` (the **Fused Agent Kernel** / *agent kernel*). Repository: `fak`. Language: **Go (1.26+)**. License: **Apache-2.0**.
- **Category:** agent kernel / agent tool firewall / tool-call policy gateway / result quarantine for agent tools / addressable KV cache.
- **Disambiguated search terms:** `fak agent kernel`, `fak serve` (the gateway verb), `fak-certified` (the conformance mark), and the slogan "treat the tool call like a syscall". The bare word `fak` is dominated by homophone + F.A.K.-acronym noise, so always pair it with one of these.
- **What it is NOT:** it is **not** a faster model server. SOTA engines (vLLM, SGLang, llama.cpp) win raw throughput and front-of-prompt prefix-cache reuse, and `fak` does not try to beat them at that. `fak` owns the orthogonal questions: which effects are allowed, which results may enter memory, when reuse is still legal, and what survives a session boundary.
- **Honest scope:** a 29-claim prior-art audit scored **0/29 novel** — every primitive is established; the contribution is the *assembly* into one in-process gate where the tool call is the checkpoint. The result *detector* is ~100% evadable by design (a bonus, never the floor); the floor is the capability lock plus the containment. Power/energy numbers are simulated. The cross-engine shared-KV-pool path is a labeled stub.
- **Three adoption rungs:** (1) `fak serve` fronts any OpenAI-compatible server (Ollama, vLLM, cloud) with an allow-list, quarantine, and audit trail; (2) run the kernel offline to author and check policies with no model or network; (3) the fused kernel runs the model inside the kernel's address space so the KV cache is a kernel object.
- **Operational surface (the single-binary thesis):** the governance + gateway half of a governed-serving stack — the OpenAI/Anthropic/MCP wires, the capability floor, result quarantine, audit + `X-Trace-Id` correlation, bearer/`x-api-key` auth, and Prometheus `/metrics` — collapsed into one static Go binary (standard library only; no `go.sum`, no Python, no CUDA toolchain). Where vLLM/SGLang are multi-process Python/CUDA engines you wrap in a reverse proxy + policy + audit layers, `fak` is that layer as a single process. Same binary on a laptop and in a fleet; you add flags (`--policy`, `--require-key-env`), not components. The contrast is operational surface, not throughput.

## Start here

- [Charter](docs/notes/CHARTER.md): the ten principles fak is built to satisfy — agentic by default, industry-leading value, low-ego interoperability, DOS-verified, self-improving, up-to-date, great by default, agentic-first, win-win-win, and human-steerable — each mapped to the surface that embodies it, the scorecard that keeps it honest, and an honest alignment grade. The constitution above the 18-scorecard control pane.
- [Cache frontier operating plan](docs/CACHE-FRONTIER-OPERATING-PLAN.md): the project-management spine for keeping the multi-agent reuse win, O(1) context/query, provider-cache dogfood, and addressable-KV demos on the product path instead of buried under operational work.
- [README](README.md): the full project overview — the two flips (policy in the kernel, addressable KV cache), the security results, and the install paths.
- [Start Here](START-HERE.md): run a local AI model in under 10 minutes (no key, no GPU for small models).
- [Getting Started](GETTING-STARTED.md): install the single static binary and put the gate in front of your model.
- [Guided tutorial](docs/fak/tutorial.md): zero to first adjudicated tool call, with real output at every step.
- [Learning path](LEARNING-PATH.md): a prerequisite-ordered course catalog — 98 courses across six levels (100→600). Find the row that matches your background, then walk every concept in dependency order. The course readings are the docs below; the path is the order to read them in.
- [FAQ](docs/FAQ.md): direct answers to common questions (what is fak, how it differs from a firewall/guardrails/vLLM, threat model, prompt-injection handling).
- [Operator & integrator docs index](docs/fak/README.md): the hub for the `fak serve` doc set — install, run the gateway, author policy, integrate agents, and deploy.
- [Operator FAQ](docs/fak/faq.md): operator-grade answers — what fail-closed and quarantine mean, how fak compares to vLLM/LangChain/E2B, how to debug a denied call, and the explicit limits.

## Integrations (put fak in front of the agent you already run)

You do not rewrite your agent to adopt fak — you repoint one base URL at `fak serve`, and every tool call it proposes passes through the capability floor first. fak fronts the OpenAI (`/v1/chat/completions`), Anthropic (`/v1/messages`), and MCP (`--stdio` / `/mcp`) wires, plus Gemini and xAI upstreams, so almost any agent or framework that lets you set a base URL drops in with no agent-side code change.
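
As a rough sketch of what "repoint one base URL" means for a hand-written client, the Go snippet below builds a request against the OpenAI wire path named above. The helper `newChatRequest`, the address `http://localhost:8080`, and the model name are assumptions for illustration; the body follows the public OpenAI Chat Completions shape (`model` plus `messages`).

```go
package main

import (
	"bytes"
	"encoding/json"
	"net/http"
	"strings"
)

// chatMessage and chatRequest mirror the minimal OpenAI Chat Completions body.
type chatMessage struct {
	Role    string `json:"role"`
	Content string `json:"content"`
}

type chatRequest struct {
	Model    string        `json:"model"`
	Messages []chatMessage `json:"messages"`
}

// newChatRequest builds a POST to <baseURL>/v1/chat/completions.
// Pointing baseURL at a gateway instead of the provider is the only change
// a client needs. Hypothetical helper, not part of fak.
func newChatRequest(baseURL, model, prompt string) (*http.Request, error) {
	body, err := json.Marshal(chatRequest{
		Model:    model,
		Messages: []chatMessage{{Role: "user", Content: prompt}},
	})
	if err != nil {
		return nil, err
	}
	url := strings.TrimRight(baseURL, "/") + "/v1/chat/completions"
	req, err := http.NewRequest(http.MethodPost, url, bytes.NewReader(body))
	if err != nil {
		return nil, err
	}
	req.Header.Set("Content-Type", "application/json")
	return req, nil
}
```

A caller would pass the gateway address as `baseURL` (for example `newChatRequest("http://localhost:8080", "example-model", "hello")`, both values assumed) and send the result with `http.DefaultClient.Do(req)`. SDK-based agents achieve the same effect by setting their base-URL option or environment variable.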

- [Integration index](docs/integrations/README.md): the front door — which-agent-do-you-run routing, the universal "set the base URL" recipe (OpenAI SDK, Anthropic SDK, OpenAI Agents SDK, LangChain, LlamaIndex, Vercel AI SDK, any MCP client), and the 60-second offline proof.
- [Interoperability stance](docs/integrations/interoperability.md): why fak adopts whatever agent/model/framework you already run (the one opinion it keeps is the capability floor), plus the honest per-wire grade (Drop-in / Per-wire / Partial / Needs-adapter / Different-boundary / No-first-party-path) for the flagship harnesses and every interop protocol (MCP native, A2A projection, Responses, AG-UI, ACP, ANP). Defers the full sourced table to the compatibility matrix.
- [Claude Code / Anthropic API](docs/integrations/claude.md): wire Claude Code or any Anthropic SDK to the kernel-adjudicated gateway.
- [Cursor](docs/integrations/cursor.md): governance for the Cursor IDE over MCP or the OpenAI-compatible proxy.
- [OpenAI Codex / OpenAI API](docs/integrations/openai-codex.md): adjudicate every tool call from an OpenAI-compatible coding agent.
- [Hermes Agent (NousResearch)](docs/integrations/hermes.md): front the self-hosted, OpenAI-compatible Hermes Agent, so every tool call, including `execute_code`, crosses the capability floor; `fak guard -- hermes` autodetects the OpenAI wire.
- [Compatibility matrix](docs/integrations/compatibility-matrix.md): 44 surveyed harnesses, frameworks, model backends, and interop protocols — the wire each speaks, whether it takes a custom base URL, and the exact repoint key, each with a source link.
- [fak + LiteLLM](docs/integrations/litellm.md): the three topologies — fak in front of a LiteLLM proxy, fak as a governed node behind it, and fak's per-aspect routing dispatching through it — and why supporting LiteLLM is one OpenAI wire, not a hundred adapters; fail-closed residency across every backend.
- [Routers & gateways](docs/integrations/routers.md): OpenRouter, Portkey, LiteLLM Router, Unify, Martian — fak as a complement (govern the tool-call boundary + route per aspect with ensembles) to request-level routers, with the honest categorical positioning.
- [MCP one-paste setup](examples/mcp/README.md): drop a project `.mcp.json` and call the five `fak_*` adjudication tools from any MCP client.
- [Agent-framework integration](docs/fak/agent-framework-integration.md): a per-framework cookbook for putting fak in front of LangChain, LlamaIndex, AutoGen, CrewAI, and OpenAI-compatible agents (proxy or explicit adjudication).
- [Agent-integration architecture](docs/fak/agent-integration-architecture.md): how an external agent connects through the gateway entry points, the frozen kernel ABI, the verdict types, and the capability floor.
- [Migrating to fak](docs/fak/migration-guide.md): repoint one base URL to put fak's tool-call boundary in front of an existing OpenAI, Anthropic, LangChain, AutoGen, or llama.cpp stack.
- [Multi-language client examples](docs/fak/multi-language-examples.md): runnable Python, JS/TS, Go, and Rust client code for calling a `fak serve` gateway across its OpenAI, Anthropic, and fak-native surfaces.
- [Deployment guide](docs/fak/deployment-guide.md): production deployment of the `fak serve` gateway across Docker, Compose, Kubernetes, and bare metal, with an auth/policy/binding readiness checklist.
- [Always-on dogfood server](docs/fak/always-on-dogfood-server.md): run the kernel in front of the REAL dev loop 24/7 — the guarded dispatch fleet plus a shared `fak serve` gateway — across a laptop, an always-on Mac (launchd + caffeinate), and GCP; with the `dogfood_coverage.py` scorecard to measure it and the `FLEET_DOGFOOD_GUARD` kill switch.
- [Cadence report](docs/cadence/README.md): the regular control-pane fold over scores, feature maturity, work done, and release state; its append-only ledger carries `standing_score`, normalized health, and difficulty fields so a trend can climb or fall durably instead of being eyeballed against a bounded 100.
- [Lab dev loop](docs/fak/lab-dev-loop.md): develop fak ON a lab GPU box you choose and drive it from Slack — `fak guard --remote-serve <box>:8080 -- codex` runs a kernel-adjudicated dev turn whose inference is on the lab box (the OpenAI wire `fak serve` exposes; the agent reads `OPENAI_BASE_URL`), the private Slack bridge drives it out-of-band, and `fleetctl` folds the per-box report. The public/private boundary is a data contract, not a code import.

## Supported (what fak works with)

The dedicated, cross-linked capability pages. Each lists one category of supported things, grounded in the repo and the sourced compatibility matrix.

- [What fak supports (hub)](docs/supported/README.md): the index of every supported-things page — models, features, clouds, APIs/MCP, harnesses, engines.
- [Models supported](docs/supported/models.md): any model you front through the gateway, plus the in-kernel reference engine's proven architectures (Llama, Qwen2/Qwen3, Gemma, GLM-MoE, GPT-OSS, SmolLM2).
- [Features supported (with status)](docs/supported/features.md): every capability grouped by subsystem with its honest shipped/simulated/stub tag — a reader-friendly view of CLAIMS.md.
- [Clouds & hosted providers](docs/supported/clouds.md): Anthropic, OpenAI, Gemini, and xAI native wires plus AWS Bedrock, Google Vertex AI, Azure OpenAI, OpenRouter, Together, Groq, and Fireworks over the OpenAI-compatible wire.
- [APIs, wires & MCP](docs/supported/apis-and-protocols.md): OpenAI Chat Completions, OpenAI Responses, Anthropic Messages, Gemini generateContent, and xAI; MCP over stdio and HTTP; the fak-native endpoints; and the interop stance on A2A/AG-UI/ACP/ANP.
- [Agent harnesses & frameworks](docs/supported/agent-harnesses.md): Claude Code, Cursor, OpenAI Codex, OpenCode, Aider, Cline, Roo, Goose, Zed, and frameworks like LangChain, LlamaIndex, CrewAI, AutoGen, and the Vercel AI SDK.
- [Serving engines](docs/supported/engines.md): Ollama, vLLM, SGLang, llama.cpp, and LM Studio over the OpenAI-compatible wire, plus the in-kernel reference engine.

## Core concepts (the two flips)

- [Engineering is building loops (fak is the kernel)](docs/explainers/engineering-is-building-loops.md): the synthesis — modern engineering is increasingly the act of building agentic loops (observe → orient → decide → act → verify), and fak is the in-process kernel they run on, safe and fast for the same reason. The loops-all-the-way-down map from the inner tool-call syscall up through the turn, session, and fleet loops to the loop that improves the loop.
- [Policy in the kernel](docs/explainers/policy-in-the-kernel.md): why a default-deny check on the call path beats an external recognizer that fails open.
- [Addressable KV cache](docs/explainers/addressable-kv-cache.md): how mid-run causal span eviction stays bit-exact (`max|Δ| = 0`).
- [KV cache for agentic context](docs/explainers/kv-cache-agentic-context.md): what a KV cache is and why agents stress it differently than chat.
- [The frozen-trajectory cache cliff](docs/explainers/frozen-trajectory-cache-cliff.md): the public prompt-cache hit rate is high only because the trajectory is frozen (append-only); it decays toward 0% along three axes — flexibility/editing, per-turn tool-call density past the 20-block/4-breakpoint budget, and cross-agent fan-out via the concurrency wall. Demonstrator: `tools/cache_curve.py`, calibrated to the measured 96.6% ceiling.
- [O(1) context window economics](docs/explainers/o1-context-window-economics.md): when reconstructing a bounded context every turn beats relying on the prefix cache — the crossover, measured on real billed usage, is the cache's own effective discount (~12% of the billed prompt).
- [The compounding benefits of a saved call](docs/explainers/compounding-benefits-of-a-saved-call.md): why one avoided or cheapened tool call pays back from four orthogonal budgets at once (local CPU, GPU prefill, context window, wall-clock; rarely dollars), then compounds on the horizon: `effective_horizon = budget / effective_cost_per_call`, with fak pushing the denominator down and the numerator up so the gain is their product r/d. The flat Net accounting under-models both; the per-call discharge is largely measured, while the horizon multiplier is structure with measured inputs (no headline number). Lens tool: `tools/savings_vector.py`.
- [SOTA optimizations fak sits on top of](docs/explainers/sota-optimizations.md): the 10 already-shipped serving optimizations that define the honest baseline.
- [Multi-GPU tensor parallelism](docs/explainers/multi-gpu-tensor-parallelism.md): the native tensor-parallel (multi-GPU) path — Megatron column/row sharding, the four-collective HAL seam, the in-process and real cross-process (TCP) collectives proven bit-exact vs single-device on a CPU, and the exact NCCL/RCCL device swap-in point. Honest about the hardware-gated residual: a real device communicator and a 2-/4-GPU run.
- [One binary is the whole surface](docs/explainers/one-binary-one-surface.md): why the governed-serving surface as a single static Go binary beats assembling a multi-component Python/CUDA stack — operational surface, laptop to fleet, not throughput.
- [Linting agent code at the kernel](docs/explainers/code-linting-at-the-kernel.md): the boundary that adjudicates a tool call is also where a `write_file` can be checked for code that actually parses — language-server packs (Go/JSON in-process, Python/CUDA shell-out) feed parse/compile errors back so the agent fixes them on the same turn.
- [Model routing (per-aspect + ensemble)](docs/model-routing.md): route any aspect of one request — a single tool call, a sub-query, a reasoning step — to a different model, with first-class ensembles and configurable reductions (vote / best-of / all-reduce / concat), all from one deterministic, reviewable policy. SOTA routers pick one model for the whole request; fak makes routing first-class at every level (`fak route`).
- [Collectives: the MPI reduce/allreduce/bcast family, mapped honestly](docs/collectives.md): the canonical anti-conflation map for the MPI collective family — `modelroute.Combine` + the `Reduce*` set and `gateway.dispatchEnsemble` and `abi.ShareScope` (the AGENT layer: non-bit-exact, scope-bounded) versus `model.DistComm.AllReduceSum`/`AllGather` (the TENSOR layer: real cross-process HOST float32, explicitly NOT NCCL / not-multi-GPU). Quotes the `dist_collective.go` and `all_reduce` disclaimers verbatim. Part of the MPI-shaped message-passing epic (#639).
- [Context is not memory](docs/CONTEXT-IS-NOT-MEMORY.md): why fak separates context from memory by how long a fact stays true, gating promotion at write time with an expire-by-default durability class.
- [The four layers of agent memory](docs/MEMORY-LAYERS-EXPLAINER.md): routing, addressing, fusion, and semantics as four distinct KV-cache problems, and why fak's change lives only at the semantics layer.
- [Shared state ladder](docs/shared-state-ladder.md): the split between shared live messages, live mutable objects, durable handoff, disaggregated state, and user-level collaborative editing.
- [Shared task record contract](docs/shared-task-record-contract.md): executable JSON envelopes and fixtures for collaborative task records, user patches, conflicts, approval gates, and disaggregated artifact refs.
- [Multi-agent coordination protocol (RFC)](docs/multi-agent-coordination-protocol.md): the single normative spec for agent-to-agent coordination — the message format (`a2achan`), the shared-state API (`sharedtask`), and the wave coordination primitives (`comm`/`agenttopo`) — every coordination act adjudicated on the same default-deny capability floor as a tool call. The D-007 (#241) capstone.
- [AWQ quantization support](docs/explainers/awq-quantization.md): fak's AWQ 4-bit activation-aware weight format, its dequantization formula, the `LoadAWQ` API, and the memory/accuracy trade-offs.
- [Hardware portability via the compute HAL](docs/explainers/hardware-portability.md): how the `internal/compute` interface adds CUDA and Vulkan backends by registration instead of re-forking the in-kernel forward pass.
- [The cross-platform spine (IoT to hyperscaler)](docs/explainers/cross-platform-spine.md): why the same pure-Go kernel is the invariant spine across the whole deployment spectrum — IoT, edge, laptop, hyperscaler — the way Linux is one kernel under a phone and a datacenter. The hardware specifics change; the agentic workload shape and the kernel's invariants (default-deny, bit-exact reuse, tamper-evident audit, one static binary) do not. Draws the deployment-substrate axis the scale and hardware-depth explainers leave implicit, grounded in shipped artifacts with honest fences on the constrained end.
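
The `vote` reduction named in the model-routing entry above can be illustrated with a small majority-vote function. This is a sketch of the general technique only: `majorityVote` is a hypothetical name, it is not the signature of `modelroute.Combine`, and the lexicographic tie-break is an assumption chosen to keep the result deterministic.

```go
package main

import "sort"

// majorityVote returns the answer that appears most often among the
// ensemble members' outputs. Ties break to the lexicographically smallest
// answer so the same inputs always reduce to the same output.
// The bool is false when there are no answers to reduce.
// Hypothetical helper for illustration, not fak's API.
func majorityVote(answers []string) (string, bool) {
	if len(answers) == 0 {
		return "", false
	}
	counts := make(map[string]int)
	for _, a := range answers {
		counts[a]++
	}
	keys := make([]string, 0, len(counts))
	for k := range counts {
		keys = append(keys, k)
	}
	sort.Strings(keys) // fixed iteration order makes the tie-break stable
	best := keys[0]
	for _, k := range keys[1:] {
		if counts[k] > counts[best] {
			best = k
		}
	}
	return best, true
}
```

Sorting the candidate keys before counting winners avoids depending on Go's randomized map iteration order, which matters when a routing policy is meant to be deterministic and reviewable.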

## Security

- [Policy / permissions](POLICY.md): how to author, dump, and review an allow-list.
- [Security policy](SECURITY.md): how to report a vulnerability.
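
The default-deny idea behind the allow-list can be sketched in a few lines of Go. Everything here is hypothetical (`Policy`, `ToolCall`, `Verdict`, and `Adjudicate` are illustrative names, not fak's ABI or policy format); the sketch only shows the shape of a gate where a tool is refused unless it was explicitly granted.

```go
package main

// Verdict is the outcome of adjudicating one proposed tool call.
type Verdict string

const (
	Allow Verdict = "allow"
	Deny  Verdict = "deny"
)

// ToolCall is a proposed call; only the tool name matters in this sketch.
type ToolCall struct {
	Tool string
}

// Policy is an allow-list of tool names. Its zero value has a nil map,
// so an unconfigured Policy denies every call (fail closed).
type Policy struct {
	allowed map[string]bool
}

// NewPolicy grants exactly the listed tools and nothing else.
func NewPolicy(tools ...string) Policy {
	m := make(map[string]bool, len(tools))
	for _, t := range tools {
		m[t] = true
	}
	return Policy{allowed: m}
}

// Adjudicate allows a call only if its tool was explicitly granted.
// There is no deny-list to evade: anything not granted is refused.
func (p Policy) Adjudicate(c ToolCall) Verdict {
	if p.allowed[c.Tool] {
		return Allow
	}
	return Deny
}
```

With `NewPolicy("read_file")` (an assumed tool name), a `read_file` call is allowed and any other tool is denied. A policy that fails to load and is left at its zero value denies everything, which is the fail-closed behavior the gate relies on.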

## Performance & benchmarks

- [Fleet benchmark suite](docs/explainers/fleet-benchmarks.md): run the five headline fleet benchmarks yourself — fan-out to 1024 sub-agents, the 50×50 turn-tax sweep, the turn-tax A/B + safety floor, RadixAttention cache hit rate, and context-changing token accounting. Model-agnostic, no GPU/model/key; every number traced and fenced.
- [Benchmark authority](BENCHMARK-AUTHORITY.md): the single source of truth for every number, traced to commit + artifact.
- [GLM-5.2 fak-kernel cache value (PENDING)](docs/benchmarks/GLM52-FAK-KERNEL-CACHE-VALUE-RESULTS.md): result packet for epic #1010 — cache-value observation on a solved SWE-bench ticket via the Claude harness on GLM-5.2 in-kernel serve. Status: PENDING — observation seam shipped at commit 52dfea0d, datacenter GPU access is the residual. See runbook for full path.
- [Hardware matrix](docs/HARDWARE-MATRIX.md): every machine fak has been profiled on — 4 distinct platforms spanning 2 CPU ISAs (arm64 + x86_64), 4 GPU backends (Apple Metal, AMD Vulkan, NVIDIA CUDA Ada + Ampere), and 4 operating systems. The deterministic metrics reproduce byte-for-byte across all of them.
- [Web agent benchmark baselines](docs/webbench-baselines.md): real WebVoyager (643 tasks), 8.8–9.7× prefill elimination vs the naive floor, modeled geometry (no wall-clock).
- [fak vs vLLM, SGLang & provider KV caching](docs/fak-vs-alternatives-comparison.md): how fak's cross-worker/cross-session fused KV cache differs from the per-session engines and the provider prompt caches, with measured token/cost ratios.
- [Local-vs-frontier parity](docs/explainers/local-vs-frontier-parity.md): a measured A/B run on safety and cost for a small local model behind the kernel vs hosted frontier models, with the capability ramp called out honestly.
- [Prefill elimination explained](docs/prefill-elimination-explained.md): a plain-language walkthrough of how reusing a shared prompt prefix hits a provider's KV cache to cut repeated input-token cost on multi-turn agent runs.
- [Trajectory observability primitives](docs/observability/trajectory.md): the data plane + reference vector-similarity primitive + pluggable scorer seam that let you (or a trivial agent skill) build semantic/trajectory/memory/cache/planner optimizations ON TOP of the kernel — a typed per-turn `Turn` record folded from the event stream (`internal/trajectory`), a deterministic dependency-free embed/cosine/top-k to find near-duplicate "bad" queries the lexical ranker misses (`internal/simhash`), and a `Turn → Finding` scorer registry you attach to with no core edit (`internal/trajhook`). Surfaced as `fak traj similar|cluster|score|gc`; the `trajectory-garden` skill is the worked example.
- [Cache-value roll-up](docs/cache-value-rollup.md): the front-door story for the cache-effectiveness P&L roll-up: the scattered signal, the two accounts kept side by side and never blended (WITNESSED kernel reuse vs OBSERVED net-dollar savings), the honesty fences (#1066 marginal-over-warm-KV, OBSERVED-vs-WITNESSED, net-not-gross), how to read the Slack-card fields, the shipped Track-1 reproduce command (`fak nightrun score --json`), and the dated operator surfaces (`fak cachevalue report --since <date>` plus `fak cachevalue review --json` or `--append-ledger`/`--markdown-out` for the cache-frontier review ledger).
- [Fleet activity roll-up](docs/fleet-rollup.md): the one-page operating view for a city of agents: closure honesty, ship-stamp rate, dark-loop and fleet-health attention, plane coverage, plus useful next work such as the public `fak maturity route` seed.

## Reference

- [Claims ledger](CLAIMS.md): every capability with one machine-checked tag (shipped / simulated / stub).
- [Net-true value standard](docs/standards/net-true-value.md): the lens fak judges any efficiency/perf gain by — measured against the real alternative (not a strawman), net of its own cost, scope stated, provenance-labeled, reproducible, and on by default — used both on fak's own claims and on the daily "5×" / "save 90% tokens" intake. Each rubric question maps to a stick the repo already runs.
- [Agent grammar standard](docs/standards/agent-grammar.md): the normative trust grammar a second agent fleet conforms to — the closed nouns (lane · lease · reason token · witness · verdict · claim · ladder rung · scope), the shipped verbs each with an input→verdict signature and the closed vocabulary it draws from (every verb maps to a `dos_*` MCP verb / `dos.toml` surface today), the lift recipe as MUST clauses (closed vocabulary · evidence-bound with no `claimed` field · fail-closed · data-not-code · both-lenses), the `G6` one-sided-screen + witnessed-loss polarity predicate stated as a checkable MUST, and a per-verb conformance checklist — the contract role `internal/abi`'s golden freeze plays for the ABI. Promoted from the [grammar design note](docs/notes/CONCEPT-AGENT-PROGRAMMING-GRAMMAR-2026-06-28.md).
- [Observer-effect standard](docs/standards/observer-effect.md): the cost-side companion to net-true-value — how fak reports its OWN overhead honestly. States the perf-floor/security-floor duality (a good call can't silently slip below its budget, the dual of a bad call that can't get through), requires WITNESSED/OBSERVED/MODELED/SIMULATED on every overhead number, and pins the meter's own cost (the shipped decode `AcceptanceMeter` is bounded at 0 allocations/sample by a green test). The honesty contract for the self-tax plane (#1147).
- [Work map](docs/WORK-MAP.md): where each kind of work lives, kept separate — optimizations (the `EXTENDING.md` three-gate lane), ongoing work (the in-flight trackers, epics, and dispatch loop), and dev (the build/test/partition/ship workflow). The one page that says which front door a task belongs to, and names the overlaps and known drift between the status surfaces.
- [Developer tooling](docs/dev-tooling.md): the hands-on practitioner guide for working ON fak — the host-aware test runner (`fak test` over `make test-fast`/`test`/`test-affected`/`test-race`/`ci`, `fak affected`, and WSL routing for native Windows), the debuggers (`fak debug` context core-dump + `fak doctor` answer-shape diagnostic), profiling (`fak profile` over Go pprof and package benchmarks plus `fak benchmarks`/`fak bench`/`fak ablate`), and the commit-by-path-and-ship loop.
- [Status](STATUS.md): what is shipped and on the critical path.
- [Architecture](ARCHITECTURE.md): the registry seams and the frozen ABI.
- [Extending fak](EXTENDING.md): plug in an optimization → prove it correct → prove it faster → ship.
- [Repro packet](docs/repro-packet.md): the full 2-minute, no-key/no-GPU walkthrough.
- [Issue-dispatch loop](docs/dispatch-loop.md): the witness-gated GitHub-issue backlog driver (spawn → ship #N → witness → close).
- [Cutting a release](.claude/skills/release/SKILL.md): the versioned-release procedure — `release_decide` → `release_cut` → `release_tag` → `release_publish` (the helpers under `tools/`), with the single-writer release lock, the trunk/commit-by-path rules, and the ordering gotchas the helpers enforce by refusing. The `vX.Y.Z` history lands as git tags + notes under [`docs/releases/`](docs/releases/); check the `@latest` publish lag with `make release-staleness` and the whole posture with `make release-readiness`. AGENTS.md carries the short version. Stable rollback anchors are the slower channel under [`docs/stable-releases/`](docs/stable-releases/).
- [Idea-scout](docs/idea-scout.md): the inbound feeder — a daily arXiv + GitHub search for ideas related to agent-kernel work, deduped three ways and capped, filed as triage-ready issues (`tools/idea_scout.py`); dry-run by default.
- [Gateway API reference](docs/fak/api-reference.md): the complete HTTP reference for `fak serve` — its OpenAI, Anthropic Messages, fak-native, and MCP endpoints plus auth, rate limits, and ops routes.
- [MCP tool-result wire](docs/mcp-tool-result.md): the gateway's MCP tool-result envelope, covering the `SyscallResponse` fields, the verdict object, and the closed refusal vocabulary, with one example per verdict class.
- [Model/compute engine env knobs](docs/model-engine-env.md): every `FAK_*` variable the in-kernel model and compute engine read — GPU residency budget (`FAK_GPU_BUDGET_MB`), quant/load format, matmul worker budget, SIMD kernel tiers, and the GPU build vars — each with type, default, when-to-use, and a `file:line` source. The compute-engine companion to serve-config.md.
- [Server troubleshooting](docs/fak/server-troubleshooting.md): diagnosing common `fak serve` failures — port conflicts, out-of-memory model loads, GPU/CUDA/Vulkan errors, and policy issues.
- [Related tools & workflows](docs/fak/related-items.md): a catalog of fak's CLI verbs, the serve gateway, the test/CI runners, the demo scripts, and the policy templates, with usage examples.
- [Private comms channel (stub)](docs/private-comms-channel.md): the public front door to the private comms channel — the Slack control-bridge to the lab GPU/DGX servers. Names what it is and how to reach it via the `fak-private` companion repo; carries zero live plumbing (no host, channel id, or token). See also the [GPU-server / Slack boundary](docs/dgx-slack-boundary.md).

## Optional

- [Concepts and story](docs/concepts-and-story.md): the parable, personas, and when-the-win-kicks-in tables.
- [Advanced topics: scaling & HA](docs/fak/advanced-topics.md): tuning, replication, multi-region, and high availability for `fak serve`, with sticky `trace_id` routing for information-flow-control correctness.
