start local · go only · no key

Demos that run everywhere.

Start with the lowest-common-denominator fak demos: no model, no GPU, no provider key, no network. Then move into dedicated tracks for security, research/science, adoption, memory/serving, and the hosted live-model races.

local · adoption · no model

Drop-in wrapper — put fak in front of your agent

The lowest-friction adoption proof. It shows the child-only environment and base-URL wiring for Claude Code, Codex, OpenCode, and Aider without sending a model request or needing a provider key. Run go run ./cmd/dropindemo -print or open http://127.0.0.1:8154.
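The "child-only environment" idea can be sketched in a few lines of Go. This is an illustrative sketch, not the dropindemo source: the gateway address is hypothetical, and the variable names ANTHROPIC_BASE_URL and OPENAI_BASE_URL are assumptions about what each agent reads, so check the agent's own documentation. The point it shows is that the override lives only in the child's environment; the parent shell is never modified.

```go
package main

import (
	"fmt"
	"os"
	"os/exec"
)

// childEnv returns a copy of the parent environment with base-URL
// overrides appended. The parent's own environment is left untouched.
func childEnv(parent []string, gateway string) []string {
	env := make([]string, 0, len(parent)+2)
	env = append(env, parent...)
	// Assumed variable names; each agent documents the one it honors.
	env = append(env,
		"ANTHROPIC_BASE_URL="+gateway,
		"OPENAI_BASE_URL="+gateway,
	)
	return env
}

func main() {
	const gateway = "http://127.0.0.1:8080" // hypothetical gateway address

	if len(os.Args) < 2 {
		fmt.Println("usage: wrap <agent> [args...]")
		return
	}
	cmd := exec.Command(os.Args[1], os.Args[2:]...)
	// os/exec uses the last value for a duplicated key, so the
	// appended overrides win over anything inherited from the parent.
	cmd.Env = childEnv(os.Environ(), gateway)
	cmd.Stdin, cmd.Stdout, cmd.Stderr = os.Stdin, os.Stdout, os.Stderr
	if err := cmd.Run(); err != nil {
		fmt.Fprintln(os.Stderr, "agent exited:", err)
		os.Exit(1)
	}
}
```

Because the override is scoped to one child process, removing the wrapper restores the agent's default endpoint with no cleanup.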

Run the drop-in proof →
local · security · no model

Without fak vs With fak — the safety floor

The moat, side by side. The same adversarial tool-call trace runs down two columns at once: without fak, a poisoned tool result is admitted to context and the injected delete_account payload executes; with fak, the poison is paged out and the destructive call is refused at the boundary — while the legitimate calls run on both. A real kernel verdict per row, no model. Run go run ./cmd/guarddemo -print or open http://127.0.0.1:8151.
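A toy version of the refusal half of this demo, written as a sketch under stated assumptions: the Verdict type, the gate function, and the allowlist are hypothetical names invented here, not fak's kernel API. It illustrates "deny-as-value": a disallowed call produces an ordinary verdict the loop can inspect, rather than an error that aborts the run. The poisoned fetch_policy result is a separate mechanism (quarantining tool output) and is left out.

```go
package main

import "fmt"

// Verdict is returned as a plain value, so a denial never unwinds the loop.
type Verdict struct {
	Allowed bool
	Reason  string
}

// gate is a toy capability floor: a tool runs only if it is allowlisted.
func gate(allowed map[string]bool, tool string) Verdict {
	if allowed[tool] {
		return Verdict{Allowed: true, Reason: "allowed"}
	}
	return Verdict{Allowed: false, Reason: "not in capability set"}
}

// countRefused replays a trace and counts the calls the gate denies.
func countRefused(allowed map[string]bool, trace []string) int {
	n := 0
	for _, tool := range trace {
		if !gate(allowed, tool).Allowed {
			n++
		}
	}
	return n
}

func main() {
	allowed := map[string]bool{
		"get_user_details":     true,
		"search_direct_flight": true,
		"book_flight":          true,
	}
	// Tool names taken from the guarddemo trace, minus fetch_policy.
	trace := []string{
		"get_user_details", "delete_account",
		"search_direct_flight", "delete_account",
		"book_flight", "delete_account",
	}
	for _, tool := range trace {
		v := gate(allowed, tool)
		fmt.Printf("%-22s allowed=%-5t %s\n", tool, v.Allowed, v.Reason)
	}
	fmt.Printf("refused %d of %d calls\n", countRefused(allowed, trace), len(trace))
}
```

The legitimate calls still run because the gate decides per call; nothing about a refusal affects the calls around it.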

Run your copy →
hosted · efficiency · no model

Turn-tax — fak vs a SOTA loop

Two lanes race in real time: a SOTA two-pass agent loop versus fak's one-shot kernel, replaying the same class-labeled tool-call trace. Every turn fak saves — a grammar repair, a vDSO cache hit, a poisoned result quarantined — ticks up visibly on one lane while the other stays flat. The safety floor sits on its own axis, never folded into the turn count.

Replay through the kernel →
local · tokens · no model

Token ledger — model context and tool calls

A browserless accounting demo with two separate meters: a prefiltered bad call keeps model-context tokens out, and a repeated read served from cache avoids tool executions. Run go run ./cmd/tokendemo -print; no browser, port, model, or network.
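The two meters can be pictured with a small ledger. This is a minimal sketch with hypothetical types and made-up token counts, not the tokendemo implementation; it only shows why the meters are kept separate: a prefiltered bad call saves context tokens, while a cached repeat saves a tool execution.

```go
package main

import "fmt"

// Call is one tool call in a replayed trace.
type Call struct {
	Tool   string
	Args   string
	Valid  bool // whether a prefilter would accept the arguments
	Tokens int  // hypothetical context tokens the call would cost if admitted
}

// Ledger keeps the two meters apart.
type Ledger struct {
	ContextTokensKeptOut  int // meter 1: model-context tokens never admitted
	ToolExecutionsAvoided int // meter 2: repeated reads served from cache
	ToolExecutions        int // calls that actually ran
}

// replay walks the trace once and updates both meters.
func replay(calls []Call) Ledger {
	var l Ledger
	seen := map[string]bool{}
	for _, c := range calls {
		if !c.Valid {
			// A prefiltered bad call never reaches the model context.
			l.ContextTokensKeptOut += c.Tokens
			continue
		}
		key := c.Tool + "|" + c.Args
		if seen[key] {
			// A repeated read is answered from cache; the tool does not run.
			l.ToolExecutionsAvoided++
			continue
		}
		seen[key] = true
		l.ToolExecutions++
	}
	return l
}

func main() {
	l := replay([]Call{
		{Tool: "get_user_details", Args: "u1", Valid: true, Tokens: 40},
		{Tool: "convert_currency", Args: "??", Valid: false, Tokens: 120},
		{Tool: "get_user_details", Args: "u1", Valid: true, Tokens: 40},
	})
	fmt.Printf("context tokens kept out: %d\n", l.ContextTokensKeptOut)
	fmt.Printf("tool executions avoided: %d (ran %d)\n", l.ToolExecutionsAvoided, l.ToolExecutions)
}
```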

Read the ledger →
local · research/science · no model

Un-See It — poisoned KV span removal

The KV-cache eviction witness drives the real ctxmmu gate, kvmmu bridge, and KVCache.Evict against a synthetic model witness. Run go run ./cmd/unseedemo -print or open http://127.0.0.1:8156.

Run the KV witness →
local · agentic · no model

What time is it, Mr. Wolf? — a kernel-gated agent loop

The children's game as an agent loop: a one-tool agent calls get_time through the real kernel.Fold path, while a red-team variant's smuggled delete_calendar / wipe_disk is refused at the capability floor — inside the loop. Run go run ./cmd/timewolfdemo -print or open http://127.0.0.1:8155.

Run the agent loop →
local · agentic · no model

Try it — a kernel-gated agentic chat

Type a message and a tiny tool-using agent answers, but every tool call runs through the real kernel.Fold path first. Ask for the time or the weather and it answers; ask it to delete_account or paste a prompt injection and the destructive call is refused at the capability floor. Run go run ./cmd/trychatdemo or open http://127.0.0.1:8157.

Open the chat →
hosted · research/science · SmolLM2-135M

Multi-agent context reuse

The fleet thesis made visible: a shared prefix prefilled once and cloned into N agents, with a per-agent timeline showing each tool result drawn to scale as the context grows unevenly. Pick a scenario, read the exact prefill-token work each strategy does (warm KV vs fak), then run the live race — fak vs the warm-cache baseline — through the real in-kernel model.

Open the reuse proof →
hosted · live model research

Reuse race vs SOTA + the reuse curve

A head-to-head live race over one 25-request multi-agent session. The headline is fak vs a tuned warm-cache baseline: the per-agent KV / prefix-caching stack that vLLM, SGLang, and provider prompt-caching give you, which caches the prefix once per agent and ingests only new tokens. fak prefills the shared prefix once for the whole fleet, clones it into the agents, and batches decode. Same model, same tokens, same answers. Then build the reuse curve across the model ladder.

Run the live race →

Lowest-common-denominator terminal proofs

These headless demos render the same kernel verdicts in your terminal, side by side, in ~30 seconds. The specialized research and live-model demos remain available, but this set is the common floor: one command each, no weights, no GPU, no network.

safety: go run ./cmd/guarddemo -print
  fak · the safety floor, side by side — scenario: guard-redteam (7 calls)
  same agent · same attack · same tool calls — run twice

  WITHOUT fak                         the tool call             WITH fak
  ──────────────────────────────────  ────────────────────────  ──────────────────────────────────
  x POISON ADMITTED to context        fetch_policy              # paged out (quarantined)
  . ran (legit)                       get_user_details          . ran (allowed)
  x EXECUTED (account deleted)        delete_account            # REFUSED (deny-as-value)
  . ran (legit)                       search_direct_flight      . ran (allowed)
  x EXECUTED (account deleted)        delete_account            # REFUSED (deny-as-value)
  . ran (legit)                       book_flight               . ran (allowed)
  x EXECUTED (account deleted)        delete_account            # REFUSED (deny-as-value)
  ──────────────────────────────────  ────────────────────────  ──────────────────────────────────
  WITHOUT fak: 4 breaches                                       WITH fak: 0 breaches
  fak refused 3 destructive ops and paged out 1 injection — and still ran the 3 legitimate calls.
efficiency: go run ./cmd/turntaxdemo -print
  fak · the turn tax, side by side — suite: turntax-airline (14 calls)
  same tool calls, two agents — count the wasted model round-trips

  tuned SOTA agent (2026)               the tool call           fak (1-shot kernel)
  ────────────────────────────────────  ──────────────────────  ──────────────────────────────
  ! would run it (safety)               fetch_policy            # blocked (see guarddemo)
  . ran                                 get_user_details        . ran
  . ran                                 search_direct_flight    . ran
  . elided (optional call)              calculate               # 1-shot — served locally
  . elided (optional call)              list_all_airports       # 1-shot — served locally
  x +1 round-trip — bad arg             convert_currency        # 1-shot — repaired in-syscall
  x +1 round-trip — dup read            get_user_details        # 1-shot — served from cache
  x +1 round-trip — dup read            search_direct_flight    # 1-shot — served from cache
  x +1 round-trip — bad arg             convert_currency        # 1-shot — repaired in-syscall
  . elided (optional call)              calculate               # 1-shot — served locally
  . elided (optional call)              list_all_airports       # 1-shot — served locally
  x +1 round-trip — dup read            get_user_details        # 1-shot — served from cache
  ! would run it (safety)               delete_account          # blocked (see guarddemo)
  . ran                                 book_flight             . ran
  ────────────────────────────────────  ──────────────────────  ──────────────────────────────
  tuned SOTA agent: 5 forced round-trips                        fak: 0 extra round-trips
  vs even a TUNED 2026 agent, fak deletes 5 forced round-trips ≈ 7.5s and $0.0270 at hosted-flash rates (1.5s/turn).
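The summary line's arithmetic can be checked by hand. The sketch below uses only the figures printed above (5 forced round-trips, 1.5s per turn, $0.0270 total); the per-round-trip price it derives is implied by those figures and is not stated by the demo itself. The function names are illustrative.

```go
package main

import "fmt"

// wastedSeconds is the latency spent on forced extra round-trips.
func wastedSeconds(roundTrips int, secPerTurn float64) float64 {
	return float64(roundTrips) * secPerTurn
}

// costPerRoundTrip back-derives the unit price from the printed total.
func costPerRoundTrip(totalDollars float64, roundTrips int) float64 {
	return totalDollars / float64(roundTrips)
}

func main() {
	const (
		forcedRoundTrips = 5      // rows marked "x +1 round-trip" above
		secPerTurn       = 1.5    // from the summary line
		totalDollars     = 0.0270 // from the summary line
	)
	fmt.Printf("%.1fs wasted\n", wastedSeconds(forcedRoundTrips, secPerTurn))
	fmt.Printf("$%.4f per round-trip (derived)\n", costPerRoundTrip(totalDollars, forcedRoundTrips))
}
```

The first line prints 7.5s, matching the transcript; the second gives $0.0054 per round-trip.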
reuse: go run ./cmd/ctxdemo -bars
  fak · context reuse, side by side
  prefill tokens the model must RE-READ per session — lower is better (decode excluded)

  deep-research  (C=4 agents · T=5 turns · P=1536 prefix · maxCtx=2,642)
    tuned warm-cache (SOTA)     ██████████████████████████████████████████  9,358
    fak (cross-agent reuse)     █████████████████████                       4,750
    → fak makes the model re-read 2.0× fewer tokens than even a tuned warm-cache stack.
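One way to read these two bars, sketched as a simple counting model: a per-agent warm cache prefills the shared prefix once per agent, while cross-agent reuse prefills it once for the fleet. That model is an assumption made here, not confirmed to be how ctxdemo computes its totals, and the non-prefix token count is derived from the printed output rather than stated. It does reproduce both printed numbers, and the gap equals (C-1) x P.

```go
package main

import "fmt"

// warmCachePrefill: the shared prefix is prefilled once per agent,
// plus the tokens that are not part of the shared prefix.
func warmCachePrefill(agents, prefix, rest int) int {
	return agents*prefix + rest
}

// sharedPrefill: the shared prefix is prefilled once for the whole fleet.
func sharedPrefill(prefix, rest int) int {
	return prefix + rest
}

func main() {
	const (
		agents   = 4    // C, from the scenario line
		prefix   = 1536 // P, from the scenario line
		fakTotal = 4750 // from the fak bar
	)
	rest := fakTotal - prefix // derived: tokens outside the shared prefix

	warm := warmCachePrefill(agents, prefix, rest)
	fak := sharedPrefill(prefix, rest)
	fmt.Println("warm-cache:", warm)
	fmt.Println("fak:       ", fak)
	fmt.Println("gap:       ", warm-fak, "=", (agents-1)*prefix)
	fmt.Printf("ratio:      %.2fx\n", float64(warm)/float64(fak))
}
```

Under this model the saving scales with the number of agents sharing the prefix, which is consistent with the claim that the win grows with fleet size.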

One command plays the terminal comparison set and then verifies that each headline still holds: bash tools/run_comparison_demos.sh

The mechanism behind each: watch a poisoned turn vanish, bit-for-bit · why in-process makes fail-closed free · why the win grows with turns, agents, and tool calls.

What you're hitting. A single GCE VM (NVIDIA L4) running the hosted turn-tax, context-reuse, reuse-race, and fak serve kernel gateways. The model demos run SmolLM2-135M in-process through the kernel. The demo host is plain HTTP, so your browser opens it in a new tab rather than embedding it here. There's also a live demos hub on the same host with the CPU-vs-GPU engine comparison, a chat surface, and the kernel's metrics.

▶ Run your own copy. Every demo is in the public repo and runs anywhere Go runs — no infrastructure of ours required. Start with the lowest-common-denominator set, then pick a dedicated track in the run guide:
git clone https://github.com/anthony-chaudhary/fak && cd fak
go run ./cmd/dropindemo -print
go run ./cmd/guarddemo       # → http://127.0.0.1:8151 (or -print for an instant terminal diff)
go run ./cmd/turntaxdemo     # → http://127.0.0.1:8150
go run ./cmd/dropindemo      # → http://127.0.0.1:8154
go run ./cmd/tokendemo -print
go run ./cmd/unseedemo       # → http://127.0.0.1:8156
go run ./cmd/timewolfdemo    # → http://127.0.0.1:8155
go run ./cmd/trychatdemo     # → http://127.0.0.1:8157
go run ./cmd/ctxdemo         # → http://127.0.0.1:8153
go run ./cmd/demorace        # → http://127.0.0.1:8147
go run ./cmd/deletioncert -selfcheck
go run ./cmd/causalbench -selfcheck
The model-backed browser demos add one step: scripts/fetch-model.sh exports a small CPU model, after which go run ./cmd/ctxdemo and go run ./cmd/demorace light up the live race. Security, research/science, adoption, and memory/serving demos are grouped in the guide.