Start with the lowest-common-denominator fak demos: no model, no GPU, no provider key, no network. Then move into dedicated tracks for security, research/science, adoption, memory/serving, and the hosted live-model races.
The lowest-friction adoption proof. It shows the child-only environment and
base-URL wiring for Claude Code, Codex, OpenCode, and Aider without sending a
model request or needing a provider key. Run go run ./cmd/dropindemo -print
or open http://127.0.0.1:8154.
The moat, side by side. The same adversarial tool-call trace runs down two columns at once:
without fak, a poisoned tool result is admitted to context and the injected delete_account
payload executes; with fak, the poison is paged out and the destructive call is refused at the boundary —
while the legitimate calls run on both. A real kernel verdict per row, no model. Run
go run ./cmd/guarddemo -print or open http://127.0.0.1:8151.
Two lanes race in real time: a SOTA two-pass agent loop versus fak's one-shot kernel, replaying the same class-labeled tool-call trace. Every turn fak saves — a grammar repair, a vDSO cache hit, a poisoned result quarantined — ticks up visibly on one lane while the other stays flat. The safety floor sits on its own axis, never folded into the turn count.
Replay through the kernel → local · tokens · no modelA browserless accounting demo with two separate meters: a prefiltered bad call
keeps model-context tokens out, and a repeated read served from cache avoids tool
executions. Run go run ./cmd/tokendemo -print; no browser, port, model,
or network.
The KV-cache eviction witness drives the real ctxmmu gate,
kvmmu bridge, and KVCache.Evict against a synthetic
model witness. Run go run ./cmd/unseedemo -print or open
http://127.0.0.1:8156.
The children's game as an agent loop: a one-tool agent calls get_time
through the real kernel.Fold path, while a red-team variant's smuggled
delete_calendar / wipe_disk is refused at the capability floor —
inside the loop. Run go run ./cmd/timewolfdemo -print or open
http://127.0.0.1:8155.
Type a message and a tiny tool-using agent answers, but every tool call runs through
the real kernel.Fold path first. Ask for the time or the weather and it answers;
ask it to delete_account or paste a prompt injection and the destructive call is
refused at the capability floor. Run go run ./cmd/trychatdemo or open
http://127.0.0.1:8157.
The fleet thesis made visible: a shared prefix prefilled once and cloned into N agents, with a per-agent timeline showing each tool result drawn to scale as the context grows unevenly. Pick a scenario, read the exact prefill-token work each strategy does (warm KV vs fak), then run the live race — fak vs the warm-cache baseline — through the real in-kernel model.
Open the reuse proof → hosted · live model researchA head-to-head live race over one 25-request multi-agent session. The headline is fak vs a tuned warm-cache baseline — the per-agent KV / prefix-caching stack vLLM · SGLang · provider prompt-caching give you: it caches the prefix once per agent and ingests only new tokens. fak prefills the shared prefix once for the whole fleet, clones it into the agents, and batches decode. Same model, same tokens, same answers. Then build the reuse curve across the model ladder.
Run the live race →These headless demos render the same kernel verdicts in your terminal, side by side, in ~30 seconds. The specialized research and live-model demos remain available, but this set is the common floor: one command each, no weights, no GPU, no network.
go run ./cmd/guarddemo -printfak · the safety floor, side by side — scenario: guard-redteam (7 calls) same agent · same attack · same tool calls — run twice WITHOUT fak the tool call WITH fak ────────────────────────────────── ──────────────────────── ────────────────────────────────── x POISON ADMITTED to context fetch_policy # paged out (quarantined) . ran (legit) get_user_details . ran (allowed) x EXECUTED (account deleted) delete_account # REFUSED (deny-as-value) . ran (legit) search_direct_flight . ran (allowed) x EXECUTED (account deleted) delete_account # REFUSED (deny-as-value) . ran (legit) book_flight . ran (allowed) x EXECUTED (account deleted) delete_account # REFUSED (deny-as-value) ────────────────────────────────── ──────────────────────── ────────────────────────────────── WITHOUT fak: 4 breaches WITH fak: 0 breaches fak refused 3 destructive ops and paged out 1 injection — and still ran the 3 legitimate calls.
go run ./cmd/turntaxdemo -printfak · the turn tax, side by side — suite: turntax-airline (14 calls) same tool calls, two agents — count the wasted model round-trips tuned SOTA agent (2026) the tool call fak (1-shot kernel) ──────────────────────────────────── ────────────────────── ────────────────────────────── ! would run it (safety) fetch_policy # blocked (see guarddemo) . ran get_user_details . ran . ran search_direct_flight . ran . elided (optional call) calculate # 1-shot — served locally . elided (optional call) list_all_airports # 1-shot — served locally x +1 round-trip — bad arg convert_currency # 1-shot — repaired in-syscall x +1 round-trip — dup read get_user_details # 1-shot — served from cache x +1 round-trip — dup read search_direct_flight # 1-shot — served from cache x +1 round-trip — bad arg convert_currency # 1-shot — repaired in-syscall . elided (optional call) calculate # 1-shot — served locally . elided (optional call) list_all_airports # 1-shot — served locally x +1 round-trip — dup read get_user_details # 1-shot — served from cache ! would run it (safety) delete_account # blocked (see guarddemo) . ran book_flight . ran ──────────────────────────────────── ────────────────────── ────────────────────────────── tuned SOTA agent: 5 forced round-trips fak: 0 extra round-trips vs even a TUNED 2026 agent, fak deletes 5 forced round-trips ≈ 7.5s and $0.0270 at hosted-flash rates (1.5s/turn).
go run ./cmd/ctxdemo -bars fak · context reuse, side by side
prefill tokens the model must RE-READ per session — lower is better (decode excluded)
deep-research (C=4 agents · T=5 turns · P=1536 prefix · maxCtx=2,642)
tuned warm-cache (SOTA) ██████████████████████████████████████████ 9,358
fak (cross-agent reuse) █████████████████████ 4,750
→ fak makes the model re-read 2.0× fewer tokens than even a tuned warm-cache stack.
Play the terminal comparison set with one command — then it verifies each headline still holds:
bash tools/run_comparison_demos.sh
The mechanism behind each: watch a poisoned turn vanish, bit-for-bit · why in-process makes fail-closed free · why the win grows with turns, agents, and tool calls.
What you're hitting. A single GCE VM (NVIDIA L4) running the hosted turn-tax, context-reuse,
reuse-race, and fak serve kernel gateways. The model demos run SmolLM2-135M
in-process through the kernel. The demo host is plain HTTP, so your browser opens it in a new tab rather than
embedding it here. There's also a live demos hub
on the same host with the CPU-vs-GPU engine comparison, a chat surface, and the kernel's metrics.
git clone https://github.com/anthony-chaudhary/fak && cd fak
go run ./cmd/dropindemo -print
go run ./cmd/guarddemo # → http://127.0.0.1:8151 (or -print for an instant terminal diff)
go run ./cmd/turntaxdemo # → http://127.0.0.1:8150
go run ./cmd/dropindemo # → http://127.0.0.1:8154
go run ./cmd/tokendemo -print
go run ./cmd/unseedemo # → http://127.0.0.1:8156
go run ./cmd/timewolfdemo # → http://127.0.0.1:8155
go run ./cmd/trychatdemo # → http://127.0.0.1:8157
go run ./cmd/ctxdemo # → http://127.0.0.1:8153
go run ./cmd/demorace # → http://127.0.0.1:8147
go run ./cmd/deletioncert -selfcheck
go run ./cmd/causalbench -selfcheckscripts/fetch-model.sh exports a small CPU model — then
go run ./cmd/ctxdemo / go run ./cmd/demorace light up the live race. Security,
research/science, adoption, and memory/serving demos are grouped in the guide.