Same model · same tokens · same answers. The headline race is fak vs a tuned warm-cache baseline. That is the baseline a real serving stack gives you through vLLM / SGLang prefix caching or provider prompt caching: the prefix is cached once per agent, and only new tokens are ingested after that. fak adds cross-agent prefix sharing and batched decode on top of it.
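The difference between the two prefill strategies comes down to token counting. Below is a minimal Go sketch of that arithmetic. It assumes every agent shares one prompt prefix of length P and then ingests its own new tokens. `Workload`, `warmCachePrefill`, `sharedPrefixPrefill` and the numbers in `main` (8 agents, 32 new tokens each) are hypothetical illustrations, not fak's code or measurements. The sketch counts only prefill work and does not model batched decode.

```go
package main

import "fmt"

// Workload is a hypothetical multi-agent run: all agents share one prompt
// prefix, then each ingests its own unique new tokens.
type Workload struct {
	Agents       int // number of agents
	PrefixTokens int // shared prefix length P
	NewTokens    int // unique tokens ingested per agent after the prefix
}

// warmCachePrefill counts prefill tokens for a per-agent prefix cache:
// each agent computes the shared prefix once for itself, then its new tokens.
func warmCachePrefill(w Workload) int {
	return w.Agents * (w.PrefixTokens + w.NewTokens)
}

// sharedPrefixPrefill counts prefill tokens when the prefix is computed once
// and shared across all agents. Each agent still ingests its own new tokens.
func sharedPrefixPrefill(w Workload) int {
	return w.PrefixTokens + w.Agents*w.NewTokens
}

func main() {
	w := Workload{Agents: 8, PrefixTokens: 128, NewTokens: 32}
	fmt.Println("warm cache:", warmCachePrefill(w))       // 8 × (128 + 32) = 1280
	fmt.Println("shared prefix:", sharedPrefixPrefill(w)) // 128 + 8 × 32 = 384
}
```

Under these assumptions, cross-agent sharing saves (Agents − 1) × P prefill tokens. The savings grow with both the number of agents and the prefix length.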
① Live race: fak vs a tuned warm-cache baseline (SOTA serving baseline · both run live, same model)
② Reuse curve across the model ladder (fak + tuned warm cache, both LIVE)
Same workload, with a smaller P=128 to keep it tractable on CPU. As the model grows, the absolute minutes saved over the tuned baseline grow with it, while the ratio holds.
Legend: fak (reuse) · tuned warm cache (SOTA)
Each rung: fak and the tuned warm-cache baseline both run live, and the headline ratio above each pair is fak vs tuned.
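The claim that the ratio holds while the absolute savings grow follows from one assumption. Suppose each engine's wall time is roughly its token work multiplied by a per-token cost, and only that cost changes from one rung to the next. The minimal Go sketch below works under that assumption. `savings`, the token counts (1280 vs 384) and the per-token costs are hypothetical and are not measurements from this page.

```go
package main

import "fmt"

// savings returns the speedup ratio and the absolute seconds saved.
// Assumption: each engine's wall time is (token work) × (per-token cost), and
// both engines pay the same per-token cost on a given model rung.
func savings(tunedTokens, fakTokens int, secPerToken float64) (ratio, savedSec float64) {
	tuned := float64(tunedTokens) * secPerToken
	fak := float64(fakTokens) * secPerToken
	return tuned / fak, tuned - fak
}

func main() {
	// Hypothetical per-token costs for three rungs of a model ladder,
	// from a small model to a larger one.
	costs := []float64{0.01, 0.04, 0.16}
	for _, c := range costs {
		r, s := savings(1280, 384, c)
		fmt.Printf("cost %.2f s/tok: ratio %.2fx, saved %.1f s\n", c, r, s)
	}
}
```

The per-token cost cancels in the ratio, so the ratio stays the same on every rung. The absolute savings scale linearly with that cost, which is why larger models save more minutes. Real runs will deviate to the extent that per-token cost is not uniform across prefill and decode.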
fak in-kernel engine · pure-Go Q8 forward pass · tokens are real model output (anchor-quality on the 135M reference, chat-quality on the Qwen2.5 rungs). No network, no API, all on this box.