v4.3 · 97.9% on tsbench + ~20K tokens/week saved on real sessions

The only MCP server
that turns Claude into
a perfect coder.

v4.0 release. One MCP server, one profile. Set TOKEN_SAVIOR_PROFILE=optimized and you get 97.9% accuracy on 96 real coding tasks, where plain Claude scores 78.3%. Active tokens drop 80% (3 395 vs 17 221) and wall time drops 83%. No tuning required.

Agent A · plain agent: Read · Grep · Bash
vs
Agent B · + Token Savior: + 59 structural MCP tools
Compared on wall time, active tokens, accuracy and losses across 96 tasks.
the scorecard · what the numbers say

Same model. Same tasks.
One of them kept winning.

Two agents, one codebase, 96 prompts pulled from real engineering work: find callers, audit cycles, estimate blast radius, summarize modules. We counted active tokens, wall time and correctness.

agent a · plain
Read, Grep, Bash. No structural tools.
vs
agent b · token savior
Plus the graph: + 59 structural MCP tools (lean profile).
Each agent is scored on active tokens, wall time, accuracy and overall score.

Accuracy, by category

Average score per category, out of 2: plain agent vs Token Savior.

Where the gap opens

Context ingested per category · lower is better

Import cycle detection · −99% context
The plain agent made 35 Bash and 20 Read calls, ingested 113k chars and took 132 s. Token Savior made two MCP calls, ingested 200 chars and took 17 s. Same correct answer.

File dependency recovery · score 0 → 2
The baseline read the whole file and still missed three imports. One get_file_dependencies() call scored a perfect 2/2.

Impossible without Token Savior · 21 tasks
Tasks the plain agent scored 0 on but Token Savior solved: community detection, hotspot audits, semantic duplicates.
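The first two results above come down to two structural operations: reading a file's imports exactly, then searching the import graph for cycles. The sketch below shows both under stated assumptions. It is not Token Savior's implementation: file_dependencies and find_cycles are hypothetical names (the shipped tool is get_file_dependencies(), whose signature is not shown here), it handles Python sources only, and it skips relative imports.

```python
import ast

# Hypothetical sketch, not Token Savior's implementation.


def file_dependencies(source: str) -> set[str]:
    """Return the absolute module names imported by a Python source string."""
    deps: set[str] = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            deps.update(alias.name for alias in node.names)
        elif isinstance(node, ast.ImportFrom) and node.module and node.level == 0:
            deps.add(node.module)  # relative imports (level > 0) are skipped
    return deps


def find_cycles(graph: dict[str, set[str]]) -> list[list[str]]:
    """Report one cycle per back edge found by a depth-first search."""
    WHITE, GRAY, BLACK = 0, 1, 2  # unvisited, on the DFS stack, finished
    color = {node: WHITE for node in graph}
    stack: list[str] = []
    cycles: list[list[str]] = []

    def visit(node: str) -> None:
        color[node] = GRAY
        stack.append(node)
        for dep in sorted(graph[node]):
            if dep not in color:
                continue  # external module such as os or json
            if color[dep] == GRAY:
                # Back edge: the cycle is the stack from dep down to node.
                cycles.append(stack[stack.index(dep):] + [dep])
            elif color[dep] == WHITE:
                visit(dep)
        stack.pop()
        color[node] = BLACK

    for node in sorted(graph):
        if color[node] == WHITE:
            visit(node)
    return cycles


# Toy project: module name -> source text.
sources = {
    "app": "import models\nimport os\n",
    "models": "from db import session\n",
    "db": "import models\n",
    "utils": "import json\n",
}
graph = {name: file_dependencies(src) for name, src in sources.items()}
print(find_cycles(graph))
```

Sorting nodes and edges keeps the output deterministic. Reporting one cycle per back edge is enough to answer "any circular dependencies?"; listing every elementary cycle would need a dedicated algorithm such as Johnson's.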
the matchup · task by task

Every task.
Side by side.

Every task from the benchmark, with a base model on the left and a Token-Savior-wired model on the right. Token Savior wins or ties on every single one.

Table columns: task #, category, task, agent A, agent B, Δ tokens, Δ wall time, Δ chars, outcome.
the graph · what Token Savior sees

A codebase isn't a folder.
It's a graph.

This is the tsbench fixture parsed into its call graph: 206 symbols (functions, classes, methods, constants) connected by 412 call edges, clustered by module. The plain agent walks this graph one grep at a time; Token Savior queries it directly.
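Querying the graph directly, rather than grepping, turns "who calls X?" into a lookup and blast radius into a breadth-first search over reversed call edges. A minimal sketch, assuming a toy call graph held in a dict; CALLS, invert and blast_radius are hypothetical and say nothing about how Token Savior actually stores or queries its graph.

```python
from collections import deque

# Hypothetical toy call graph: caller -> set of callees.
CALLS: dict[str, set[str]] = {
    "main": {"load_config", "run"},
    "run": {"parse", "render"},
    "parse": {"tokenize"},
    "render": {"tokenize"},
    "load_config": set(),
    "tokenize": set(),
}


def invert(calls: dict[str, set[str]]) -> dict[str, set[str]]:
    """Build callee -> callers, so "who calls X?" becomes a dict lookup."""
    callers: dict[str, set[str]] = {name: set() for name in calls}
    for caller, callees in calls.items():
        for callee in callees:
            callers.setdefault(callee, set()).add(caller)
    return callers


def blast_radius(calls: dict[str, set[str]], symbol: str) -> set[str]:
    """Every symbol that transitively calls `symbol` (BFS over reversed edges)."""
    callers = invert(calls)
    seen: set[str] = set()
    queue = deque([symbol])
    while queue:
        current = queue.popleft()
        for caller in callers.get(current, set()):
            if caller not in seen:
                seen.add(caller)
                queue.append(caller)
    return seen


print(sorted(invert(CALLS)["tokenize"]))        # direct callers
print(sorted(blast_radius(CALLS, "tokenize")))  # everything a change could affect
```

Inverting the edges once and reusing the result is what makes repeated caller queries cheap; an agent limited to grep has to rediscover those edges for every question.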

the replay · watch a task resolve

Don't take our word for it.
Watch both agents work.

Every tile is a real task from the benchmark, colored by outcome. Selecting one replays both agents' actual traces side by side: real tools, real timings, real answers.

Outcomes: Token Savior wins · impossible without TS · tie · losses
TASK-026 · Detect import cycles in the project. Any circular dependencies? If yes, list them.
Each replay compares wall time, active tokens, context chars and score for agent a · plain (Read · Grep · Bash) and agent b · token savior (+ structural MCP).
works with your stack

One MCP server.
Every coding agent.

Token Savior speaks the standard Model Context Protocol. Drop it into any client, keep your tools, ship a sharper agent.

Claude Code Cursor Codex CLI Antigravity Cline Continue Windsurf Aider Gemini CLI Copilot CLI Zed any MCP client
via the open Model Context Protocol · drop-in hooks for Codex · Cursor · Gemini
beyond the bench · real coding sessions

~20K tokens / week
saved on real sessions.

tsbench measures coding accuracy on a synthetic fixture. The v4.1 / v4.2 / v4.3 layer is measured differently: by how many tokens leak out of actual tool outputs across a week of live work. Bash chatter, test-runner output, kubectl dumps, git logs: all of it is sandboxed or compacted before it reaches the model context.

1 121 Bash outputs scanned (7 days)
19.3% match rate (was 11.9%)
68.9% mean compaction on a hit
~20 410 tokens saved over 7 days
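One way to read the two rates above, as a minimal sketch under assumed definitions: match rate is matched outputs divided by outputs scanned, and compaction is 1 − compacted size / original size, averaged over matches only. Characters stand in for tokens here, and toy_compact and measure are hypothetical placeholders, not the shipped compactors or the benchmark script.

```python
from typing import Optional


def toy_compact(command: str, output: str) -> Optional[str]:
    """Hypothetical compactor: handles `git log` only, returns None otherwise."""
    if command.startswith("git log"):
        # Keep only the "commit <sha>" header lines.
        return "\n".join(
            line for line in output.splitlines() if line.startswith("commit ")
        )
    return None


def measure(samples, compact):
    """Return (match_rate, mean_compaction_on_hit) for (command, output) pairs.

    Assumed definitions: match rate = matched outputs / outputs scanned;
    compaction = 1 - len(compacted) / len(original), averaged over matches.
    """
    ratios = []
    for command, output in samples:
        compacted = compact(command, output)
        if compacted is not None and output:
            ratios.append(1 - len(compacted) / len(output))
    match_rate = len(ratios) / len(samples) if samples else 0.0
    mean_compaction = sum(ratios) / len(ratios) if ratios else 0.0
    return match_rate, mean_compaction


samples = [
    ("git log", "commit abc123\nAuthor: dev\n\n    fix parser\n"),
    ("ls", "README.md\nsrc\n"),
]
rate, mean = measure(samples, toy_compact)
print(f"match rate {rate:.1%}, mean compaction on hit {mean:.1%}")
```

Under these definitions the two numbers are independent: a higher match rate widens coverage, while mean compaction says how much each hit shrinks.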

What shipped since v4.0

three additive releases, zero prompt change
  • v4.1: 14 Bash output compactors, a PreToolUse hook that rewrites bare commands into denser variants, and ts_discover, which scans transcripts for missed TS chains.
  • v4.2: 8 more compactors (jest, vitest, eslint, biome, kubectl, aws, npm/pip list, curl), a hybrid dual mode that combines sandboxing and compaction, and a ts init CLI that wires the hooks into Claude / Cursor / Gemini / Codex.
  • v4.3: 12 more compactors (grep, find, cat, git extras, gh extras, python3 -m pytest) and a compound-command splitter that recognizes cd X && cmd.
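The v4.1 rewriter and the v4.3 splitter can be pictured together: peel off a leading cd X &&, rewrite the remaining bare command into a denser form, then reattach the prefix. A minimal sketch under assumptions: REWRITES, split_compound and rewrite are hypothetical, and the real PreToolUse hook's rules and matching logic are not documented here. The flags used (--short, --branch, --oneline, -n) are standard git options.

```python
import re
import shlex
from typing import Optional

# Hypothetical rewrite table: bare command -> denser variant.
# The flags are real git options; the choice of rules is illustrative only.
REWRITES = {
    "git status": "git status --short --branch",
    "git log": "git log --oneline -n 20",
}

# Matches "cd DIR && CMD"; DIR must not contain whitespace (quoted paths unsupported).
CD_PREFIX = re.compile(r"^\s*cd\s+(\S+)\s*&&\s*(.+?)\s*$")


def split_compound(command: str) -> tuple[Optional[str], str]:
    """Split 'cd DIR && CMD' into (DIR, CMD); anything else gives (None, command)."""
    match = CD_PREFIX.match(command)
    if match:
        return match.group(1), match.group(2)
    return None, command.strip()


def rewrite(command: str) -> str:
    """Rewrite the inner command only when it exactly matches a bare entry."""
    directory, inner = split_compound(command)
    try:
        # Normalize whitespace so "git   status" matches "git status".
        key = " ".join(shlex.split(inner))
    except ValueError:
        return command  # unbalanced quotes: leave the command untouched
    new_inner = REWRITES.get(key, inner)
    return f"cd {directory} && {new_inner}" if directory else new_inner


print(rewrite("cd src && git status"))
```

Only exact bare commands are rewritten, so a command that already carries its own flags passes through unchanged.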

Why it matters

free wins on top of the bench number
  • Projected ~85K tokens / month per active coder at current usage, at zero accuracy cost.
  • All gains arrive without any model-side change: no system prompt edit, no agent rewrite, no profile flip.
  • The match rate rose from 11.9% on v4.2 to 19.3% on v4.3 after one bench-driven coverage push. To reproduce, run scripts/bench_compactors_real.py over your own transcripts.
  • Compactors are pure functions, opt-in and fail-safe: unknown command shapes fall through to the existing sandbox path untouched.
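The fail-safe contract described in the last bullet can be sketched as a registry of pure functions. This is a minimal illustration, not Token Savior's code: the registry, the compact_pytest filtering rules and the prefix matching are hypothetical, and treating TS_BASH_COMPACT=1 as the on switch is an assumption based on the export TS_BASH_COMPACT=1 setup step.

```python
import os
import re
from typing import Callable, Optional

# Final pytest summary line, e.g. "==== 1 failed, 2 passed in 0.12s ====".
SUMMARY = re.compile(r"^=+ .*\b(passed|failed|error|errors)\b.* =+$")


def compact_pytest(output: str) -> str:
    """Pure function: keep failure lines and the summary, drop progress noise."""
    kept = [
        line
        for line in output.splitlines()
        if line.startswith(("FAILED", "ERROR")) or SUMMARY.match(line)
    ]
    return "\n".join(kept)


# Hypothetical registry: command prefix -> compactor.
COMPACTORS: dict[str, Callable[[str], str]] = {
    "pytest": compact_pytest,
    "python3 -m pytest": compact_pytest,
}


def compact(command: str, output: str) -> Optional[str]:
    """Return compacted output, or None so the caller uses the sandbox path."""
    if os.environ.get("TS_BASH_COMPACT") != "1":
        return None  # opt-in: disabled unless the flag is set (assumed semantics)
    # Longest prefix first, so more specific entries win.
    for prefix in sorted(COMPACTORS, key=len, reverse=True):
        if command == prefix or command.startswith(prefix + " "):
            try:
                result = COMPACTORS[prefix](output)
            except Exception:
                return None  # fail-safe: a broken compactor never breaks the call
            # Accept only a non-empty result that is actually shorter.
            return result if result and len(result) < len(output) else None
    return None  # unknown command shape: fall through untouched


SAMPLE = "\n".join([
    "============================= test session starts ==============================",
    "collected 3 items",
    "tests/test_x.py .F.                                                      [100%]",
    "=========================== short test summary info ============================",
    "FAILED tests/test_x.py::test_b - AssertionError",
    "========================= 1 failed, 2 passed in 0.12s ==========================",
])

os.environ["TS_BASH_COMPACT"] = "1"  # opt in for this demo only
print(compact("python3 -m pytest -q", SAMPLE))
```

Returning None rather than the raw output keeps the decision with the caller, which is what lets unknown command shapes reach the existing sandbox path unchanged.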
git (11): status, diff, log, push, pull, commit, add, fetch, checkout, branch, worktree list, stash list
gh (6): run list, run view, pr view, pr diff, repo view, issue view
test runners (4): pytest, cargo test, jest, vitest
build & lint (4): cargo build, tsc, eslint, biome
shell utils (3): grep, find, cat
cloud · aws + kubectl (9): kubectl get, kubectl logs, aws sts, ec2, lambda, logs, iam, dynamodb, s3
misc (4): docker ps, docker logs, npm/pip list, curl
# 1. upgrade and wire the hooks in one shot
pip install --upgrade token-savior-recall
ts init --agent claude --yes
# 2. flip the two opt-in flags
export TS_BASH_COMPACT=1
export TS_BASH_REWRITE=1
reproduce the numbers on your own transcripts: scripts/bench_compactors_real.py

Stop guessing.
Give your agent the map.

Token Savior is an open MCP server. Point it at your repo and wire it into Claude Code, Cursor, Codex, Antigravity, Cline, Continue, Windsurf, Aider, Gemini CLI or any other MCP-compatible client. Ship a better agent this afternoon.