AI Engineer World's Fair 2026
We Cut 94% of Our AI Coding Tokens With a Local Code Index
The architecture behind Code Context Engine, and why input tokens are the real cost driver.
The Problem
Your AI agent reads 45,000 tokens to answer a question that needs 800
The Insight
Most tools optimize the wrong side
Output compression: saves 75% of output tokens, about 8% off the total bill
Input retrieval (CCE): saves 94% of input tokens, about 61% off the total bill
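Both bill figures come from one multiplication: the fraction of a token class you remove times that class's share of the bill. Working backwards from the numbers above, output tokens would be roughly 11% of the bill (0.08 / 0.75) and input tokens roughly 65% (0.61 / 0.94). Those shares are inferred here, not stated in the talk, and the function name is hypothetical:

```python
def bill_savings(token_savings: float, share_of_bill: float) -> float:
    """Fraction of the total bill saved when one token class shrinks.

    token_savings: fraction of that token class removed (0.94 means 94%).
    share_of_bill: fraction of the bill that class accounted for (an assumption).
    """
    if not (0.0 <= token_savings <= 1.0 and 0.0 <= share_of_bill <= 1.0):
        raise ValueError("both arguments must be fractions in [0, 1]")
    return token_savings * share_of_bill


# Shares inferred from the slide's own percentages, not measured.
OUTPUT_SHARE = 0.107  # ~0.08 / 0.75
INPUT_SHARE = 0.649   # ~0.61 / 0.94

print(f"output compression: {bill_savings(0.75, OUTPUT_SHARE):.0%} off the bill")  # 8%
print(f"input retrieval:    {bill_savings(0.94, INPUT_SHARE):.0%} off the bill")   # 61%
```

The multiplier is the point: a 75% cut on a small slice of the bill moves the total far less than a 94% cut on the large slice.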
Architecture
Five-stage compression pipeline:
1. Tree-sitter chunking: AST-aware splits, 10 languages
2. Hybrid retrieval: vector + BM25 + RRF (94% savings)
3. Chunk compression: signatures + docs (89% savings)
4. Code graph: CALLS · IMPORTS edges pull in related code
5. Output compression: grammar rules (25-75% savings)
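Tree-sitter itself is not shown here. As a stand-in, the sketch below uses Python's standard-library ast module, which only parses Python, to illustrate the ideas behind the Tree-sitter chunking and chunk-compression stages: split a file at definition boundaries instead of fixed line windows, then shrink each chunk to its header and docstring. The Chunk type and both function names are hypothetical, not CCE's API.

```python
import ast
import copy
from dataclasses import dataclass


@dataclass
class Chunk:
    name: str
    kind: str         # "function" or "class"
    start_line: int
    end_line: int
    source: str       # full text of the definition
    compressed: str   # header + docstring only


def signature_only(node: ast.AST) -> str:
    """Keep a def/class header and its docstring; replace the body with '...'."""
    stub = copy.deepcopy(node)
    doc = ast.get_docstring(node)
    body: list[ast.stmt] = []
    if doc:
        body.append(ast.Expr(value=ast.Constant(value=doc)))
    body.append(ast.Expr(value=ast.Constant(value=...)))
    stub.body = body
    return ast.unparse(stub)


def chunk_python(source: str) -> list[Chunk]:
    """Split a module at top-level function/class boundaries."""
    chunks = []
    for node in ast.parse(source).body:
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef)):
            chunks.append(Chunk(
                name=node.name,
                kind="class" if isinstance(node, ast.ClassDef) else "function",
                start_line=node.lineno,
                end_line=node.end_lineno,
                source=ast.get_source_segment(source, node),
                compressed=signature_only(node),
            ))
    return chunks


SAMPLE = '''
import os

def load(path: str) -> str:
    """Read a text file."""
    with open(path) as f:
        return f.read()

class Cache:
    """In-memory cache."""
    def get(self, key):
        return None
'''

for chunk in chunk_python(SAMPLE):
    print(f"{chunk.kind} {chunk.name} (lines {chunk.start_line}-{chunk.end_line})")
    print(chunk.compressed)
```

Splitting on syntax nodes keeps each function whole, so a retrieved chunk never starts or ends mid-body; the compressed form is what makes the second large cut possible.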
Everything runs locally. No cloud, no API calls. Three SQLite files per project.
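The code-graph stage can be approximated for a single Python file with the same standard-library ast module (again a stand-in for Tree-sitter): record IMPORTS edges at module level and CALLS edges inside each top-level function, then expand a retrieved function to the module-local functions it calls. How CCE stores these edges and resolves calls across files is not shown; build_graph and related are hypothetical names.

```python
import ast
from collections import defaultdict

Edge = tuple[str, str]  # (edge kind, target name)


def build_graph(source: str) -> dict[str, set[Edge]]:
    """Collect IMPORTS edges (under "<module>") and CALLS edges per top-level function."""
    edges: dict[str, set[Edge]] = defaultdict(set)
    for node in ast.parse(source).body:
        if isinstance(node, ast.Import):
            for alias in node.names:
                edges["<module>"].add(("IMPORTS", alias.name))
        elif isinstance(node, ast.ImportFrom):
            prefix = f"{node.module}." if node.module else ""
            for alias in node.names:
                edges["<module>"].add(("IMPORTS", prefix + alias.name))
        elif isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            calls = edges[node.name]  # register the function even if it calls nothing
            for sub in ast.walk(node):
                # Only plain-name calls like parse(x); method calls such as obj.m() are skipped.
                if isinstance(sub, ast.Call) and isinstance(sub.func, ast.Name):
                    calls.add(("CALLS", sub.func.id))
    return dict(edges)


def related(graph: dict[str, set[Edge]], start: str, depth: int = 1) -> set[str]:
    """Module-local functions reachable from `start` via CALLS edges within `depth` hops."""
    seen: set[str] = set()
    frontier = {start}
    for _ in range(depth):
        nxt = {
            target
            for name in frontier
            for kind, target in graph.get(name, set())
            if kind == "CALLS" and target in graph and target != start and target not in seen
        }
        seen |= nxt
        frontier = nxt
    return seen


SRC = '''
import json
from pathlib import Path

def parse(text):
    return json.loads(text)

def load(path):
    return parse(Path(path).read_text())

def main():
    print(load("cfg.json"))
'''

graph = build_graph(SRC)
print(sorted(related(graph, "main", depth=2)))  # ['load', 'parse']
```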
Deep Dive
Hybrid retrieval: why not just vector search?
Vector search (cosine similarity): semantic similarity via bge-small-en-v1.5 embeddings (384 dimensions). Finds conceptually related code even when the naming differs.
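The ranking half of vector search is cosine similarity plus a sort. The sketch below uses three-dimensional toy vectors in place of real 384-dimensional bge-small-en-v1.5 embeddings and a plain dict in place of a vector index; both are stand-ins for illustration, and the function names are hypothetical.

```python
import math


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity: dot(a, b) / (|a| * |b|); 0.0 if either vector is all zeros."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def vector_search(query: list[float], index: dict[str, list[float]], k: int = 3) -> list[str]:
    """Return the ids of the k chunks most similar to the query vector."""
    ranked = sorted(index, key=lambda cid: cosine(query, index[cid]), reverse=True)
    return ranked[:k]


# Toy 3-d "embeddings"; a real index would hold 384-d model outputs.
index = {
    "auth.refresh_token": [0.9, 0.1, 0.0],
    "db.connect": [0.0, 0.2, 0.9],
    "auth.login": [0.7, 0.3, 0.1],
}
print(vector_search([1.0, 0.0, 0.0], index, k=2))  # ['auth.refresh_token', 'auth.login']
```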
FTS5 (BM25, term-frequency ranking): exact keyword matching via SQLite FTS5. Catches function names, class names, and identifiers that vector search blurs together.
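A minimal sketch of the keyword side using Python's built-in sqlite3 module, assuming the local SQLite build includes FTS5 (most do, but it is a compile-time option). FTS5 exposes a hidden rank column that defaults to BM25, so ORDER BY rank returns the best matches first. The table layout and function names are illustrative, not CCE's schema.

```python
import sqlite3


def build_fts(rows: list[tuple[str, str]]) -> sqlite3.Connection:
    """Index (chunk_id, text) rows in an in-memory FTS5 table."""
    conn = sqlite3.connect(":memory:")
    # chunk_id is stored but not tokenized; body goes through the default
    # unicode61 tokenizer, which splits on punctuation (get_user -> get, user).
    conn.execute("CREATE VIRTUAL TABLE chunks USING fts5(chunk_id UNINDEXED, body)")
    conn.executemany("INSERT INTO chunks (chunk_id, body) VALUES (?, ?)", rows)
    return conn


def keyword_search(conn: sqlite3.Connection, query: str, k: int = 5) -> list[str]:
    """Return up to k chunk ids ordered by BM25 (FTS5's default `rank`)."""
    terms = query.split()
    if not terms:
        return []
    # Double-quote each term so identifiers are matched as literal phrases
    # rather than parsed as FTS5 query syntax; any term may match (OR).
    match = " OR ".join('"' + t.replace('"', '""') + '"' for t in terms)
    rows = conn.execute(
        "SELECT chunk_id FROM chunks WHERE chunks MATCH ? ORDER BY rank LIMIT ?",
        (match, k),
    ).fetchall()
    return [chunk_id for (chunk_id,) in rows]


conn = build_fts([
    ("auth.py:refresh_token", "def refresh_token(user): rotate refresh token on use"),
    ("db.py:connect", "def connect(dsn): open a PostgreSQL connection"),
    ("auth.py:login", "def login(user, password): issue a JWT"),
])
print(keyword_search(conn, "refresh_token"))  # ['auth.py:refresh_token']
```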
RRF, score = Σ 1/(k + rank) summed over both lists: Reciprocal Rank Fusion (k=60) merges the vector and BM25 rankings. A confidence scorer blends similarity (50%), keywords (30%), and recency (20%).
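Rank fusion and the confidence blend fit in a few lines. RRF below follows its standard definition with ranks starting at 1; the confidence function only applies the 50/30/20 weights from the slide and assumes each input is already normalised to [0, 1], since the slide does not say how CCE normalises the inputs or combines them with the RRF score. Function names are hypothetical.

```python
def rrf(ranked_lists: list[list[str]], k: int = 60) -> list[tuple[str, float]]:
    """Reciprocal Rank Fusion: score(d) = sum over lists of 1 / (k + rank(d)), rank from 1."""
    scores: dict[str, float] = {}
    for ranking in ranked_lists:
        for rank, doc in enumerate(ranking, start=1):
            scores[doc] = scores.get(doc, 0.0) + 1.0 / (k + rank)
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


def confidence(similarity: float, keyword: float, recency: float) -> float:
    """Weighted blend from the slide: 50% similarity, 30% keywords, 20% recency.

    Inputs are assumed to be normalised to [0, 1] (not specified in the source).
    """
    return 0.5 * similarity + 0.3 * keyword + 0.2 * recency


vector_hits = ["auth.login", "auth.refresh_token", "db.connect"]
bm25_hits = ["auth.refresh_token", "jwt.sign", "auth.login"]
print([doc for doc, _ in rrf([vector_hits, bm25_hits])])
# ['auth.refresh_token', 'auth.login', 'jwt.sign', 'db.connect']
```

Because RRF works on ranks rather than raw scores, it needs no calibration between cosine similarities and BM25 scores, which live on different scales; a chunk ranked well by both lists rises to the top, and k=60 damps the advantage of the very first positions.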
Vector alone: 0.78 recall. BM25 alone: 0.72 recall. Hybrid: 0.90 recall.
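The slide does not state the cutoff k or how relevant chunks were labelled. Under the common definition, recall@k is the fraction of a question's relevant chunks that appear in the top k results, averaged over questions. A minimal sketch with hypothetical function names:

```python
def recall_at_k(retrieved: list[str], relevant: set[str], k: int) -> float:
    """Share of the relevant chunks that appear among the top-k retrieved ids."""
    if not relevant:
        raise ValueError("need at least one relevant chunk")
    return len(set(retrieved[:k]) & relevant) / len(relevant)


def mean_recall(runs: list[tuple[list[str], set[str]]], k: int) -> float:
    """Average recall@k over (retrieved, relevant) pairs, one pair per question."""
    return sum(recall_at_k(got, want, k) for got, want in runs) / len(runs)


print(recall_at_k(["a", "b", "c"], {"a", "c", "z"}, k=3))  # 2 of 3 relevant found
```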
Benchmark
FastAPI: 53 files, 20 real questions
Full-file baseline: 83,681 tokens/query
After retrieval: 4,927 tokens/query (94% retrieval savings)
After compression: 523 tokens/query
No cherry-picking. No synthetic queries.
Fully reproducible.
$ python benchmarks/run_benchmark.py \
--repo fastapi/fastapi --source-dir fastapi
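The percentages follow directly from the per-query averages. This short check recomputes them: the 94% headline is the retrieval step alone, the second ratio matches the 89% quoted for chunk compression in the pipeline, and the two steps together remove over 99% of the full-file baseline.

```python
def savings(before: int, after: int) -> float:
    """Fractional reduction going from `before` to `after` tokens."""
    return 1 - after / before


# Average tokens per query, taken from the benchmark numbers above.
BASELINE, RETRIEVED, COMPRESSED = 83_681, 4_927, 523

print(f"retrieval:   {savings(BASELINE, RETRIEVED):.1%}")    # 94.1%
print(f"compression: {savings(RETRIEVED, COMPRESSED):.1%}")  # 89.4%
print(f"end to end:  {savings(BASELINE, COMPRESSED):.1%}")   # 99.4%
```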
Multi-Agent
One index. Every agent.
Claude Code: .mcp.json, CLAUDE.md, 5 hooks
VS Code / Copilot: .vscode/mcp.json, copilot-instructions.md
Cursor: .cursor/mcp.json, .cursorrules
Codex CLI: ~/.codex/config.toml, AGENTS.md
Cross-agent memory: decisions made in Claude Code surface in Codex. One cce init configures everything.
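Wiring four agents to one index mostly means writing the same MCP server entry in four formats. A sketch, assuming the config shapes these tools document (an mcpServers object for Claude Code and Cursor, a servers object for VS Code, and an [mcp_servers.<name>] table for Codex CLI). The server name cce and the command cce serve are hypothetical placeholders, not what cce init actually writes, and the instruction files and hooks are left out.

```python
import json


def render_agent_configs(name: str, command: str, args: list[str]) -> dict[str, str]:
    """Render one MCP server entry per agent config path (paths from the slide)."""
    claude_style = {"mcpServers": {name: {"command": command, "args": args}}}
    vscode_style = {"servers": {name: {"type": "stdio", "command": command, "args": args}}}
    # json.dumps output doubles as a valid TOML string/array for plain ASCII values.
    codex_toml = (
        f"[mcp_servers.{name}]\n"
        f"command = {json.dumps(command)}\n"
        f"args = {json.dumps(args)}\n"
    )
    return {
        ".mcp.json": json.dumps(claude_style, indent=2),
        ".cursor/mcp.json": json.dumps(claude_style, indent=2),
        ".vscode/mcp.json": json.dumps(vscode_style, indent=2),
        "~/.codex/config.toml": codex_toml,
    }


# Hypothetical server name and command, for illustration only.
for path, text in render_agent_configs("cce", "cce", ["serve"]).items():
    print(f"--- {path}\n{text}")
```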
Memory
Your agent remembers last week
**Previous session** (2026-06-14):
Refactored auth: JWT with RS256,
refresh tokens rotate on use.
**Recent decisions:**
- Use JWT with RS256 (mesh issues keys)
- Risk limit at 2% per trade (Kelly)
- PostgreSQL for primary store (ACID)
record_decision: saves architectural choices with their reasoning; they surface automatically at session start.
session_recall: semantic search over past decisions, using the same vector + FTS hybrid as code search.
session_timeline: walks through a past session turn by turn, with drill-down into specific tool calls.
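A minimal sketch of the record-and-surface loop, assuming a plain SQLite table and keyword (LIKE) matching in place of the vector + FTS hybrid described above. DecisionLog and its methods are hypothetical Python stand-ins for the record_decision and session_recall tools, not their actual interface.

```python
import sqlite3
from datetime import datetime, timezone


class DecisionLog:
    """Hypothetical local decision store (keyword recall only, no embeddings)."""

    def __init__(self, path: str = ":memory:"):
        self.conn = sqlite3.connect(path)
        self.conn.execute(
            "CREATE TABLE IF NOT EXISTS decisions ("
            " id INTEGER PRIMARY KEY AUTOINCREMENT,"
            " decision TEXT NOT NULL, reason TEXT NOT NULL, created_at TEXT NOT NULL)"
        )

    def record_decision(self, decision: str, reason: str) -> None:
        self.conn.execute(
            "INSERT INTO decisions (decision, reason, created_at) VALUES (?, ?, ?)",
            (decision, reason, datetime.now(timezone.utc).isoformat()),
        )
        self.conn.commit()

    def recall(self, keyword: str) -> list[str]:
        """Case-insensitive keyword match over decisions and reasons, oldest first."""
        pattern = f"%{keyword}%"
        rows = self.conn.execute(
            "SELECT decision FROM decisions WHERE decision LIKE ? OR reason LIKE ? ORDER BY id",
            (pattern, pattern),
        ).fetchall()
        return [decision for (decision,) in rows]

    def session_start_block(self, limit: int = 3) -> str:
        """Render the most recent decisions (oldest of them first) for a new session."""
        rows = self.conn.execute(
            "SELECT decision, reason FROM decisions ORDER BY id DESC LIMIT ?", (limit,)
        ).fetchall()
        lines = ["**Recent decisions:**"]
        lines += [f"- {decision} ({reason})" for decision, reason in reversed(rows)]
        return "\n".join(lines)


log = DecisionLog()
log.record_decision("Use JWT with RS256", "mesh issues keys")
log.record_decision("PostgreSQL for primary store", "ACID")
print(log.session_start_block())
```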
Real Numbers
Per-bucket savings tracking
my-project · 247 queries · last query 5m ago
88% tokens saved
Input savings      12.4M tokens    $186.00
Output savings     48.2k tokens      $3.62
──────────────────────────────────────────
Total saved        12.4M tokens    $189.62
Breakdown:
  retrieval           84%    10.4M     $156.00
  chunk compression    3%    421.5k      $6.32
  output compress*    <1%    48.2k       $3.62
7 buckets tracked: retrieval, chunk compression, output compression, memory recall, grammar, turn summarization, progressive disclosure
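The dollar figures are consistent with flat prices of $15 per million input tokens and $75 per million output tokens (186.00 / 12.4M, and roughly 3.62 / 48.2k). Those prices are inferred from the dashboard, which names no model or price list. Under that assumption, the conversion and the bucket shares are simple ratios; the names below are hypothetical.

```python
# USD per million tokens, inferred from the dashboard's own totals (an assumption).
PRICE_PER_MILLION = {"input": 15.00, "output": 75.00}


def dollars_saved(tokens: int, kind: str) -> float:
    """Convert saved tokens of one kind ("input" or "output") to dollars."""
    return tokens * PRICE_PER_MILLION[kind] / 1_000_000


def bucket_share(bucket_tokens: int, total_tokens: int) -> float:
    """Fraction of all saved tokens attributed to one bucket."""
    return bucket_tokens / total_tokens


total = 12_400_000 + 48_200  # input + output savings shown on the dashboard
print(f"retrieval: ${dollars_saved(10_400_000, 'input'):.2f}, "
      f"{bucket_share(10_400_000, total):.0%} of tokens saved")  # $156.00, 84%
```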
Try It
One command. Zero config.
$ uvx --from "code-context-engine[local]" cce init
Python 3.11+ · macOS · Linux · Windows
MIT licensed · 170+ stars · 2,300+ monthly installs
Stop paying for tokens your agent doesn't need
github.com/elara-labs/code-context-engine
Free · Open Source · MIT License