AI Engineer World's Fair 2026

We Cut 94% of Our AI Coding Tokens
With a Local Code Index

The architecture behind Code Context Engine, and why input tokens are the real cost driver.

94% Token Reduction
0.4 ms Search Latency
0.90 Recall@10

The Problem

Your AI agent reads 45,000 tokens
to answer a question that needs 800

File          Tokens   Read as
payments.py   12,000   full file
shipping.py    9,000   full file
models.py     18,000   full file
tests.py       6,000   full file
With CCE         800   2 chunks

The Insight

Most tools optimize the wrong side

Input tokens (90%)
Output tokens (10%)

Output compression

Saves 75% of output tokens

= ~8% off total bill

Input retrieval (CCE)

Saves 94% of input tokens

= ~61% off total bill

Architecture

Five-stage compression pipeline

Tree-sitter Chunking: AST-aware splits (10 languages)
Hybrid Retrieval: Vector + BM25 + RRF (94%)
Chunk Compression: Signatures + docs (89%)
Code Graph: CALLS · IMPORTS (related)
Output Compression: Grammar rules (25-75%)
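The first and third stages can be illustrated with a minimal sketch. It uses Python's built-in ast module as a stand-in for Tree-sitter, so it only handles Python source, and the helper names chunk_source and compress_chunk are hypothetical rather than CCE's API. It splits a module into one chunk per top-level definition, then reduces each chunk to its signature line plus docstring.

```python
import ast

SOURCE = '''
import os

def charge(amount, currency="usd"):
    """Charge a card and return a receipt id."""
    total = amount * 100
    return f"rcpt-{total}-{currency}"

class Refund:
    """Refund a previous charge."""

    def run(self, receipt_id):
        return receipt_id.startswith("rcpt-")
'''


def chunk_source(source):
    """Split a module into one chunk per top-level function or class."""
    tree = ast.parse(source)
    chunks = []
    for node in tree.body:
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef)):
            chunks.append({
                "name": node.name,
                # Exact source text of the definition, taken from AST positions.
                "text": ast.get_source_segment(source, node),
                "doc": ast.get_docstring(node),
            })
    return chunks


def compress_chunk(chunk):
    """Keep only the first line (the signature) and the docstring.

    Simplification: a signature spanning several lines is cut after line one.
    """
    signature = chunk["text"].splitlines()[0]
    doc = chunk["doc"]
    if not doc:
        return signature
    return f'{signature}\n    """{doc}"""'


chunks = chunk_source(SOURCE)
compressed = [compress_chunk(c) for c in chunks]

for original, short in zip(chunks, compressed):
    print(f'{original["name"]}: {len(original["text"])} -> {len(short)} chars')
```

Splitting on AST node boundaries means a chunk never starts or ends in the middle of a function, which is the point of AST-aware chunking over fixed-size windows.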

Everything runs locally. No cloud, no API calls. Three SQLite files per project.

Deep Dive

Hybrid retrieval: why not just vector search?

🎯

Vector Search

Semantic similarity via bge-small-en-v1.5 (384d). Finds conceptually related code even with different naming.

cosine similarity
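As a rough illustration of the ranking step, the sketch below scores chunks by cosine similarity. The 3-dimensional vectors are toy stand-ins for the 384-dimensional embeddings, and the names top_k and toy_index are hypothetical.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # a zero vector has no direction to compare
    return dot / (norm_a * norm_b)


def top_k(query_vec, chunk_vecs, k=10):
    """Return the k (chunk_id, score) pairs most similar to the query."""
    scored = [
        (chunk_id, cosine_similarity(query_vec, vec))
        for chunk_id, vec in chunk_vecs.items()
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Toy embeddings; real ones would come from the embedding model.
toy_index = {
    "charge_card": [0.9, 0.1, 0.0],
    "send_email": [0.0, 0.2, 0.9],
    "refund_payment": [0.8, 0.3, 0.1],
}
results = top_k([1.0, 0.2, 0.0], toy_index, k=2)
print(results)
```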

🔤

FTS5 (BM25)

Exact keyword matching via SQLite FTS5. Catches function names, class names, identifiers that vector search fuzzes over.

term frequency
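A minimal sketch of BM25 keyword search with SQLite FTS5, using Python's standard sqlite3 module. The table name, columns, and sample rows are assumptions made for illustration, not CCE's schema, and the code requires a SQLite build with FTS5 enabled.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# FTS5 virtual table: every column is full-text indexed.
conn.execute("CREATE VIRTUAL TABLE chunks USING fts5(name, body)")
conn.executemany(
    "INSERT INTO chunks (name, body) VALUES (?, ?)",
    [
        ("charge_card", "def charge_card(amount): return gateway.charge(amount)"),
        ("send_email", "def send_email(to, subject): smtp.send(to, subject)"),
        ("refund_payment", "def refund_payment(receipt): return gateway.refund(receipt)"),
    ],
)


def keyword_search(query, limit=10):
    """Return chunk names matching the query, best BM25 match first.

    bm25() returns lower values for better matches, so ascending order
    puts the strongest hit at the top.
    """
    rows = conn.execute(
        "SELECT name FROM chunks WHERE chunks MATCH ? ORDER BY bm25(chunks) LIMIT ?",
        (query, limit),
    )
    return [name for (name,) in rows]


print(keyword_search("refund"))
print(keyword_search("gateway"))
```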

RRF Fusion

Reciprocal Rank Fusion (k=60) merges both ranked lists. Confidence scorer blends similarity (50%), keywords (30%), recency (20%).

1/(k + rank)
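The fusion formula can be sketched as follows: each result earns 1/(k + rank) from every list it appears in, and the sums decide the final order. The function names are hypothetical. The confidence function applies the 50/30/20 weights from the slide and assumes each input is already normalized to the 0-1 range; how CCE combines that score with the fused ranking is not stated here.

```python
def rrf_fuse(ranked_lists, k=60):
    """Merge ranked lists with Reciprocal Rank Fusion.

    Ranks are 1-based, so the top item of a list contributes 1/(k + 1).
    """
    scores = {}
    for ranked in ranked_lists:
        for rank, item in enumerate(ranked, start=1):
            scores[item] = scores.get(item, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=lambda item: scores[item], reverse=True)


def confidence(similarity, keyword, recency):
    """Weighted blend: similarity 50%, keywords 30%, recency 20%.

    Assumes all three inputs are in [0, 1].
    """
    return 0.5 * similarity + 0.3 * keyword + 0.2 * recency


vector_hits = ["charge_card", "refund_payment", "send_email"]
keyword_hits = ["refund_payment", "capture_payment", "charge_card"]

fused = rrf_fuse([vector_hits, keyword_hits])
print(fused)
```

Because RRF uses only ranks, it needs no calibration between cosine scores and BM25 scores, which live on unrelated scales.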

Vector alone: 0.78 recall. BM25 alone: 0.72 recall. Hybrid: 0.90 recall.
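For reference, a sketch of how a Recall@10 figure is typically computed: the fraction of known-relevant chunks that appear in the top 10 results, averaged over queries. This is the standard definition and is assumed here; the benchmark script may differ in details, and the sample data is made up.

```python
def recall_at_k(retrieved, relevant, k=10):
    """Fraction of relevant items found in the first k retrieved items."""
    if not relevant:
        return 0.0
    hits = set(retrieved[:k]) & set(relevant)
    return len(hits) / len(relevant)


def mean_recall_at_k(runs, k=10):
    """Average recall@k over (retrieved, relevant) pairs, one per query."""
    return sum(recall_at_k(ret, rel, k) for ret, rel in runs) / len(runs)


# Hypothetical queries: (ranked results, ground-truth relevant chunks).
runs = [
    (["a", "b", "c"], ["a", "c"]),  # both relevant chunks found
    (["x", "y", "z"], ["x", "q"]),  # one of two found
]
print(mean_recall_at_k(runs, k=10))
```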

Benchmark

FastAPI: 53 files, 20 real questions

Full file baseline   83,681 tok/query
After retrieval       4,927 tok/query
After compression       523 tok/query
Recall@10              0.90
94% retrieval savings
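The percentages follow directly from the per-query token counts. A short worked calculation, using only the three figures from the table:

```python
BASELINE = 83_681          # full-file baseline, tokens per query
AFTER_RETRIEVAL = 4_927    # after hybrid retrieval
AFTER_COMPRESSION = 523    # after chunk compression


def reduction(before, after):
    """Fraction of tokens removed going from `before` to `after`."""
    return 1 - after / before


retrieval_savings = reduction(BASELINE, AFTER_RETRIEVAL)
compression_savings = reduction(AFTER_RETRIEVAL, AFTER_COMPRESSION)
end_to_end = reduction(BASELINE, AFTER_COMPRESSION)

print(f"retrieval:   {retrieval_savings:.1%}")
print(f"compression: {compression_savings:.1%}")
print(f"end to end:  {end_to_end:.1%}")
```

Retrieval alone accounts for the 94% figure, and compression removes roughly 89% of what remains, matching the stage numbers on the architecture slide.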

No cherry-picking. No synthetic queries.
Fully reproducible.

$ python benchmarks/run_benchmark.py \
    --repo fastapi/fastapi --source-dir fastapi

Multi-Agent

One index. Every agent.

🟠 Claude Code: .mcp.json · CLAUDE.md · 5 hooks
🔵 VS Code / Copilot: .vscode/mcp.json · copilot-instructions.md
Cursor: .cursor/mcp.json · .cursorrules
🟢 Codex CLI: ~/.codex/config.toml · AGENTS.md
🔷 Gemini CLI
🟣 Tabnine
🟩 OpenCode

Cross-agent memory: decisions made in Claude Code surface in Codex. One cce init configures everything.

Memory

Your agent remembers last week

## CCE memory · resuming my-project

**Previous session** (2026-06-14): Refactored auth: JWT with RS256, refresh tokens rotate on use.

**Recent decisions:**
- Use JWT with RS256 (mesh issues keys)
- Risk limit at 2% per trade (Kelly)
- PostgreSQL for primary store (ACID)

Call session_recall("topic") to find more

📝

record_decision

Save architectural choices with reasoning. Surfaces automatically at session start.

🔍

session_recall

Semantic search over past decisions. Vector + FTS hybrid, same as code search.

📊

session_timeline

Walk through a past session turn by turn. Drill into specific tool calls.

Real Numbers

Per-bucket savings tracking

my-project · 247 queries · last query 5m ago
⛁ ⛶ ⛶ ⛶ ⛶ ⛶ ⛶ ⛶ ⛶ ⛶  88% tokens saved

Input savings     12.4M tokens   $186.00
Output savings    48.2k tokens     $3.62
──────────────────────────────────────────
Total saved       12.4M tokens   $189.62

Breakdown:
  retrieval           84%  ▰▰▰▰▰▰▰▰▰▰  10.4M    $156.00
  chunk compression    3%  ▰▱▱▱▱▱▱▱▱▱  421.5k     $6.32
  output compress*    <1%  ▰▱▱▱▱▱▱▱▱▱  48.2k      $3.62

7 buckets tracked: retrieval, chunk compression, output compression, memory recall, grammar, turn summarization, progressive disclosure
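The dollar figures in the panel are consistent with a flat per-token rate: $186.00 for 12.4M input tokens works out to $15 per million, and $3.62 for 48.2k output tokens to about $75 per million. The sketch below reproduces the conversion under those rates, which are inferred from the displayed numbers and not taken from CCE's configuration. The function name is hypothetical.

```python
# Assumed rates, inferred from the panel: USD per million tokens.
INPUT_USD_PER_MTOK = 15.0
OUTPUT_USD_PER_MTOK = 75.0


def dollars_saved(tokens, usd_per_million):
    """Convert a count of saved tokens into dollars at a flat rate."""
    return tokens * usd_per_million / 1_000_000


# Token counts as displayed in the panel.
input_saved = dollars_saved(12_400_000, INPUT_USD_PER_MTOK)
output_saved = dollars_saved(48_200, OUTPUT_USD_PER_MTOK)
retrieval_saved = dollars_saved(10_400_000, INPUT_USD_PER_MTOK)
chunk_saved = dollars_saved(421_500, INPUT_USD_PER_MTOK)

print(f"input ${input_saved:.2f}, output ${output_saved:.2f}")
print(f"retrieval ${retrieval_saved:.2f}, chunk compression ${chunk_saved:.2f}")
```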

Try It

One command. Zero config.

$ uvx --from "code-context-engine[local]" cce init
30s Install + index
0 Cloud dependencies
9 MCP tools

Python 3.11+ · macOS · Linux · Windows
MIT licensed · 170+ stars · 2,300+ monthly installs

Stop paying for tokens
your agent doesn't need

94% fewer input tokens
$0 cloud cost
7 agents supported

$ uvx --from "code-context-engine[local]" cce init

github.com/elara-labs/code-context-engine

Free · Open Source · MIT License