# SigMap — Complete LLM Reference

> The deterministic, verifiable grounding layer for AI code work.
> A reproducible signature-and-evidence map that agents, CI, and reviewers can trust and audit. No embeddings, no vector DB, fully offline.

SigMap extracts function and class signatures from a codebase and builds a
byte-stable signature-and-evidence map that agents, CI, and reviewers can trust
and audit, proving which files and symbols are real before acting. Deterministic
TF-IDF ranking keeps the relevant context in scope (cutting tokens by about 97%
as a side effect), with no LLM calls, embeddings, or vector database. Works with
Claude, Cursor, GitHub Copilot, Aider, Windsurf, local LLMs, and MCP.
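
To make "deterministic TF-IDF ranking over signatures" concrete, here is a minimal sketch of the general technique. It is not SigMap's implementation: the tokenizer, the smoothed IDF formula, the tie-breaking rule, and the names `tokenize`, `rankFiles`, and `index` are all assumptions chosen for illustration.

```javascript
// Illustrative only: a tiny deterministic TF-IDF ranker over signature strings.
// Tokenizer, IDF smoothing, and tie-breaking are assumptions, not SigMap's code.

function tokenize(text) {
  // Split camelCase, then break on any non-alphanumeric character.
  return text
    .replace(/([a-z0-9])([A-Z])/g, '$1 $2')
    .toLowerCase()
    .split(/[^a-z0-9]+/)
    .filter(Boolean);
}

function rankFiles(files, query) {
  // files: { [path]: string[] of signatures }
  const paths = Object.keys(files).sort(); // fixed iteration order
  const docs = paths.map((p) => tokenize(files[p].join(' ')));

  // Document frequency: in how many files does each term appear?
  const df = new Map();
  for (const tokens of docs) {
    for (const term of new Set(tokens)) df.set(term, (df.get(term) || 0) + 1);
  }

  const queryTerms = [...new Set(tokenize(query))];
  const scored = paths.map((path, i) => {
    const tokens = docs[i];
    let score = 0;
    for (const term of queryTerms) {
      const tf = tokens.filter((t) => t === term).length / (tokens.length || 1);
      const idf = Math.log((1 + paths.length) / (1 + (df.get(term) || 0))) + 1;
      score += tf * idf;
    }
    return { path, score };
  });

  // Sort by score, then by path, so equal scores always come out in the same order.
  return scored
    .filter((r) => r.score > 0)
    .sort((a, b) => b.score - a.score || (a.path < b.path ? -1 : a.path > b.path ? 1 : 0));
}

// Hypothetical signature index for three files.
const index = {
  'src/auth/login.js': ['function loginUser(email, password)', 'function hashPassword(plain)'],
  'src/cart/total.js': ['function computeTotal(items)', 'function applyDiscount(total, code)'],
  'src/auth/token.js': ['function issueToken(userId)', 'function verifyToken(token)'],
};

console.log(rankFiles(index, 'verify token').map((r) => r.path)); // [ 'src/auth/token.js' ]
```

Because the score depends only on the signature text and the query, the same inputs always produce the same ranking, which is the property the description above relies on.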

- Version: 8.7.1 | Benchmark: sigmap-v8.7-main (2026-07-05)
- Source: auto-generated from package.json, version.json, benchmarks/latest.json, src/mcp/tools.js, src/config/defaults.js
- Regenerate: `npm run generate:llms` | Validate: `npm run validate:llms`

---

## Core metrics (benchmark: sigmap-v8.7-main, 2026-07-05)

| Metric | Without SigMap | With SigMap |
|--------|----------------|-------------|
| Retrieval hit@5 | 13.6% (random) | 86.7% (6.4× lift) |
| Token reduction | — | 97.0% average |
| Task success proxy | 10% | 67.8% |
| Prompts per task | 2.84 | 1.46 (48.8% fewer) |
| Supported languages | — | 33 |
| MCP tools | — | 18 |
| npm runtime dependencies | — | 0 |

---

## Installation

```bash
# Run immediately without installing
npx sigmap

# Install globally
npm install -g sigmap

# Auto-wire MCP + editor config + git hook + watcher
npx sigmap --setup
```

---

## CLI commands — complete reference

Every command and flag (`sigmap --help`):

```
sigmap                                   Generate context once and exit
sigmap --monorepo                        Generate per-package context (monorepo)
sigmap --each                            Run for every repo in the current directory
sigmap --routing                         Include model routing hints in output
sigmap --format cache                    Also write Anthropic prompt-cache JSON
sigmap --track                           Append run metrics to .context/usage.ndjson
sigmap --watch                           Generate + watch for file changes
sigmap --setup                           Generate + install git hook + watch
sigmap --mcp                             Start MCP server on stdio
sigmap --report                          Token reduction stats to stdout
sigmap --report --json                   Token report as JSON (for CI; exits 1 if over budget)
sigmap --report --history                Print usage log summary from .context/usage.ndjson
sigmap --report --history --chart        Include inline SVG charts + Unicode sparklines
sigmap --dashboard                       Write benchmarks/reports/dashboard.html
sigmap --suggest-tool "<task>"           Recommend model tier for a task description
sigmap --suggest-tool "<task>" --json    Machine-readable tier recommendation
sigmap --health                          Print composite health score
sigmap --health --json                   Machine-readable health score
sigmap gain                              Token-savings dashboard (totals + by-operation)
sigmap gain --all                        Add daily / weekly / monthly trend tables
sigmap gain --json                       Aggregate savings as JSON
sigmap gain --since 7d                    Window filter (7d, 30d, 12h, or ISO date)
sigmap gain --top <n> | --model <name>   Limit rows / set $ pricing model
sigmap gain --reset                      Clear the local savings log (.context/gain.ndjson)
sigmap ... --no-track                    Disable gain savings capture for this run
sigmap --diff                            Generate context for git-changed files only
sigmap --diff <base-ref>                 Generate context + structural diff vs base ref (e.g. main)
sigmap --diff --staged                   Generate context for staged files only
sigmap --benchmark                       Run retrieval benchmark (benchmarks/tasks/retrieval.jsonl)
sigmap --adapter <name>                  Generate for a specific adapter only (v3.0+)
sigmap --adapter <name> --json           Show adapter output path as JSON
sigmap --benchmark --json                Benchmark results as JSON
sigmap --eval                            Alias for --benchmark
sigmap --analyze                         Per-file breakdown: sigs, tokens, extractor, coverage
sigmap --analyze --json                  Breakdown as JSON
sigmap --analyze --slow                  Re-time each extractor; flag files >50ms
sigmap --diagnose-extractors             Run all extractors vs fixtures; show pass/fail + diff
sigmap --query "<text>"                  Rank files by relevance to a query
sigmap --query "<text>" --json           Ranked results as JSON
sigmap --query "<text>" --top <n>        Limit results to top N files (default 10)
sigmap learn --good <files...>           Boost files in .context/weights.json
sigmap learn --bad <files...>            Penalize files in .context/weights.json
sigmap learn --reset                     Delete learned file weights
sigmap weights                           Show learned file multipliers
sigmap weights --json                    Learned weights as JSON
sigmap --impact <file>                   Show every file impacted by changing <file>
sigmap --impact <file> --json            Impact as JSON {changed, direct, transitive, tests, routes}
sigmap --impact <file> --depth <n>       BFS depth limit (default 3, 0=unlimited)
sigmap --callers <symbol>                Method-level blast radius — every function that (transitively) calls <symbol> (JS/TS + Python)
sigmap --callees <symbol>                Every repo function that <symbol> (transitively) calls
sigmap --callers <symbol> --json --depth <n>   Call-graph edges as JSON (depth 0 = unlimited)
sigmap verify <answer.md>                Flagship grounding guard — flag fake files/tests/imports/symbols/npm-scripts in an AI answer (alias of verify-ai-output)
sigmap verify <answer.md> --json         Grounding report as JSON (exits 1 if issues)
sigmap verify <answer.md> --report       Write a standalone HTML report (red/amber/green)
sigmap verify-ai-output <answer.md>      Full command name for sigmap verify
sigmap conventions                       Extract repo file-naming/export/test conventions (--conflicts, --inject, --report, --fix)
sigmap scaffold "<name>"                 Propose a convention-matched file/dir scaffold (--ext, --threshold, --force, --json)
sigmap verify-plan <plan.md|->           Check a plan vs the live index — files/symbols exist, blast radius, scope (--json)
sigmap review-pr                         Audit a diff — scope drift, god-node edits, missing tests, security files (--staged, --base, --json, --markdown)
sigmap review-pr --markdown              PR Evidence Report — branded Markdown (signatures + blast radius + tests) to post as a PR comment
sigmap create "<task>"                   Grounded-creation pipeline: scaffold → verify-plan → verify-ai-output → review-pr (--staged)
sigmap squeeze <file|->                  Minimize a pasted stacktrace/CI-log/JSON blob (--json for stats)
sigmap ask "<query>" --squeeze           Auto-accept input minimization (no prompt; for scripts/CI)
sigmap ask "<query>" --no-squeeze        Disable input minimization entirely
sigmap ask "<query>" --squeeze-threshold N  Min reduction % to prompt (default 30)
sigmap evidence "<query>"                Build a deterministic Evidence Pack (JSON) → .context/evidence-pack.json
sigmap evidence "<query>" --markdown     Emit the Markdown handoff rendering to stdout
sigmap evidence "<query>" --top <n> --budget <n> --out <path>   Tune ranked files / token budget / write rendered output
sigmap note "<text>"                     Append a note to the cross-session decision log
sigmap note                              List recent notes (also: note --list <N>)
sigmap status                            Show repo state — branch, dirty files, index freshness, notes
sigmap doctor                            Diagnose config, index, freshness, coverage, MCP wiring — with fixes (--json; exits 1 on hard failure)
sigmap mcp list                          List MCP clients and their config paths (--json)
sigmap mcp install <client>              Wire MCP for one client (claude|cursor|windsurf|vscode|zed|codex|gemini|opencode|mcp); --global for user-level
sigmap --init                            Write example config + .contextignore scaffold
sigmap --help                            Show this message
sigmap --version                         Show version
```
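
As a sketch of how the documented exit code of `sigmap --report --json` could gate a CI step, the wrapper below relies only on the exit status (1 when over budget). The function name `checkTokenBudget` and the injectable `run` parameter are hypothetical. The shape of the JSON on stdout is not documented here, so it is passed through as a raw string.

```javascript
const { spawnSync } = require('node:child_process');

// Runs `sigmap --report --json` and reports whether the run stayed in budget.
// `run` is injectable (an assumption for testability) so the function can be
// exercised without sigmap installed.
function checkTokenBudget(run = spawnSync) {
  const result = run('npx', ['sigmap', '--report', '--json'], { encoding: 'utf8' });
  if (result.error) throw result.error; // e.g. npx is not on PATH
  // Per the command list above, exit code 1 means the report is over budget.
  return { withinBudget: result.status === 0, report: result.stdout };
}

// Usage in a CI script:
//   const { withinBudget, report } = checkTokenBudget();
//   console.log(report);
//   process.exitCode = withinBudget ? 0 : 1;
```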

---

## MCP server — 18 tools

Start with `sigmap --mcp` (stdio JSON-RPC). Configure once:

```json
{ "mcpServers": { "sigmap": { "command": "npx", "args": ["sigmap", "--mcp"] } } }
```
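
Most users never write MCP messages by hand, but seeing one can help when debugging a client. The sketch below builds the JSON-RPC 2.0 `tools/call` request that an MCP client sends over stdio, one JSON object per line. The helper name `toolCall` and the example query are hypothetical. A real session also starts with the MCP `initialize` handshake, which is omitted here.

```javascript
// Builds a newline-delimited JSON-RPC 2.0 message for an MCP stdio server.
// `tools/call` with { name, arguments } is the MCP method for invoking a tool.
function toolCall(id, name, args = {}) {
  return JSON.stringify({
    jsonrpc: '2.0',
    id,
    method: 'tools/call',
    params: { name, arguments: args },
  }) + '\n';
}

// Example: ask query_context for the five most relevant files (query text is made up).
const message = toolCall(1, 'query_context', { query: 'password reset flow', topK: 5 });
process.stdout.write(message);
```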

### read_context

Read extracted code signatures for the project or a specific module path. Returns the full copilot-instructions.md content (~500–4K tokens) or a filtered subset when a module path is provided (~50–500 tokens).

```
Input:  { module?: string }
```

### search_signatures

Search extracted code signatures for a keyword, function name, or class name. Returns matching signature lines with their file paths.

```
Input:  { query: string }
```

### get_map

Read a section from PROJECT_MAP.md — import graph, class hierarchy, or route table. Requires gen-project-map.js to have been run first.

```
Input:  { type: string }
```

### create_checkpoint

Create a session checkpoint summarising current project state. Returns recent git commits, active branch, token count, and a compact snapshot of the codebase context — ideal for session handoffs or periodic saves during long coding sessions.

```
Input:  { note?: string }
```

### get_routing

Get model routing hints for this project — which files belong to which complexity tier (fast/balanced/powerful) and which AI model to use for each type of task. Helps reduce API costs by 40–80% by routing simple tasks to cheaper models.

```
Input:  { } (no arguments)
```

### explain_file

Explain a specific file: returns its extracted signatures, direct imports (files it depends on), and callers (files that import it). Ideal for understanding a file in isolation without reading raw source. Requires the context file to have been generated first.

```
Input:  { path: string }
```

### list_modules

List all top-level modules (srcDirs) present in the context file, sorted by token count descending. Use this to decide which module to pass to read_context before querying a specific area of the codebase.

```
Input:  { } (no arguments)
```

### query_context

Rank and return the most relevant files for a specific task or question. Uses keyword + symbol + path scoring to surface only the top-K files relevant to the query — much cheaper than reading all context. Returns ranked file list with signatures and relevance scores.

```
Input:  { query: string, topK?: number }
```

### get_impact

Show every file that is impacted when a given file changes — direct importers, transitive importers, affected tests, and affected routes/controllers. Gives agents instant blast-radius awareness before making a change. Handles circular dependencies safely (no infinite loops).

```
Input:  { file: string, depth?: number }
```
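
The blast-radius idea can be sketched as a breadth-first walk over a reverse import graph, with a visited set so circular imports terminate. This is an illustration of the technique, not SigMap's code: `impactedFiles`, the `importers` map shape, and the sample graph are assumptions. The depth semantics (default 3, 0 = unlimited) follow the `--depth` flag in the CLI reference.

```javascript
// importers: Map<file, string[]> listing the files that import each key
// (a reverse import graph). depth 0 means unlimited.
function impactedFiles(importers, changed, depth = 3) {
  const limit = depth === 0 ? Infinity : depth;
  const seen = new Set([changed]); // visited set: cycles cannot loop forever
  const direct = [];
  const transitive = [];
  let frontier = [changed];

  for (let level = 1; level <= limit && frontier.length > 0; level++) {
    const next = [];
    for (const file of frontier) {
      for (const importer of importers.get(file) || []) {
        if (seen.has(importer)) continue;
        seen.add(importer);
        (level === 1 ? direct : transitive).push(importer);
        next.push(importer);
      }
    }
    frontier = next;
  }
  return { changed, direct, transitive };
}

// Hypothetical graph with a cycle: b.js imports a.js, c.js imports b.js, a.js imports c.js.
const graph = new Map([
  ['a.js', ['b.js']],
  ['b.js', ['c.js']],
  ['c.js', ['a.js']],
]);

console.log(impactedFiles(graph, 'a.js'));
// direct: ['b.js'], transitive: ['c.js']
```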

### get_lines

Fetch an exact line range from a source file on demand — the Surgical Context workhorse. Signatures carry `path:start-end` anchors; call this to read just those lines instead of re-opening the whole file. Lines are clamped to the file bounds and secret-scanned (redacted) before return. Path is sandboxed to the project root.

```
Input:  { file: string, start: number, end: number }
```
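
Two of the guarantees described above, clamping to file bounds and sandboxing to the project root, can be sketched in a few lines. This is one possible approach under stated assumptions, not SigMap's code: `resolveInsideRoot` and `sliceLines` are hypothetical names, line numbers are assumed to be 1-indexed and inclusive, and secret redaction is left out.

```javascript
const path = require('node:path');

// Resolve `file` under `root`, rejecting anything that escapes the root.
function resolveInsideRoot(root, file) {
  const absRoot = path.resolve(root);
  const target = path.resolve(absRoot, file);
  const rel = path.relative(absRoot, target);
  if (rel === '..' || rel.startsWith('..' + path.sep) || path.isAbsolute(rel)) {
    throw new Error(`path escapes project root: ${file}`);
  }
  return target;
}

// Return lines start..end (assumed 1-indexed, inclusive), clamped to the text's bounds.
function sliceLines(text, start, end) {
  const lines = text.split('\n');
  const from = Math.max(1, Math.min(start, lines.length));
  const to = Math.max(from, Math.min(end, lines.length));
  return lines.slice(from - 1, to).join('\n');
}

console.log(sliceLines('a\nb\nc', 2, 99)); // prints "b" and "c": the end is clamped to line 3
```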

### read_memory

Recall the project decision log — recent notes left by humans or agents across sessions (via `sigmap note`), plus the last ranking-session focus. Call this at the start of a task to kill cold-start: it answers "what were we doing and why" without re-reading the whole codebase.

```
Input:  { limit?: number }
```

### get_callee_signatures

Return the EXACT current signature(s) of named symbols (functions, classes, methods) from the index — so an agent never guesses a callee's parameter types from training memory. Call this before writing code that uses a symbol. Unknown names get a closest-match suggestion.

```
Input:  { symbols: array }
```
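
How the closest-match suggestion is computed is not stated here. One common way to do it is Levenshtein edit distance, sketched below as an assumption rather than a description of SigMap's method. `editDistance` and `closestSymbol` are hypothetical names.

```javascript
// Classic dynamic-programming Levenshtein distance, computed one row at a time.
function editDistance(a, b) {
  let prev = Array.from({ length: b.length + 1 }, (_, j) => j);
  for (let i = 1; i <= a.length; i++) {
    const row = [i];
    for (let j = 1; j <= b.length; j++) {
      const cost = a[i - 1] === b[j - 1] ? 0 : 1;
      row[j] = Math.min(prev[j] + 1, row[j - 1] + 1, prev[j - 1] + cost);
    }
    prev = row;
  }
  return prev[b.length];
}

// Return the known symbol with the smallest case-insensitive edit distance.
function closestSymbol(name, known) {
  let best = null;
  // Sorted so that ties resolve the same way on every run.
  for (const candidate of [...known].sort()) {
    const distance = editDistance(name.toLowerCase(), candidate.toLowerCase());
    if (best === null || distance < best.distance) best = { symbol: candidate, distance };
  }
  return best;
}

console.log(closestSymbol('verifyTokn', ['issueToken', 'verifyToken', 'loginUser']));
// symbol: 'verifyToken', distance: 1
```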

### sigmap_notify_file_created

Tell SigMap a file was created or modified so its signatures are indexed live for the rest of the session. Call this after writing a file — the new symbols become resolvable by search_signatures / get_callee_signatures.

```
Input:  { path: string, content?: string }
```

### sigmap_notify_symbol_added

Fast path: register a single new symbol signature directly in the live index without re-reading the whole file.

```
Input:  { signature: string, file: string, line?: number }
```

### sigmap_notify_file_deleted

Tell SigMap a file was deleted so its symbols are dropped from the live index.

```
Input:  { path: string }
```

### get_diff_context

For every changed file in the working tree (or staged, or vs a base ref), return its current signatures plus blast radius — direct importers, transitive count, and affected tests/routes — with a risk label. One call gives an agent everything a code review or a safe edit needs. Lists changed files shell-free (git binary, never a shell).

```
Input:  { base?: string, staged?: boolean, depth?: number }
```
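
"Shell-free" here means invoking the git binary with an argument array, so no shell ever parses the base ref. A minimal sketch of that pattern with Node's `execFileSync` follows. `diffArgs` and `changedFiles` are hypothetical names, and how SigMap actually combines `base` and `staged` is an assumption.

```javascript
const { execFileSync } = require('node:child_process');

// Build argv for `git diff --name-only`. Arguments travel as an array, so
// characters like `;` or `$(...)` in a ref are never interpreted by a shell.
function diffArgs({ base, staged } = {}) {
  const args = ['diff', '--name-only'];
  if (staged) args.push('--cached'); // compare the index instead of the working tree
  if (base) args.push(base);
  args.push('--'); // end of revisions; nothing after this is treated as a ref
  return args;
}

// List changed files by running git directly (requires git and a repository at cwd).
function changedFiles(options, cwd = process.cwd()) {
  const out = execFileSync('git', diffArgs(options), { cwd, encoding: 'utf8' });
  return out.split('\n').filter(Boolean);
}

console.log(diffArgs({ base: 'main' })); // [ 'diff', '--name-only', 'main', '--' ]
```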

### get_architecture_overview

A high-level map of the codebase in one call: module breakdown (files/tokens), the most depended-on "hub" files, the dependency-cycle count, and route totals. Extends get_map — use it to orient in an unfamiliar repo before drilling in with read_context / query_context.

```
Input:  { } (no arguments)
```

### verify_suggestion

Ground an AI code suggestion before writing it: verify a snippet or answer against the repository AND the libraries actually installed in node_modules (the grounding moat). Flags fake file paths, unresolvable imports, symbols absent from both the repo index and the installed libraries, and non-existent npm scripts — deterministic, offline, no LLM. Reports the installed libraries it verified against with pinned versions.

```
Input:  { code: string }
```
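
The simplest of these checks, flagging file paths that do not exist in the repository, can be sketched as a set lookup. This illustrates the idea only. The regular expression, the name `findUnknownPaths`, and the sample data are assumptions, and the real tool also covers imports, symbols, and npm scripts.

```javascript
// Pull path-like tokens (dir/name.ext) out of an answer and return the ones
// that are not in the set of known repository files.
function findUnknownPaths(answer, knownFiles) {
  const pattern = /(?:[\w.-]+\/)+[\w.-]+\.[A-Za-z0-9]+/g;
  const mentioned = new Set(answer.match(pattern) || []);
  return [...mentioned].filter((p) => !knownFiles.has(p)).sort();
}

// Hypothetical repo index and AI answer.
const known = new Set(['src/auth/login.js', 'src/auth/token.js']);
const answer = 'Edit `src/auth/login.js` and add a helper in `src/auth/session.js`.';

console.log(findUnknownPaths(answer, known)); // [ 'src/auth/session.js' ]
```

A lookup like this is deterministic and needs no network or model call, which matches the offline property described above.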

---

## Configuration (gen-context.config.json)

Every config key and its default:

```
output = .github/copilot-instructions.md
outputs = ["copilot"]
adapters = null
srcDirs = ["src","app","lib","packages","services","api","server","client","web","frontend","backend","desktop","mobile","shared","common","core","workers","functions","lambda","cmd","pages","components","hooks","routes","controllers","models","views","resources","config","db","projects","apps","libs","instance","blueprints","src/main/java","src/main/kotlin","src/main/scala","app/src/main/java","app/src/main/kotlin","src/test/java","src/test/kotlin"]
exclude = ["node_modules",".git","dist","build","out","__pycache__",".next","coverage","target","vendor",".context","playwright-tmp","playwright-report","test-results",".turbo","storybook-static",".docusaurus"]
maxDepth = 6
maxSigsPerFile = 25
maxTokens = 6000
autoMaxTokens = true
coverageTarget = 0.8
modelContextLimit = 128000
maxTokensHeadroom = 0.2
secretScan = true
monorepo = false
diffPriority = true
strategy = full
hotCommits = 10
watchDebounce = 300
routing = false
format = default
tracking = false
mcp = {"autoRegister":true}
depMap = true
versionPins = true
todos = true
changes = true
changesCommits = 10
testCoverage = false
testDirs = ["tests","test","__tests__","spec"]
sigCache = false
impactRadius = false
retrieval = {"topK":10,"recencyBoost":1.5}
impact = {"depth":3,"includeSigs":true}
```

---

## Supported languages (33 extractors)

cpp, csharp, css, dart, dockerfile, gdscript, go, graphql, html, java, javascript, kotlin, markdown, php, properties, protobuf, python, r, ruby, rust, scala, shell, sql, svelte, swift, terraform, toml, typescript, typescript_react, vue, vue_sfc, xml, yaml

---

## Integrations

Generates native context files for: claude, codex, copilot, cursor, gemini, openai, willow, windsurf — plus an MCP server for any agent (Claude Code, Cursor, Cline, Windsurf, OpenCode, Gemini CLI, Aider). One `sigmap --setup` wires the lot.

---

## Compliance evidence support

SigMap can surface repository facts that *support* technical-evidence narratives
(e.g. DORA Art. 8–11, NIS2 Art. 21, ISO 27001 A.8) — it is a **technical evidence
pack**, never a certification or a "compliance report". Signed evidence packs are
planned for a later release. Any compliance-adjacent wording is reviewed against
the relevant regulation before publication; SigMap makes no legal claims.

---

## Project information

- Author: Manoj Mallick
- License: MIT
- Repository: https://github.com/manojmallick/sigmap
- Documentation: https://sigmap.io/
- npm: https://www.npmjs.com/package/sigmap
- Benchmark dataset: https://doi.org/10.5281/zenodo.19898842
- Issues: https://github.com/manojmallick/sigmap/issues

---

## What SigMap does not do

- **No embeddings / vector database.** Ranking is deterministic TF-IDF over
  extracted signatures — reproducible and offline, not a semantic vector search.
- **No code execution.** SigMap reads source statically; it never runs your code.
- **No network calls** on the core generate/ask/verify paths. Nothing is uploaded;
  generation works fully offline.
- **Not a linter or type checker.** It maps and ranks code structure; it does not
  judge correctness (use `verify-ai-output` only to flag *fabricated* references).
- **Not a full file reader.** It emits signatures + line anchors; an agent fetches
  exact bodies on demand via the `get_lines` MCP tool.
- **No telemetry.** Usage tracking (`--track`, `.context/usage.ndjson`) is local
  and opt-in; nothing leaves your machine.
