# Bug Hunter — Full Reference

> AI-powered adversarial bug finding that argues with itself to surface real vulnerabilities — and auto-fixes them safely.

## Overview

Bug Hunter is an automated adversarial code auditing skill for AI coding agents. Instead of a single AI scanning code and flooding you with false alarms, it runs an adversarial multi-agent pipeline: one agent hunts for bugs, a second agent tries to disprove them, and a third agent delivers an independent final verdict.

## Skills-First Architecture

All pipeline agents are bundled as first-class skills with frontmatter under `skills/`. The orchestrator (`SKILL.md`) reads agent skills at each phase:

```
Recon (skills/recon/)
  → Hunter (skills/hunter/) + doc-lookup (skills/doc-lookup/)
    → Skeptic (skills/skeptic/) + doc-lookup
      → Referee (skills/referee/)
        → Fix Strategy + Fix Plan
          → Fixer (skills/fixer/) + doc-lookup
```

### Core Agent Skills

| Skill | Lines | Purpose |
|-------|-------|---------|
| `skills/hunter/SKILL.md` | 172 | Deep behavioral code analysis — logic errors, security vulns, race conditions |
| `skills/skeptic/SKILL.md` | 153 | Adversarial reviewer — challenges every finding, kills false positives |
| `skills/referee/SKILL.md` | 143 | Independent arbiter — delivers verdicts with CVSS scoring and PoC |
| `skills/fixer/SKILL.md` | 124 | Surgical code repair — minimal fixes respecting strategy classifications |
| `skills/recon/SKILL.md` | 166 | Codebase reconnaissance — maps architecture, trust boundaries, risk priorities |
| `skills/doc-lookup/SKILL.md` | 51 | Documentation access — Context Hub (chub) primary, Context7 fallback |

### Security Skills

| Skill | Purpose | Trigger |
|-------|---------|---------|
| `skills/commit-security-scan/SKILL.md` | Diff-scoped PR/commit security review | `--pr-security` |
| `skills/security-review/SKILL.md` | Full security workflow (threat model + code + deps + validation) | `--security-review` |
| `skills/threat-model-generation/SKILL.md` | STRIDE threat model bootstrap/refresh | `--threat-model` |
| `skills/vulnerability-validation/SKILL.md` | Exploitability/reachability/CVSS/PoC validation | `--validate-security` |

## Pipeline Architecture

### Phase 1: Triage (0 tokens, <2s)
`scripts/triage.cjs` classifies every file by risk tier (CRITICAL/HIGH/MEDIUM/LOW/SKIP) using filename patterns, directory structure, and file size heuristics. Outputs `.bug-hunter/triage.json`.
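
The tiering step can be pictured with a minimal sketch. The path patterns, the size threshold, and the `classifyFile` name below are illustrative assumptions, not the actual rules in `scripts/triage.cjs`; only the five tier names come from the pipeline description.

```javascript
// Hypothetical triage sketch. Patterns and the size limit are assumptions,
// not the real heuristics used by scripts/triage.cjs.
const RULES = [
  { tier: "SKIP", test: (p) => /(^|\/)(node_modules|dist|vendor)\//.test(p) },
  { tier: "CRITICAL", test: (p) => /(auth|login|payment|crypto|session)/i.test(p) },
  { tier: "HIGH", test: (p) => /(^|\/)(api|routes|controllers|middleware)\//.test(p) },
  { tier: "LOW", test: (p) => /(^|\/)(tests?|docs|fixtures)\//.test(p) },
];

function classifyFile(path, sizeBytes) {
  // Assumption: very large files are treated as generated and skipped.
  if (sizeBytes > 1_000_000) return "SKIP";
  // First matching rule wins, so rule order encodes priority.
  for (const rule of RULES) {
    if (rule.test(path)) return rule.tier;
  }
  // Anything unmatched falls into the middle tier.
  return "MEDIUM";
}

console.log(classifyFile("src/auth/login.js", 2048)); // CRITICAL
console.log(classifyFile("src/utils/format.js", 512)); // MEDIUM
```

Because this kind of classification needs only paths and sizes, it can run as a plain script without calling a model, which is consistent with the zero-token budget stated for this phase.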

### Phase 2: Recon
`skills/recon/SKILL.md` — maps the tech stack, auth mechanisms, database layer, and key dependencies. Produces a risk map.

### Phase 3: Hunt
`skills/hunter/SKILL.md` — scans files in risk-map order (CRITICAL → HIGH → MEDIUM). Reports behavioral bugs with exact code evidence, runtime triggers, and cross-file references. Verifies claims against official documentation via `scripts/doc-lookup.cjs`.

### Phase 4: Skeptic Challenge
`skills/skeptic/SKILL.md` re-reads the actual source code for every finding and attempts to disprove it. It applies 15 hard exclusion rules, then uses an expected-value calculation, `EV = confidence × points - (1 - confidence) × 2 × points`, to decide between ACCEPT and DISPROVE.
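
The expected-value formula simplifies to `points × (3 × confidence - 2)`, so it is positive only when confidence exceeds 2/3. The sketch below works through that arithmetic. It assumes `confidence` is the Skeptic's confidence in its own disproof and that a positive EV maps to DISPROVE; the function names and that mapping are illustrative assumptions, not the skill's actual logic.

```javascript
// EV = confidence * points - (1 - confidence) * 2 * points
function expectedValue(confidence, points) {
  return confidence * points - (1 - confidence) * 2 * points;
}

// Assumption: a positive EV means the disproof is worth asserting.
function decide(confidence, points) {
  return expectedValue(confidence, points) > 0 ? "DISPROVE" : "ACCEPT";
}

console.log(expectedValue(0.75, 10), decide(0.75, 10)); // 2.5 DISPROVE
console.log(expectedValue(0.5, 10), decide(0.5, 10));   // -5 ACCEPT
```

The doubled penalty means a coin-flip guess loses on average, so only well-supported challenges pay off.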

### Phase 5: Referee Verdict
`skills/referee/SKILL.md` — reads original code, Hunter findings, and Skeptic challenges. Delivers final verdicts. Enriches security findings with STRIDE, CWE, CVSS 3.1, reachability analysis, and proof-of-concept blocks.

### Phase 6: Fix Strategy + Plan
`buildFixStrategy()` classifies each confirmed bug as `safe-autofix`, `manual-review`, `larger-refactor`, or `architectural-remediation`. `buildFixPlan()` then gates execution on each bug's `autofixEligible` flag and splits the eligible fixes into canary and rollout batches.
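
As a rough illustration of the gating and batching described above, the sketch below filters bugs by eligibility and splits them into two batches. The `sketchFixPlan` name, the bug record shape, the `deferred` bucket, and the canary size are assumptions made for this example; they are not the real `buildFixPlan()` contract.

```javascript
// Hypothetical sketch: record shape and canarySize are assumptions,
// not the actual buildFixPlan() implementation.
function sketchFixPlan(bugs, canarySize = 2) {
  const eligible = bugs.filter(
    (bug) => bug.strategy === "safe-autofix" && bug.autofixEligible === true
  );
  const deferred = bugs.filter((bug) => !eligible.includes(bug));
  return {
    canary: eligible.slice(0, canarySize), // small first batch, fixed and tested first
    rollout: eligible.slice(canarySize),   // remaining eligible fixes
    deferred,                              // everything not auto-fixed
  };
}

const plan = sketchFixPlan([
  { id: "BUG-1", strategy: "safe-autofix", autofixEligible: true },
  { id: "BUG-2", strategy: "manual-review", autofixEligible: false },
  { id: "BUG-3", strategy: "safe-autofix", autofixEligible: true },
  { id: "BUG-4", strategy: "safe-autofix", autofixEligible: true },
]);
console.log(plan.canary.map((b) => b.id));   // [ 'BUG-1', 'BUG-3' ]
console.log(plan.rollout.map((b) => b.id));  // [ 'BUG-4' ]
console.log(plan.deferred.map((b) => b.id)); // [ 'BUG-2' ]
```

Running a small canary batch first limits the blast radius if a fix turns out to break the test suite.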

### Phase 7: Fix (optional)
`skills/fixer/SKILL.md` — surgical auto-fix with git branching, worktree isolation, test baseline capture, canary rollout, per-bug commits, automatic rollback on test regression, and post-fix re-scan.
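
The reason for capturing a test baseline before fixing is that only newly failing tests should trigger a rollback. A minimal sketch of that comparison follows; the result format (`{ testName: "pass" | "fail" }`) and both function names are assumptions for illustration, not the Fixer's actual data model.

```javascript
// Hypothetical sketch: test results are modeled as { testName: "pass" | "fail" }.
// A regression is a test that passed in the baseline but no longer passes.
function findRegressions(baseline, afterFix) {
  return Object.keys(baseline).filter(
    (name) => baseline[name] === "pass" && afterFix[name] !== "pass"
  );
}

function shouldRollback(baseline, afterFix) {
  return findRegressions(baseline, afterFix).length > 0;
}

const baseline = { "auth.login": "pass", "cart.total": "fail", "api.users": "pass" };
const afterFix = { "auth.login": "pass", "cart.total": "fail", "api.users": "fail" };
console.log(findRegressions(baseline, afterFix)); // [ 'api.users' ]
console.log(shouldRollback(baseline, afterFix));  // true
```

In this sketch `cart.total` was already failing before the fix, so it does not count against the fix.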

## Documentation Verification

Bug Hunter verifies claims against official library documentation before any agent asserts framework behavior. It uses a hybrid approach:

1. **Context Hub (chub)** — curated, versioned, annotatable docs (primary)
2. **Context7 API** — broad coverage fallback when chub doesn't have the library

```bash
# Primary: doc-lookup.cjs (tries chub first, falls back to Context7)
node scripts/doc-lookup.cjs search "express" "middleware error handling"
node scripts/doc-lookup.cjs get "prisma/orm" "parameterized queries" --lang js

# Fallback: context7-api.cjs (Context7 only)
node scripts/context7-api.cjs search "express" "middleware"
node scripts/context7-api.cjs context "/expressjs/express" "error handling"
```

## Security Classification

### STRIDE Threat Categories
Spoofing, Tampering, Repudiation, Information Disclosure, Denial of Service, Elevation of Privilege

### CWE Mapping
SQL Injection (CWE-89), Command Injection (CWE-78), XSS (CWE-79), Path Traversal (CWE-22), IDOR (CWE-639), Missing Auth (CWE-306/862), Hardcoded Credentials (CWE-798), SSRF (CWE-918), and more.

### CVSS 3.1 Scoring
Critical and High security findings receive CVSS 3.1 base scores with attack vector, complexity, privileges required, and impact metrics.
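
For reference, a CVSS 3.1 base score can be computed from a vector string with the equations and metric weights published in the CVSS v3.1 specification. The sketch below is a standalone illustration of that arithmetic; the `cvssBaseScore` name is hypothetical, the code is not taken from Bug Hunter, and it does no input validation.

```javascript
// CVSS 3.1 base score sketch using the specification's weights and equations.
const W = {
  AV: { N: 0.85, A: 0.62, L: 0.55, P: 0.2 },
  AC: { L: 0.77, H: 0.44 },
  UI: { N: 0.85, R: 0.62 },
  CIA: { H: 0.56, L: 0.22, N: 0 },
  // Privileges Required weights depend on Scope (U = unchanged, C = changed).
  PR: { U: { N: 0.85, L: 0.62, H: 0.27 }, C: { N: 0.85, L: 0.68, H: 0.5 } },
};

// Spec-defined Roundup: smallest one-decimal number >= x, avoiding float drift.
function roundUp(x) {
  const i = Math.round(x * 100000);
  return i % 10000 === 0 ? i / 100000 : (Math.floor(i / 10000) + 1) / 10;
}

function cvssBaseScore(vector) {
  // "CVSS:3.1/AV:N/AC:L/..." -> { AV: "N", AC: "L", ... }
  const m = Object.fromEntries(
    vector.split("/").slice(1).map((pair) => pair.split(":"))
  );
  const iss = 1 - (1 - W.CIA[m.C]) * (1 - W.CIA[m.I]) * (1 - W.CIA[m.A]);
  const impact =
    m.S === "U"
      ? 6.42 * iss
      : 7.52 * (iss - 0.029) - 3.25 * Math.pow(iss - 0.02, 15);
  const exploitability =
    8.22 * W.AV[m.AV] * W.AC[m.AC] * W.PR[m.S][m.PR] * W.UI[m.UI];
  if (impact <= 0) return 0;
  const raw = m.S === "U" ? impact + exploitability : 1.08 * (impact + exploitability);
  return roundUp(Math.min(raw, 10));
}

console.log(cvssBaseScore("CVSS:3.1/AV:N/AC:L/PR:N/UI:N/S:U/C:H/I:H/A:H")); // 9.8
console.log(cvssBaseScore("CVSS:3.1/AV:N/AC:L/PR:L/UI:N/S:U/C:H/I:H/A:H")); // 8.8
```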

## CLI Flags

| Flag | Description |
|------|-------------|
| (default) | Scan + auto-fix confirmed bugs |
| `--scan-only` / `--review` | Report only, no code changes |
| `--fix` | Explicitly enable auto-fix |
| `--fix --approve` / `--safe` | Fix with human approval per edit |
| `--plan-only` / `--plan` | Build strategy + plan, stop before edits |
| `--dry-run` / `--preview` | Preview fixes as unified diffs |
| `-b <branch>` | Scan files changed vs branch |
| `--staged` | Scan git-staged files |
| `--pr [current\|recent\|N]` | PR review workflow |
| `--pr-security` | PR security review (threat model + CVEs) |
| `--security-review` | Enterprise security audit |
| `--threat-model` | Generate STRIDE threat model |
| `--validate-security` | Force vulnerability validation |
| `--deps` | Include dependency CVE scan |
| `--autonomous` | No-intervention auto-fix run |
| `--no-loop` | Single pass (disable iterative coverage) |
| `--max-iterations <n>` | Hard cap on loop iterations (default: 10) |

## Output Files

| File | Description |
|------|-------------|
| `.bug-hunter/triage.json` | File classification and scan strategy |
| `.bug-hunter/recon.md` | Tech stack and risk map |
| `.bug-hunter/findings.json` | Canonical Hunter findings |
| `.bug-hunter/skeptic.json` | Canonical Skeptic challenges |
| `.bug-hunter/referee.json` | Canonical Referee verdicts |
| `.bug-hunter/report.md` | Final human-readable report |
| `.bug-hunter/fix-strategy.json` | Fix classification (`safe-autofix`, `manual-review`, etc.) |
| `.bug-hunter/fix-strategy.md` | Human-readable strategy |
| `.bug-hunter/fix-plan.json` | Executable fix plan (canary/rollout) |
| `.bug-hunter/fix-report.json` | Fix results |
| `.bug-hunter/coverage.json` | Loop coverage state |
| `.bug-hunter/coverage.md` | Coverage summary |
| `.bug-hunter/experiment.jsonl` | Experiment loop log (append-only, metric tracking) |
| `.bug-hunter/threat-model.md` | STRIDE threat model |
| `.bug-hunter/dep-findings.json` | Dependency CVE results |

## Supported Languages

JavaScript, TypeScript, Python, Go, Rust, Java, C#, Ruby, PHP, Swift, Kotlin, C/C++

## Supported Frameworks

Express, Fastify, Next.js, Django, Flask, FastAPI, Gin, Echo, Actix, Spring Boot, Rails, Laravel — and any framework with docs in Context Hub or Context7.

## Project Structure

```
bug-hunter/
├── SKILL.md               # Orchestrator — routes to skills/
├── README.md              # Full documentation
├── CHANGELOG.md           # Version history
├── package.json           # npm package (@codexstar/bug-hunter)
├── bin/bug-hunter         # CLI entry point
├── skills/                # All agent skills (core + security)
│   ├── hunter/            # Bug finding
│   ├── skeptic/           # False positive elimination
│   ├── referee/           # Final verdicts
│   ├── fixer/             # Surgical fixes
│   ├── recon/             # Codebase mapping
│   ├── doc-lookup/        # Documentation verification
│   ├── commit-security-scan/
│   ├── security-review/
│   ├── threat-model-generation/
│   └── vulnerability-validation/
├── modes/                 # Execution strategies by codebase size
├── prompts/examples/      # Calibration examples for Hunter/Skeptic
├── schemas/               # JSON Schema contracts for all artifacts
├── scripts/               # Node.js helpers (zero AI tokens)
│   ├── run-bug-hunter.cjs    # Main orchestrator script
│   ├── experiment-loop.cjs   # Autonomous experiment loop with metrics + stop-file
│   ├── doc-lookup.cjs        # Context Hub + Context7 doc lookup
│   ├── context7-api.cjs      # Context7 standalone fallback
│   ├── prepublish-guard.cjs  # Publish safety net
│   └── tests/                # Test suite (113 tests)
├── templates/             # Subagent launch template
└── test-fixture/          # 6 planted bugs for validation
```

## Install

```bash
npm install -g @codexstar/bug-hunter && bug-hunter install
```

Requirements: Node.js 18+ recommended (core pipeline works without it). Works with Claude Code, Cursor, Codex CLI, Windsurf, Kiro, Copilot, Opencode, Pi — or any AI agent that can read files and run shell commands.

## License

MIT
