All notable changes to roam-code will be documented in this file.

The format is based on Keep a Changelog, and this project adheres to Semantic Versioning.

[Unreleased]

Perf + Pattern-1 stabilisation campaign (2026-05-21)

Performance

Fixed

Post-v13.3 execution wave (2026-05-20)

Added

Fixed

[13.3] — 2026-05-19

MCP runtime security wave (2026-05-18)

Added — 2026-05-18 session

Fixed — 2026-05-18 session

Changed — 2026-05-18 session

In flight today (uncommitted; will ride the next squash)

Added

Fixed

Changed

Closed

Added — W1103-arc + W489-family-closed + capability-invariants + structured_unknown_filter-FULLY-CLOSED + symmetric-emission-COMPLETE batch (post-CONSOLIDATE-21, 2026-05-17 /loop iteration N+22)

> ~17 SHIPPED + 3 BAIL/SHIPPED + 1 RESEARCH MEMO + 2 REAL BUGS > fixed across 7 themes: regex CLI toggle (W421), taint qualified_only > lint family FULLY CLOSED (W489-A + W489-A-followup), capability-axis > invariant lint + 2 REAL BUGS fixed (W365 family), structured_unknown_filter > family FULLY CLOSED (W1083-followup-3 multi-value sibling), > symmetric-emission family COMPLETE (W1100 + W1101 + W1102), 2 > test-rot diagnoses (W844-drive-by-2 + W1084), and pruning + W1117 > placeholder family FULLY CLOSED (W507 + W1117-followup-4). Plus 5 > stale-BACKLOG hits doc-pinned (W844 + W1007 + W1008 + W851 + W414b) > — discipline rule re-affirmed at operational cadence.

<!-- PHANTOM 2026-05-18: declared SHIPPED but memo absent from disk. Regenerate from BACKLOG/test-fixture breadcrumbs before next release. -->

Fixed — W1103-arc batch

Changed — W1103-arc batch

Added — W1067 → W1102 arc batch (post-CONSOLIDATE-20, 2026-05-17 /loop iteration N+21)

> ~30 completions consolidating the W1067 → W1102 wave-arc. Seven > themes carry the batch: Pattern-1D helper Phase 2/3 propagation > (7 callsites), W1142 cap-hit disclosure family closure (7 commands), > Pattern 3a severity widening on cmd_smells + cmd_adversarial (family > TERMINAL), W1117 placeholder normalization sweep (22 commands), > symmetric envelope emission (W1100 + W1101), W350 OSCAL > authority_refs projection (closes evidence-question Q2 coverage), > and permit-vs-lease asymmetry documented in CLAUDE.md (W1071).

Fixed — W1067 → W1102 arc batch

Changed — W1067 → W1102 arc batch

Added — W1086-arc + Wave-B-TERMINAL + W478 + Pattern-1A-family batch (post-CONSOLIDATE-19, 2026-05-17 /loop iteration N+20)

> ~20 completions since CONSOLIDATE-19 (Section 65). SARIF > dashboard family TERMINAL milestone — the W1062 + W1062-followup > trio + W1062-followup-2/3/4 fan-out + W1087 lint substitute arc > that kicked off at CONSOLIDATE-18 closes cleanly across 12 wired > emitters + 39 catalogued emitters end-to-end. Four themes carry > the batch: (a) SARIF dashboard family TERMINAL at 12 wired > emitters + W1087 lint substitute (W1062-followup-3 + > W1062-followup-4 wire 6 more emitters; W1087 catalogues 13 WIRED > + 26 EXEMPT = 39 emitters end-to-end). (b) MCP outputSchema > 13-tool Wave B TERMINAL carry-forward + Wave C1 implementation > kickoff — Wave C1 lands the first compat-profile env-vars > (ROAM_MCP_COMPAT_STRIP_OUTPUT_SCHEMA + ROAM_MCP_COMPAT_STRICT) > plus a sidecar hoist drive-by; pairs with the new > MCP-COMPAT-PROFILE-ROADMAP planning memo. (c) Pattern-2 + > Pattern-1A empty-state arc closure — 8 detectors + 2 hard-cap > commands sealed (W805-followup-bundle + W1085 + W1086 + W1084). > (d) 3 research memos drafted across the arc — MCP-COMPAT- > PROFILE-ROADMAP new this session, MCP-OUTPUTSCHEMA-EVOLUTION + > DETECTOR-FP-RATE-METHODOLOGY carry-forward. Plus 10 stand-alone > polish items (W365 + W459 + W478 + W844 + W847 + W759 + W986 + > W462 + W1088 + W1038) + 1 BAIL (W851). Parallel in stature to > the Wave B TERMINAL milestone that CONSOLIDATE-19 carried.

Fixed — W1086-arc + Wave-B-TERMINAL + W478 + Pattern-1A-family batch

Changed — W1086-arc + Wave-B-TERMINAL batch

Added — Wave-B-TERMINAL + W794 + W1028 + W805 batch (post-CONSOLIDATE-18, 2026-05-16 /loop iteration N+19)

> ~18 completions since CONSOLIDATE-18 (Section 64). Wave B > TERMINAL milestone — the W767 5-wave outputSchema roadmap that > kicked off at CONSOLIDATE-18 closes cleanly across 13 MCP tools + > ~113 envelope-validation tests. Three themes carry the batch: > (a) Wave B TERMINAL — 13 MCP tools specialized across 5 > sub-ships (Wave B2 / B3 / B4 / B5-partial / B5b TERMINAL). > (b) MCP server card SEP-2127 readiness (W794)icons[] > field across 4 .well-known path variants, carries the W792 + W793 > work to a clean SEP-2127-ready posture. (c) Pattern-2 > empty-state audit arc closure (W805) — 3 detectors migrated > (cmd_test_hermeticity + cmd_llm_smells + cmd_boundary) + 5 > followups captured + 1 real bug fixed (cmd_boundary SQL outside > with open_db block). Plus 3 stand-alone polish items > (W1061-followup-2 + W1008 carry-forward + the > DETECTOR-FP-RATE-METHODOLOGY research memo). Parallel in stature > to the W1255 architectural-decision-and-implementation arc that > CONSOLIDATE-16 carried.

Fixed — Wave-B-TERMINAL + W794 + W1028 + W805 batch

Changed — Wave-B-TERMINAL batch

Added — W1275-W1312-arc + Wave-B1 + sarif-disclosure batch (post-CONSOLIDATE-17, 2026-05-16 /loop iteration N+18)

> ~15 completions since CONSOLIDATE-17 (Section 63). Fast-follow- > through batch — 5 themes carry the dispatch tail: (a) Pattern-2c > carry-forward closures (W1275 / W1276-fix / W1277 / W1278a / W1309 > — the Pattern-2c CONSOLIDATE-16 → -17 → -18 chain closes cleanly, > Pattern-2c roster now 31/31 effective). (b) SARIF dashboard- > filtering trio (W1060 + W1061 + W1062 + 2 followups — OASIS 2.1.0 > § 3.51 + § 3.52 + properties.tags[] plumbed across cmd_smells + > cmd_check_rules + cmd_taint + cmd_vulns + cmd_health + cmd_complexity > + secrets emitter). (c) MCP outputSchema roadmap kickoff (W767 > inventory + Wave B1 specialized schemas on roam_impact + > roam_preflight + W1311 decorator normalization + W1312 redundancy > drops + the EVOLUTION research memo). (d) Pattern-1D file-substring > disclosure (W1309). (e) Pattern-3a severity widening (W1005 + > W1005-followup-A + W1007 agent_contract:[] preservation).

Fixed — W1275-W1312-arc + Wave-B1 + sarif-disclosure batch

Changed — W1275-W1312-arc batch

Added — W1284-W1308 batch (post-CONSOLIDATE-16, 2026-05-16 /loop iteration N+17)

> ~25 completions since CONSOLIDATE-16 (Section 62). Post-v13.2-release > hardening batch — first ~25 W#s after the release-merge land cleanly > without re-opening flagship arcs. Four themes carry the batch: (a) init / > cold-start UX fixes (W1288 / W1289 / W1290 / W1291). (b) SARIF advisory- > warning plumb carry-forward bundle (W1084 + W1113 + W1114 + W1115 in one > commit + W1236 chore drop of orphan emitters). (c) CGA edge-bundle > stability + post-merge CI hardening (W1285 / W1284-G3 / W1286 / W1287 + > W1297-W1302 / W1303-W1305). (d) MCP card v13.2 sync + CI infrastructure > (W1306 / W1307 / W1308 / W1088 / W1089).

Fixed — W1284-W1308 batch

Changed — CI infrastructure

[13.2] — 2026-05-16

Highlights — wip/v13.2-session-2026-05-16 ship (W1255-W1297)

The 2026-05-16 session that landed:

Added — W1255-W1278 batch (post-CONSOLIDATE-15, 2026-05-16 /loop iteration N+16)

> 7+ completions since CONSOLIDATE-15 (Section 61). The follow- > through batch after the Pattern-2c 30/30 terminal landed. The > MAJOR load-bearing milestone: the W1255 architectural > decision (Option (a) "Keep top-level + add siblings") landed AND > shipped within the same consolidation window — .roam-rules.yml > (root) + .roam/constitution.yml (existing) + .roam/control-map.yml > (new) are the canonical config paths, and src/roam/evidence/config_hashes.py > (84 LOC, NEW) + ledger.py stamping at start_run (+18 LOC) wire > the producer side end-to-end. Side benefit: vsa.py already > CONSUMES constitution_hash + rules_config_hash at lines > 281-296 — producer wire-up immediately benefits VSA attestation > with zero further code change. W1253 unblocked. Plus the > W1272 Pattern-2c unresolved-path standardization milestone: > 8 commands (cmd_impact + 6 helper-callers + cmd_preflight > already-compliant pin) now emit the canonical Convention-c > unresolved-path shape — exit code 0 on unresolved across all 8. > Five themes: (1) W1255 architectural decision recorded + IMPL > shipped — Cranot picked Option (a); config_hashes.py substrate > + ledger.py producer wire-up + CLAUDE.md doc landed inside the > same window. 11 new tests + 101 in-scope tests pass; hash-stability > preserved. (2) W1272 Pattern-2c unresolved-path standardization > SHIPPED — 8-command Convention-c bulk migration (78+105+27+51 > tests pass; zero regressions; exit-code-0-on-unresolved across all > 8). The post-Pattern-2c-terminal follow-up arc surfaced at > CONSOLIDATE-15 (W1268-audit) lands within the next consolidation > window. (3) W1273 test_validate_plan dogfood-brittleness fix > SHIPPED — 3 tests hardened (cold-start-guard bypass + > _vp_blast_radius stubbing); 27/27 tests pass. (4) Drive-by > captures — W1275 (3 remaining dogfood-brittle tests in > test_validate_plan.py) + W1276 (test_impact_auto_logs_not_found_path > RECLASSIFIED → W1272-expected-failing; in flight as W1276-fix) + > W1277 (replay-narration provenance for unresolved-path attempts > — auto_log removed from cmd_impact; signal-loss risk) + W1278 > (audit 3 remaining symbol_not_found callers — cmd_test_scaffold > / cmd_plan_refactor / cmd_guard). (5) Lockstep consolidation > discipline — the every-~8-completions /loop rule fires; the > follow-through batch lands cleanly without the multi-arc-terminal > volume of CONSOLIDATE-14 / CONSOLIDATE-15.

In flight — W1255-W1278 batch (parallel dispatches not yet on disk)

Added — W1245-W1274 batch (post-CONSOLIDATE-14, 2026-05-16 /loop iteration N+15)

> ~20 completions since CONSOLIDATE-14 (Section 60) — the largest > cumulative consolidation since the W1175-RESEARCH propagation arc > mid-points. The MAJOR milestone: Pattern-2c propagation arc > COMPLETE at 30/30 sites. The W1233-audit roster (originally 38 > sites) resolved to 30 real true-positives once the W1267-audit > filtered out two W1233-audit false positives (cmd_hotspots / > cmd_smells lacked real find_symbol callsites). Wave 1 quartet > (W1242 + W1243 + W1244 + W1248 — CONSOLIDATE-14) + cmd_annotate > (W324 origin template) + W1245 batches 1-4 covering 20 SHIP + > 2 BAIL (22 cmd_*.py visited, 20 disclosure-covered + 2 false- > positive BAILs) closed every remaining real Pattern-2c site. > Both terminal > arcs are now CLOSED: the SARIF SHIP/SKIP-disclosure 196 → 0 > propagation arc reached terminal at CONSOLIDATE-14; the Pattern-2c > 30/30 arc reaches terminal at CONSOLIDATE-15. The agentic-assurance > substrate now spans producer (W1234 evidence_stale + earlier W210 > packet substrate) + consumer (W1262 doctor/diff stale banner) + > attestation (W37x CGA + W377 permit collector) — all three axes > structurally complete. Six themes: (1) Pattern-2c bulk > completion — W1245 batches 1-4 (22 SHIP across cmd_dead + > cmd_safe_delete + cmd_closure + cmd_symbol + cmd_hover + > cmd_pytest_fixtures + cmd_plan + cmd_context + cmd_relate + > cmd_why + cmd_visualize + cmd_invariants + cmd_testmap + > cmd_affected_tests + cmd_guard + cmd_metrics + > cmd_plan_refactor + cmd_pr_bundle + cmd_safe_zones + > cmd_test_scaffold) + 2 BAIL on W1233-audit false positives > (cmd_hotspots / cmd_smells — no real find_symbol callsite). > Plus cmd_annotate origin template (W324) accounted in the > 30-tally. (2) Pattern-2c family extensions — W1250 helper > docstring (collision-pattern documented); W1270 helper reserved- > key warning (Pattern-2 silent-drop fix at substrate level; first > real-world use in W1245-batch-4 cmd_safe_zones); W1268-audit > surfaced 5-way unresolved-path divergence captured as W1272; > W1271-audit surfaced test_validate_plan dogfood-brittleness > captured as W1273; W1273-fix → W1274 stale-assertion fix in > test_visualize; W1265 docstring at vsa.py:133 (W1264 follow-up). > (3) Evidence/W210 extensions — W1262 doctor/diff stale-evidence > banner (consumer-side wire-up of W1234 evidence_stale producer); > W1266 completeness_compat shared module hoist (-180 LOC duplicate > helpers + 205 LOC shared; W1262 drive-by). (4) Per-kind version > stamps — W1256 cmd_vibe_check per-pattern version stamps (10 > patterns); W1269 cmd_smells per-kind version stamps (7 patterns > wired). (5) Audit closures — W1267 audit corrected the 34-site > Pattern-2c list to 30 real true-positives by filtering the two > W1233-audit false positives. (6) CONSOLIDATE pause — natural > stopping point after Wave 2 batch-4 lands; no in-flight dispatches > at consolidation time. (Tally arithmetic: 20 SHIP across W1245 > batches 1-4 + 4 Wave-1 from CONSOLIDATE-14 + 1 W324 cmd_annotate > origin + 5 already covered upstream / earlier = 30 real > Pattern-2c sites disclosure-covered; the W1233-audit 38-site > original count was inflated by 2 false positives + ~6 duplicates > already covered by earlier substrates.)

In flight — W1245-W1274 batch (parallel dispatches not yet on disk)

Added — W1242-W1259 batch (post-CONSOLIDATE-13, 2026-05-16 /loop iteration N+14)

> ~15 completions since CONSOLIDATE-13 (Section 59). Six themes: > (1) Pattern-2c family enablement — Wave 1 quartet landed — > W1242 cmd_impact + W1243 cmd_preflight + W1244 cmd_diagnose + > W1248 cmd_trace adopted the W1241 resolution_disclosure() helper > substrate at the find_symbol() / find_symbol_id() call sites. > The four flagship commands now surface which tier of the resolver > succeeded (symbol / file / fuzzy / unresolved) + a > partial_success flag set on any non-symbol resolution. Wave 1 > finishes the highest-traffic Pattern-2c sites first, matching the > W1192/W1195 SARIF SHIP sequencing pattern. (2) Pattern-2c > substrate refactor (W1249) — hoisted find_symbol tier-stamping > into the substrate helper, eliminating ~100 LOC of duplicate > _detect_resolution_tier helpers across the four Wave-1 flagships. > The W1249 refactor unblocks W1245-batch-1 (Wave 2 first 5 sites) at > ~3× LOC simplification per consumer — without it, each Wave-2 > adoption would carry ~25 LOC of boilerplate the substrate now > absorbs. (3) Wave 16 SKIP-disclosure landed — _KNOWN_MISSING > 17 → 0 — 17 docstrings shipped across the remaining Bucket B > long-tail (cmd_debt + cmd_entry_points + cmd_guard + cmd_map > + cmd_metrics + cmd_path_coverage + cmd_patterns + > cmd_plan_refactor + cmd_pytest_fixtures + cmd_risk + > cmd_safe_delete + cmd_safe_zones + cmd_simulate_departure + > cmd_suggest_refactoring + cmd_testmap + cmd_why_slow + > cmd_ws). The 196 → 0 propagation arc is now fully closed — > 196 commands audited, 196 commands disclosure-covered (179 > SKIP-disclosure docstrings + 17 SARIF SHIP emitters across > CONSOLIDATEs 4 → 14). The W1175-RESEARCH long-tail roster > exhausted; arc terminal. (4) Evidence/W210 wire-up — W1234 > shipped the evidence_stale producer (W210 packet-layer Pattern-2 > variant-2f); W1254 (consumer) in flight at consolidation time; W1253 > BAIL surfaced W1255 architectural prerequisite (no upstream packet > exists to mark stale → captured as architectural decision pending). > (5) State-vocab substrate (W1235)_STATE_FAMILY_ALIASES > registry landed at substrate level for state-name normalization > across closed-vocab Pattern-2g sites. (6) SARIF rule rename > (W1232)flag-constant-default rule renamed to flag-suspect > per W1226 SHIP scope-discipline follow-up, aligning the > cmd_flag_dead namespace closer to W1227/W1229 naming convention. > Plus three CLAUDE.md doc-drift refreshes (W1247 module-local SARIF > convention + W1252 findings-registry decision + W1258+W1259 > 16 → 26 detector count + emit_finding(conn, record) API name) > and two new research memos (dev/PATTERN-2-EVOLUTION-2026-05-16.md > <!-- PHANTOM 2026-05-18: declared SHIPPED but memo absent from disk. Regenerate from BACKLOG/test-fixture breadcrumbs before next release. --> > already shipped in CONSOLIDATE-13; dev/DETECTOR-FP-RATE-BENCHMARKS-2026-05-16.md > <!-- PHANTOM 2026-05-18: declared SHIPPED but memo absent from disk. Regenerate from BACKLOG/test-fixture breadcrumbs before next release. --> > 773 LOC false-positive rate benchmarks across the 26 emitting > detectors).

Audits / verdicts — W1242-W1259 batch

In flight — W1242-W1259 batch (parallel dispatches not yet on disk)

Added — W1226-W1248 batch (post-CONSOLIDATE-12, 2026-05-16 /loop iteration N+13)

> ~13 completions since CONSOLIDATE-12 (Section 58). Four themes: > (1) SARIF SHIP family grew from 34 to 37 emitters in a single > post-CONSOLIDATE-12 window — W1226 cmd_flag_dead (35th, three closed- > enum rules under the flag-* namespace: flag-staleness / > flag-single-reference / flag-constant-default; staleness-banded > per-result level with a warning ceiling — heuristic detector, > never escalates to error), W1227 cmd_orphan_routes (36th, per-route > dead-endpoint projection; single closed-enum rule orphan-route with > confidence-banded per-result level: high + medium → warning, low → > note; warning ceiling — heuristic detector, never escalates to error; > the used bucket is filtered upstream so SARIF consumers never see > non-actionable rows), W1229 cmd_verify_imports (37th, first SHIP > emitter that escalates to error — two closed-enum rules: > invalid-import (warning) for unresolved with FTS5 fuzzy-match > candidates, hallucination-import (error) for unresolved with no > candidates; verify-imports is the canonical "hallucination firewall" > detector for LLM-era code and the only verify-imports rule that > escalates to error per the W1229 scope discipline). (2) Pattern-2 > variant-D family enablement (W1241) — landed the canonical > resolution_disclosure() helper at > src/roam/output/formatter.py:1263 + _RESOLUTION_KINDS frozen > closed-enum (symbol / file / fuzzy / unresolved) + drift-guard > test (tests/test_resolution_disclosure.py). Helper substrate now > live for the W1242/W1243/W1244 Wave-1 adoption sweep across > cmd_impact / cmd_preflight / cmd_diagnose (in flight at > consolidation time). (3) SKIP-disclosure propagation arc continued > through Wave 14_KNOWN_MISSING decremented 20 → 17 via the > three W1226/W1227/W1229 SHIP-promote pin-list removals (no new > docstring waves this batch — long-tail of the propagation arc). > (4) Drift-guard remediation pass — W1237 (cmd_risk edge-kind > vocabulary canonicalized to roam.db.edge_kinds) + W1238 > (catalog/detectors.py framework-detector plugin loop migrated from > bare-except to log.warning(...) + continue per W531 fail-loud > discipline; previously-grandfathered _PRE_W662_PENDING entries > dropped to zero in that file). Plus a 884-LOC research memo > (dev/PATTERN-2-EVOLUTION-2026-05-16.md) cataloguing the > <!-- PHANTOM 2026-05-18: declared SHIPPED but memo absent from disk. Regenerate from BACKLOG/test-fixture breadcrumbs before next release. --> > seven-variant Pattern-2 family taxonomy and seven open gaps.

<!-- PHANTOM 2026-05-18: declared SHIPPED but memo absent from disk. Regenerate from BACKLOG/test-fixture breadcrumbs before next release. -->

Audits / verdicts — W1226-W1248 batch

In flight — W1226-W1248 batch (parallel dispatches not yet on disk)

Added — W1207-W1224 batch (post-CONSOLIDATE-11, 2026-05-16 /loop iteration N+12)

> ~13 completions since CONSOLIDATE-11 (Section 57). Two milestones: > (1) SARIF SHIP family closed out the entire CONSOLIDATE-11 SHIP > candidate roster — 6 emitters landed (W1207 + W1209 + W1210 + W1211 + > W1213 + W1216), bringing the SHIP family from 28 to 34 emitters in a > single window. cmd_llm_smells (29th SHIP, 10 closed-enum rules > under the llm-smells/ namespace — first SHIP emitter with > double-digit closed-enum rule count + severity-banded level) + > cmd_fan (30th SHIP, per-symbol fan-in/fan-out projection) + > cmd_hotspots (31st SHIP, runtime-mode only; --security/--danger > sub-modes emit raw findings outside the closed-enum > hotspots/* rule catalogue per W1210's scope discipline) + > cmd_dark_matter (32nd SHIP, per-pair hidden-coupling projection; > single closed-enum rule dark-matter/hidden-coupling with > confidence-tier-banded severity) + cmd_duplicates (33rd SHIP, BAIL- > and-capture promotion landing — per-cluster semantic-duplicate > projection; single closed-enum rule duplicates/cluster with > similarity-banded severity) + cmd_laws (34th SHIP, per-rule > invariant projection from the W119 mined-laws substrate). All six > wrappers hash-stable additive; no persisted finding rows touched. > The 6-candidate SHIP roster from CONSOLIDATE-11 closed to zero > outstanding in this window. (2) Pattern-3b propagation arc — > 14+ waves shipped, _KNOWN_MISSING 64 → 20. W1224-impl landed > 37 SKIP-eligible docstrings across two waves (Wave 14a = 15 > docstrings; Wave 14b = 22 docstrings) — the largest single-wave > batch of the arc to date, and 0 BAILs across both sub-waves. > Sites: cut / dev_profile / doc_staleness / docs_coverage / drift / > effects / eval_retrieve / evidence_diff / evidence_doctor / fitness / > fn_coupling / graph_stats / idempotency / index / index_bundle (14a) + > ingest_trace / invariants / mutate / owner / pr_diff / pr_prep / > side_effects / split / stats / suggest_reviewers / surface / > syntax_check / telemetry / test_gaps / test_pyramid / tx_boundaries / > version / vuln_map / vuln_reach / workflow / xlang / index_stats (14b). > Pin-list now down to 20 surviving entries (cmd_debt + > cmd_entry_points + cmd_flag_dead + cmd_guard + cmd_map + > cmd_metrics + cmd_orphan_routes + cmd_path_coverage + > cmd_patterns + cmd_plan_refactor + cmd_pytest_fixtures + > cmd_risk + cmd_safe_delete + cmd_safe_zones + > cmd_simulate_departure + cmd_suggest_refactoring + > cmd_testmap + cmd_verify_imports + cmd_why_slow + cmd_ws).

Audits / verdicts — W1207-W1224 batch

In flight — W1207-W1224 batch (parallel dispatches not yet on disk)

Added — W1213-W1222 batch (post-CONSOLIDATE-10, 2026-05-16 /loop iteration N+11)

> ~10 completions since CONSOLIDATE-10 (Section 56). Two milestones: > (1) SARIF SHIP family grew from 24 to 28 emitters (W1208 + W1217 + > W1218 + W1219 + W1215). cmd_n1 (24th, W110 N+1 detector wrapper > with 3 closed-enum rules — high/med/low; 89+30 tests pass) + > cmd_missing_index (25th, 3 closed-enum rules; 20 tests pass) + > cmd_orphan_imports (26th, 3 closed-enum rules — > internal_typo=error / missing_package=warning / > missing_local=warning) + cmd_over_fetch (27th, single closed-enum > rule at warning; dual-shape endpoint+model handling) + > cmd_bus_factor (28th, 3 closed-enum rules — concentration / > stale-ownership / solo-summary; directory-anchor pattern; > hash-stable sha256 verified). (2) Pattern-3b propagation arc — > 12+ waves shipped, _KNOWN_MISSING 96 → 64. Wave 13 closed 10 > SKIP-eligible docstrings with zero BAILs (changelog / db_check / > intent_check / metrics_push / recommend / report / retrieve / > schema / search_semantic / simulate). W1212 reclassification + > W1220 SKIP + W1222 inline stale-pin removal close this batch's > propagation contribution. 6 SHIP candidates remain pending (W1207 > cmd_llm_smells / W1209 cmd_fan / W1210 cmd_hotspots / > W1211 cmd_dark_matter / W1213 cmd_duplicates / W1216 cmd_laws).

Audits / verdicts — W1213-W1222 batch

Added — W1199-W1212 batch (post-CONSOLIDATE-9, 2026-05-16 /loop iteration N+10)

> ~15 completions since CONSOLIDATE-9 (Section 55). Three milestones > + one reclassification discipline: (1) SARIF SHIP family grew to > 23-24 emitters (W1203 + W1208). cmd_test_impact (23rd SHIP, ~333 > LOC = 160 prod + 173 test) joined as a per-test reach_count ranker > with file-level anchor, reusing the global --sarif flag plumbed > through _SARIF_CONSUMERS; 11 new SARIF tests + 59 pre-existing > pass. cmd_n1 (24th SHIP) joined as a W110 N+1 detector SARIF > wrapper with per-query findings. (2) Pattern-3b propagation arc — > 11 waves shipped, 58% gap closed. _KNOWN_MISSING dropped 96 → 82 > across Wave 10 (W1205-impl, 10 Bucket B docstrings; 96 → 86) and > Wave 11 (W1206-impl-skip, 5 of 6 SKIP docstrings; 88 → 82). The > 6th SKIP docstring (cmd_duplicates) bailed mid-impl and was > captured as W1213 SHIP. 114 commands closed across W1180 → W1212 > (58% of the original 196-file gap). (3) Reclassification > discipline — W1212 + W1213. W1199 (CONSOLIDATE-9 SHIP candidate > for cmd_coverage_gaps) was REVISED to SKIP-DISCLOSURE this window > via W1206-audit-unclear's deeper audit (REPORT command — wrap_findings > is envelope-level, not per-location). Symmetrically, cmd_duplicates > was discovered as a SHIP candidate by W1206-impl-skip's premise check > and captured as W1213. The methodological move is deeper audit > beats initial classification.

Audits / verdicts — W1199-W1212 batch

Added — W1186-W1198 batch (post-CONSOLIDATE-8, 2026-05-16 /loop iteration N+9)

> ~25 completions since CONSOLIDATE-8 (Section 54). Three milestones: > (1) SARIF SHIP family at 22 emitters (W1192 + W1195). cmd_delete_check > (21st SHIP, ~165 LOC) joined as the first SHIP emitter with PRIMARY + > SECONDARY SARIF locations (deletion candidate is PRIMARY; surviving > refs in code/test/docs/config are SECONDARY) for the BREAK-RISK > gate-blocking pattern. cmd_auth_gaps (22nd SHIP, ~180 LOC) joined as > the first SHIP emitter with explicit 3-tier confidence in SARIF output > (static_analysis / structural / heuristic flow from > single-source-of-truth confidence map into properties.confidence). > Pre-batch the SHIP family was 20 (cmd_smells + cmd_clones + > cmd_partition + cmd_affected_tests + cmd_impact + cmd_critique + 14 > pre-existing). (2) Pattern-3b propagation arc — 9 waves shipped, > 51% gap closed. _KNOWN_MISSING dropped 196 → 96 across W1180 + > W1181 + W1182 + W1185 + W1187 + W1188 + W1189 + W1190 + W1191 + W1194 > + W1195 + W1197 + W1198 (100 commands closed; 51% of the original > 196-file gap). This batch added 4 more waves on top of Section 54's 3: > Wave 6 (W1189-impl, 10 commands; 137 → 127), Wave 7 (W1191-impl, 11 > commands + cmd_delete_check stale-pin removal drive-by; 125 → 114), > Wave 8 (W1194-impl, 10 Bucket B/C/E; 113 → 103), Wave 9 (W1197-impl, > 4 SKIP + 2 UNCLEAR-resolved-to-SKIP; 100 → 96). (3) Capture > discipline preserved — 6 SHIP candidates deferred cleanly > (W1199-W1204). W1198-audit identified cmd_coverage_gaps (W1199) > + cmd_orphan_routes (W1200) + cmd_pytest_fixtures (W1201) + > cmd_test_gaps (W1202) + cmd_test_impact (W1203) + > cmd_verify_imports (W1204); ~7-10d total effort. All 6 emit > per-location findings in JSON envelope today; remaining work is > emit_finding() integration + SARIF wrapper per W1192/W1195 scaffold. > Hash-stability invariant held across all 22 emitters. 131/131 SARIF > tests pass throughout.

Audits / verdicts — W1186-W1198 batch

Added — W1186-W1189 batch (post-CONSOLIDATE-7, 2026-05-16 /loop iteration N+8)

> ~10 completions since CONSOLIDATE-7 (Section 53). Three pillars: > (1) SARIF substrate adoption STRUCTURALLY COMPLETE — all 19 > *_to_sarif helpers across the codebase now use _rule_entry() + > _result_entry() factories (W1178 + W1179a + W1179b + W1186 polish). > Hash-stability cryptographically verified via sha256 matches on > pre/post adopter outputs. Net ~LOC-neutral overall (per W1080 > discipline) — the substrate's value is structural API consistency, > not raw LOC reduction. (2) Pattern-3b propagation arc — 5 waves > shipped: Wave 1 (10 bootstrap) + Wave 2 (10 local-state) + W1185 > outliers (2 commands) + Wave 3 (12 codegen) + Wave 4 (12 > exploration) + Wave 5 (11 continuation) = 56 commands closed. > _KNOWN_MISSING dropped 196 → 138 across this stretch (29% of the > original gap closed). Wave 6 audit landed via W1189-audit (10 > commands queued for next dispatch). (3) Concurrent-merge > discipline battle-tested — multiple "file modified since read" > guards fired across W1179a/b + W1180/W1181/W1185 races and > resolved cleanly via the Edit guard's read-before-write contract.

Audits / verdicts — W1186-W1189 batch

Added — W1177-W1185 batch (post-CONSOLIDATE-6, 2026-05-16 /loop iteration N+7)

> ~14 completions since CONSOLIDATE-6 (Section 52) including 8 SHIPPED > outcomes + 2 RESEARCH memos + 3 AUDIT-VERDICTs + 1 PARTIAL audit-extraction. > Three major systemic shifts landed: (1) the SARIF helper substrate > launched via W1178 — _rule_entry() + _result_entry() factories in > sarif.py reduce ~80 LOC of new substrate + ~50 LOC of subtractive > adoption across cmd_dead + cmd_critique + cmd_partition (3 adopters); > 17 more emitters being refactored in parallel (W1179a/b in flight); (2) the > Pattern-3b SARIF-disclosure propagation arc launched via W1175-RESEARCH — > 684-line memo planned 30-50 batches with asymmetric propagation (bulk for > ~135 likely-SKIP, 1:1 for ~14-20 likely-SHIP, ~17 unclear). Wave 1 (W1180, > 10 bootstrap commands) + Wave 2 (W1181-impl, 10 local-state commands) + > W1185 outliers (cmd_lsp + cmd_rules_validate) shipped — _KNOWN_MISSING > dropped from 196 to 174 (33 done, 162 to go); (3) vocabulary canonicalization > disciplines W1156 + W1162 + W1176 sealed with cmd_pr_analyze NO_CHANGES → > NOCHANGES sweep landing this batch and the REFERENCE_REMOVAL_VERDICTS > substrate fully operational. Hash-stability invariant held across all > shipped impls.

Audits / verdicts — W1177-W1185 batch

Added — W1158-W1176 batch (post-W1156-CONSOLIDATE, 2026-05-16 /loop iteration N+6)

> ~1000+ LOC of impl across 8 SHIPPED outcomes + 2 RESEARCH memos + > 3 AUDIT-VERDICTs. 13 completions since W1156-CONSOLIDATE. Three > structural inflections landed: (1) the SARIF SHIP family grew from > 17 commands (post-W1146) to 20 commands via cmd_impact + > cmd_affected_tests + cmd_partition (smells / clones SHIP impls > in flight as W1171 + W1172); (2) the W1169 SARIF-disclosure-coverage > CI lint discovered 196 unaudited cmd_*.py files, vastly exceeding > the W1166-RESEARCH 4-8 estimate, with a _KNOWN_MISSING frozenset > pinning the gap and W1175-RESEARCH planning the propagation strategy; > (3) two vocabulary canonicalization sweeps closed (W1162 likely-stale > + W1176 NO_CHANGES → NOCHANGES) extending the W1156 dual-form pattern. > action.yml allowlist intent documented (W1167) + cli.py ⊃ action.yml > subset CI lint pinned (W1168).

Audits / verdicts — W1158-W1176 batch

Added — W1149-W1156 batch (post-W1149-CONSOLIDATE, 2026-05-16 /loop iteration N+5)

> ~180 LOC of impl across 5 SHIPPED outcomes + 2 AUDIT-VERDICTs. > 7 completions since W1149-CONSOLIDATE. Two structural verdicts > landed: (1) the SARIF-disclosure pattern now spans 9 commands** > (W1144 + W1145 + W1148 + W1152 + W1154-impl x6), formally documenting > "invocation-scoped aggregates have no SARIF locations[]" as a stable > design rule; (2) reference-removal verdicts (cmd_refs_text + > cmd_delete_check) elevated to a closed-enum frozenset > (REFERENCE_REMOVAL_VERDICTS) via W1156 — drift guard pinned in > test_evidence_v0.py + dual-form normalization preserves CLI display. > publish.yml hardened with persist-credentials:false on build + > smoke checkout steps (W1103) + 3 dist/*.whl sites converted to a > robust single-wheel assertion with quoted variable (W1104).**

Audits / verdicts — W1149-W1156 batch

Added — W1136-W1149 batch (post-W1133-CONSOLIDATE, 2026-05-15 even more iteration)

> ~410 LOC of impl across 10 SHIPPED outcomes (W1100 +28 LOC + W1099-narrow > ~80 LOC + W1136 +339 LOC + W1141 +36 LOC + W1144 +6 LOC + W1145 +9 LOC + > W1148 +14 LOC) + 1 research memo (W1139-RESEARCH 361 LOC) + 3 SARIF > audit verdicts (W1085 + W1146 + W1147). 11 completions + 4 captures > since W1133-CONSOLIDATE. Three structural outcomes: (a) the W332 > Pattern-3b CLI-boundary thread is functionally closed at v13.x via > the W1141 4th-mirror drift guard; (b) the SARIF audience-disclosure > trilogy (W1144 + W1145 + W1148) propagated the deliberate-skip > rationale docstring across cmd_doctor + cmd_audit + cmd_pr_risk; > (c) the Pattern-3b CLI-arg lint matrix now spans 5 axes (W1111 + > 4 W1121 siblings) PLUS the option-dest extension (W1136 input_path > cluster). The user-facing CLI surface saw the biggest sweep in 30+ > sections: 14 commands gained metavar="SYMBOL" alignment (Strategy D — > no breaking rename) and 6 CLI-only commands gained --file → --path > harmonization with hidden alias backward-compat.

Audits / verdicts — W1136-W1149 batch

Added — W1097-W1133 batch (post-W1126-CONSOLIDATE, 2026-05-15 even further iteration)

> ~80 LOC of impl + 253 LOC of new test coverage across 7 SHIPPED > outcomes + 6 audits classified into SYMBOL-CONCEPT or DOMAIN-DISTINCT > verdicts + 1 BAIL (W1101 — premise inverted, captured W1126 inverted > task). 14 completions since W1126-CONSOLIDATE. Two structural > closures: (a) the cmd_runs placeholder-vocabulary cluster > (W1097+W1105+W1116+W1125 = 8 sites swept end-to-end) and (b) the > W1118-bundle reclassification (12 W1111 grandfathered sites > classified into 10 SYMBOL-CONCEPT + 2 DOMAIN-DISTINCT permanent > carve-outs). The v14.0 hard-rename candidate cluster now spans > ~21 files (W1133) — significantly bigger than the W1004 audit's > original 6-file estimate; this scope expansion feeds the W1098 > USER-DECISION at v14.0 planning.

Audits / verdicts — W1097-W1133 batch

Added — W1086-W1126 batch (post-W1096-CONSOLIDATE, 2026-05-15 even later iteration)

> ~250 LOC of impl (W1060-take2 + W1086 dominant) + ~120 LOC of new > test coverage + 1 research memo (677 LOC) across 8 shipped outcomes + > 1 BAIL (W1101 premise-inverted). 9 of 9 dispatches closed; 17 new > drive-by W-tasks captured (W1112-W1128). The architectural ship — > to_sarif gained a warnings_out parameter + new closed-enum > descriptor producer.advisory-warning — unlocks 4 sibling SARIF > helpers (W1112-W1115). W1102-RESEARCH closed the W1098 USER-DECISION > as "no v14.0 rename needed; ship the W1111 AST CI lint instead". > Premise-verification-first discipline continues to outperform > force-through (W1101 BAIL: the W1004 audit had misread the dominant > convention, so the proposed sweep would have been backwards).

Audits / BAILs — W1086-W1126 batch

Added — W1041-W1096 batch (post-W1079-CONSOLIDATE, 2026-05-15 later iteration)

> ~63 LOC + ~39 test LOC across 3 shipped impl waves (W1087/W1091/W1096) + > 8 BAIL/NO-OP/VALIDATED outcomes (W1041/W1004/W1005/W1007/W1008/W1020/W1048/W1060-narrowed) > + 1 helper-confirmed (W1041) — the iteration's load-bearing methodological > output is that BAIL-and-CAPTURE is faster and more accurate than force-through. > ~17 waves across the W1041-W1096 stretch; 4 of 11 dispatches landed > BAIL/NO-OP (W1041 already alphabetical, W1008 already converged via W706+W1057, > W1005 already W547/W564-compliant, W1020 already optimised with scope="module" > overrides where viable); 1 BAIL with prereq capture (W1060-narrowed: warnings_out > accumulator absent in cmd_complexity, so the proposed runtime-notifications > plumb would have been cargo-cult — captured W1084/W1085/W1086 as prereqs > instead); 2 VALIDATED-then-fixed (W1007 → W1091 next_commands LAW 4 fix; > W1008 → drive-by W1093 captured). The bail discipline (W1019b/W1019e/W1080 > precedent + W988+W989 "premise verification is the first step" methodology > from W1001-CONSOLIDATE) generated 9 follow-up W-tasks (W1084-W1097) instead > of forcing-through cargo-cult code. All 11 dispatches used general-purpose > or Explore subagents per the W1072 directive — claude subagent > worktree-MAX_PATH blocker still active on Windows.

Audits / NO-OPs — W1041-W1096 batch

Fixed — CI publish.yml (post-v13.1)

W1079-CONSOLIDATE — Pattern-1D + closest-match disclosure arc + helper-hoist Phase 1 + Pattern-2 propagation finale

> CONSOLIDATE checkpoint = W1079. ~17 waves closed since W1042-CONSOLIDATE. > Headline: the Pattern-1D unknown-value disclosure arc went from 1 > command (W1063 cmd_findings --detector) to 9 commands in a single > batch — cmd_findings / cmd_search --kind (W1068) / cmd_endpoints > --framework (W1069) / cmd_endpoints --method (W1075) / > cmd_test_scaffold --framework (W1070) / cmd_workflow + cmd_explain_command > (W1074) / cmd_oracle (W1079) / cmd_smells (W1066) — each emitting an > explicit structured envelope on the unknown-value path plus a > difflib.get_close_matches-derived "did you mean?" hint when the typo > distance is plausible. Pattern shape: closed enum → reject unknown > with state="unknown_<axis>" + partial_success=true + agent_contract.facts > listing the valid set + next_command carrying the closest match. > Mirrors the W918 / W994 / W995 / W1009 / W1011 / W1032 / W1042 loader > envelope shape — Pattern-1D ("silent success on degraded resolution" > from CLAUDE.md §"Six systemic anti-patterns") is now the 9th member > of the Pattern-2 propagation family. W1078 added a deliberate > click.Choice carve-out (cmd_complete --kind is click-validated, so > the Pattern-1D template does not apply; closed not-applicable). > Helper hoist (W1077) shipped src/roam/output/structured_unknowns.py::structured_unknown_filter > as Phase 1 (UNUSED on landing — 128 LOC + 15 tests; Phase 2 migration > W1080 in flight at consolidation time). Pattern-2 propagation > closures — W1010 final cmd_flag_dead._load_known_stale plumbing-only > close (plain-text loader, not YAML — does not flow through > load_yaml_with_warnings); W1043 WarningsOut type alias swept > 21 callsites across 8 files in one consistent application. Operational > findings: W1067 permit-expiry investigation closed NOT-A-BUG > (audit-completeness design per W377); W1071 documented permit-vs-lease > asymmetry in module docstrings + CLAUDE.md; W1072 — claude > subagent is structurally broken on Windows host via the W686 > worktree-MAX_PATH regression (the agent platform's default-worktree > behavior is the structural issue; the general-purpose subagent > works around it by not creating a worktree). W1076 documented that > CLAUDE.md is intentionally untracked (commit 89a338d9 removed it > from public repo — local-only by design). Test discipline: W1027 > extracted the _no_pyyaml monkeypatch to a tests/conftest.py > fixture (6 test files migrated, -50 LOC); W1059 converted 10 > hardcoded expires_at future-dates to relative offsets across 2 > files; W1065 triaged 3 more files with 0 conversions (all valid > B-variant expires_at-as-input fixtures). Research: W1049-RESEARCH > shipped a release-pipeline hardening memo with 3 P1 recommendations > (PEP740 attestations + workflow split + SBOM-wheel SHA binding) — > queued as W1054 / W1055 / W1056 user-decision-pending. Hash-stability > mandate held trivially across the batch — every Pattern-1D fix > added a new validation path (no pre-fix envelope bytes to compare); > W1077 helper shipped unused (no callsites yet); W1010 / W1043 / W1027 > / W1059 are docs/types/test-fixture only with no runtime behavior > delta. NO commits taken during the session per directive — entire > batch on the working tree for review.

Added — Pattern-1D + closest-match disclosure arc (W1066 / W1068 / W1069 / W1070 / W1074 / W1075 / W1078 / W1079 — W1079-CONSOLIDATE)

Added — Helper hoist Phase 1 (W1077 — W1079-CONSOLIDATE)

Pattern-2 propagation closures (W1010 / W1043 — W1079-CONSOLIDATE)

Test discipline (W1027 / W1059 / W1065 — W1079-CONSOLIDATE)

Operational findings (W1067 / W1071 / W1072 / W1076 — W1079-CONSOLIDATE)

Research memos (W1049-RESEARCH — W1079-CONSOLIDATE)

v13.1 (released 2026-05-15) -- Pattern-2 propagation + shared YAML helper + 3 flagship silent-fallback seals

> THREE flagship Pattern 2 silent-fallback bugs SEALED this batch (W826 cmd_taint + W834 cmd_health + W836 cmd_doctor) + W817 helper-level auto-inject closed Pattern 2 partial_success gap across 7 detectors in one shot (dead / clones / complexity / orphan-imports / bus-factor / auth-gaps / hotspots) + W810 cmd_complexity Pattern 1B fix (SystemExit(1) → return on empty corpus) + W805 empty-corpus sweep covered 25+ detectors (cmd_endpoints/n1/missing-index/over-fetch/smells/duplicates/invariants/vulns/audit-trail-conformance/audit-trail-verify/pr-risk/critique already clean; 7 auto-fixed by W817; 3 flagship dedicated fixes) + W749 dispatch-edge MIN(id) fix in registry_dispatch (231 + 34 + 22 edge attribution corrections) + W774 sister fix to laravel_post.py (worktree-pending) + W718 cleaned 70+ UPPER-case severity sites + W634 confidence_level_rank fail-loud + 15 callers + W444 mcp_tool_names duplicate fail-loud + W445 _REGISTERED_TOOLS append guard + W707 _serialize_suppressions dead-code seal + drift-guard expansion (W703 _CommentSyntax + W741 find_project_root symlink-safety + W484 templates/ci/ reachability + W711/W712 mcp --card/--list-tools coverage + W713 _SARIF_CONSUMERS AST-literal + W757 backfilled missing W702) + W397 build_readme_counts AGENTS.md + W734 CONTRIBUTING.md count refresh + five research memos shipped in dev/ (MCP-EVOLUTION / MCP-SERVER-CARD / MCP-TASKS-EVAL / MCP-ELICITATION-CANDIDATES / DETECTOR-FP-METHODOLOGY) (~50-completion batch behind W836-CONSOLIDATE, 2026-05-15). > The headline is THREE flagship Pattern 2 silent-fallback bugs sealed in main this batch — all three share the "claim success on unanalyzed corpus" shape that the W805 empty-corpus sweep was designed to surface. Mirroring the proven cmd_vulns Fix E template (state="no_scan" + partial_success=True + actionable verdict) and the cmd_missing_index state="no_migrations" discipline, each fix detects the empty-graph precondition BEFORE running the rule/check pipeline and emits an explicit Pattern-2 envelope instead of a default-success illusion. W826 cmd_taint — previously emitted "No taint findings across 22 rule(s)" + partial_success=false (W817 auto-injected, making the false claim deterministic) on a fully-empty corpus; the verdict read as a clean security pass on an unanalyzed repo. Fix: 52-line empty-corpus guard right after open_db(...); on COUNT() FROM symbols == 0 emits state="empty_corpus", partial_success=true, rules (count loaded but not run), verdict "no symbols to analyze (corpus empty; N rules loaded but not run — run \roam index --force\ to populate the graph)". Test tests/test_w825_taint_empty_corpus.py xfail-strict flipped green plain; 35/35 taint regression tests pass. W834 cmd_health — the FLAGSHIP CI-gate command previously emitted verdict: "Healthy codebase (100/100) — 0 critical issues" with health_score: 100 on an empty corpus, because every health factor defaulted to 1.0 on zero signal and the geometric mean returned exactly 100 → score ≥ 80 threshold matched "Healthy codebase". A 100/100 Healthy verdict on an unanalyzed repo was a HIGH-severity false claim that would silently pass CI gates. Fix: 65-line empty-corpus carve-out before build_symbol_graph(...); emits state="empty_corpus", partial_success=true, health_score=None (not 0, not 100), verdict "no symbols to analyze" + next_command="roam index --force". --gate flag raises GateFailureError (exit 5) on empty corpus — mirrors W829 audit-trail-verify discipline (fail-closed on missing analysis). Test 1/1 pass; 93/93 health regression tests pass; LAW 4 lint 8/8 pass. W836 cmd_doctor — previously checked only environment markers (Python/tree-sitter/git/networkx/manifest) and never asked "did the indexer extract anything?". On a clean env + empty corpus, would emit "all N checks passed" even though zero symbols had been indexed. Fix: new _check_corpus_content() function queries SELECT COUNT() FROM symbols; states no_index / empty (advisory fail with actionable verdict) / populated / error. Wired into the check pipeline after _check_required_tables and into _ADVISORY_CHECK_NAMES so empty corpus warns but does not block CI by default. Total check count bumped 23 → 24; 71/71 doctor regression tests pass; 5/5 W835 tests pass. W817 helper-level auto-inject closes Pattern 2 across 7 detectors in one shot: added a 9-line auto-inject at src/roam/output/formatter.py:json_envelope() defaulting summary.partial_success to False when missing. Closed the gap for 7 detectors without per-command edits; companion W819 manual xfail-strict flip on the 7 corresponding empty-corpus smoke tests; W810 manual cmd_complexity Pattern 1B fix (SystemExit(1) → return). W749 dispatch edge-attribution chain extended: discovered dispatch edges were 100% mis-attributed via a DIFFERENT mechanism than W742 — MIN(id) synthetic-source in registry_dispatch.py. Replaced with per-file [(line_start, line_end, symbol_id)] map + _symbol_for_assignment lexical-extent lookup. _COMMANDS now sources 231 dispatch edges (was attributed to _DEPRECATED_COMMANDS); _MATH_DETECTORS 34 edges (was attributed to log); PYTHON_IDIOM_DETECTORS 22 edges (was attributed to _MUTABLE_DEFAULT_RE). W805 empty-corpus sweep methodology validated: 25+ detector commands smoke-tested; 12 already Pattern-2 clean; 7 auto-fixed by W817; 3 flagship dedicated fixes; every smoke test ships with a forbidden-fragment blacklist ("safe" / "healthy" / "no concerns" / "all clear" / "100/100") as a regression guard. W772 worktree-staleness operational finding: ~8 dispatches bailed because agent worktrees branch from commit 850552af (pre-session main); user's git config --global core.longpaths true fix resolved the parallel W686 path-length issue. Hash-stability mandate held across every fixtests/test_evidence_schema_migration.py 31/31 byte-identical.

Added — Detector inventory memo (W850)

Fixed — THREE flagship Pattern 2 silent-fallback bugs SEALED (W826 taint + W834 health + W836 doctor)

Fixed — Edge-attribution chain extended (W749 + W774)

Fixed — W805 empty-corpus sweep: methodology + 25+ detectors smoke-tested

Fixed — Defensive fail-loud expansion (W444 / W445 / W634 / W707)

Changed — UPPER-case severity vocabulary canonicalisation (W632 / W718)

Added — Coverage tests + drift guards (W397 / W444 / W484 / W680 / W681 / W703 / W711 / W712 / W713 / W734 / W741 / W757 / W801–W835)

Research memos shipped (5 in dev/)

Operational findings (pending fix / decision)

Added — Smell catalog detector roster 20 → 24 (W852 / W853 / W855 / W856 / W857 — W865-CONSOLIDATE)

Added — Strategy memos shipped earlier in the W836→W865 arc (W848 / W849 / W850 / W859)

Operational findings (W865 batch — pending fix / decision)

Added — Catalog helper-hoist arc + registry-parity backstops (W886-CONSOLIDATE)

Operational findings (W886 batch — pending fix / decision)

Added — Cross-layer hoist execution + decorator-driven registry POC (W908-CONSOLIDATE)

Added — Decorator-driven smell-detector registry POC (W871)

Fixed — Cross-language test-path false-positive (W889)

Hardened — Stale-pending triage + drive-by audits (W902)

Operational findings (W908 batch — pending fix / decision)

Fixed — Registry-parity remediation HIGH-RISK trio (W910 / W911 / W912 + W913 — W922-CONSOLIDATE)

Changed — W877/W878/W879/W880 hoist arc carry-through + W894 confidence-tier mismatch sealed (W922-CONSOLIDATE)

Hardened — W902 cargo-cult follow-through + CLAUDE.md anti-pattern rule (W904 / W905 / W907 — W922-CONSOLIDATE)

Hardened — W914 second stale-pending re-triage (W922-CONSOLIDATE)

Operational findings (W922 batch — pending fix / decision)

Added — Canonical-source consolidation arc (W923 / W925 / W929 / W935 + W866 / W920 / W927 / W928 — W939-CONSOLIDATE)

Hardened — Behavioral-fingerprint twin sweep + cycle-verification discipline (W920 / W927 / W928 — W939-CONSOLIDATE)

Hardened — W930 declassification + W907 carry-through (W939-CONSOLIDATE)

Operational findings (W939 batch — pending fix / decision)

W949-CONSOLIDATE — GATE 1 of the registry-parity milestone CLOSED (W940 → W941 + W871-bulk + W895 / W896 / W897 + W870 + W914-pass-3 + W938)

> MILESTONE: this is NOT just another batch. The W869 research memo > catalogued the registry-parity bug class as having 10 instances across > roam-code; Instance #1 (the smell-detector P0 surface) is now > structurally CLOSED. The pre-state had TWO hand-rolled parallel data > tables in lockstep — ALL_DETECTORS = [...] in src/roam/catalog/smells.py > + _SMELL_KIND_TO_CONFIDENCE = {...} in src/roam/commands/cmd_smells.py > — held together by W862 + W867 drift-guard lints (catching the symptom) > after every detector wave. W941 converted BOTH tables to DERIVED VIEWS > off the @detector-decorated registry: every detector self-registers > with its smell_id + confidence tier; the parallel-data shape is now > structurally impossible for this registry. ~78 lines of hand-rolled > parallel data eliminated. 24 detectors + 1 rollup = 25 confidence > entries, all canonically registered, all derivable. Hash-stability > mandate held (detector output bytes unchanged). 283/283 focused tests > pass. Gate 1 of W940's milestone framing is CLOSED; the > registry-parity bug class is no longer maintainable as parallel-data > debt for this surface. The remaining 9 instances (MCP tool registry, > mode-allowlists, _DEPRECATED_COMMANDS, subject_kind, etc.) are > sequenced in W869 + W940 follow-up memos.

Closed — GATE 1 registry-parity milestone (W940-RESEARCH → W871-bulk → W941 — W949-CONSOLIDATE)

Closed — Hash-stable canonical-source fold (W938 — W949-CONSOLIDATE)

Closed — Third stale-pending triage (W914-pass-3 — W949-CONSOLIDATE)

Operational findings (W949 batch — pending fix / decision)

W965-CONSOLIDATE — Gate-1 cleanup + W525 strategic pause + W918 / W924 / W933 typing/silent-fallback close

> CONSOLIDATE checkpoint = W965. ~10 closures + 15 drive-by captures > since W949-CONSOLIDATE. Two arcs landed in parallel: (1) the W942 / W945 > / W946 / W947 / W955 / W956 follow-throughs that close the W941 Gate-1 > cleanup queue (count-drift lint pivoted to registry source; pre-W941 > "two SOURCE-OF-TRUTH" wording flipped to past-tense across registry.py > + smells.py + the parity test; freeze_registry invariant numbering > re-ordered to match execution order), and (2) three independent > source-tightening landings — W918 closing a Pattern 2 > silent-fallback hole in _resolved_thresholds (unknown user-supplied > metric now surfaces via warnings_out + envelope partial_success=True > + a new agent_contract.facts entry), W924 stamping > detector_version on every detectors._finding envelope via the > pre-existing canonical roam.catalog.versions.detector_version(task_id) > (most task_ids → DEFAULT_VERSION='1.0.0'; the nested-lookup site > carries the 1.1.0 override), and W933 tightening > cmd_alerts._parse_alerts_yaml to dict[str, dict[str, Any]] + > selecting Option B (loose-but-honest typing) on _resolved_thresholds > because slot.update(rule) precludes TypedDict without runtime > validation. W525 — STOP AT INVENTORY. The W869 Instance #2 proving > ground (MCP tool registry) ran the inventory pass and surfaced real > structural gaps the wave's prescribed derivation would have silently > papered over: the hand-rolled _CORE_TOOLS (57 tools) does NOT match > @roam_capability(category="core") (0 — the category doesn't exist on > the decorator), and mcp_preset=("core",) is mostly boilerplate (228 > of 230 tools carry it). The decision was made to STOP at inventory > rather than mechanically derive until the strategic call on > derivation source (category= vs mcp_preset= vs hand-rolled) lands as > W357 long-horizon work. W525 split into 5 deep drive-bys > (W950 strategic / W951-W953 evidence / W954 closed) feeding that call. > W954 regression-guard test landedtests/test_w954_core_tools_capability_drift.py > (3 tests, 191 lines, all pass) snapshots _CORE_TOOLS=57, capability > registry=230 (one retired), mcp_preset="core" boilerplate=228, > category="core"=0, and floors at ~10% headroom (≥18 in_core_not_cap, > ≥180 in_cap_not_core). Hash-stability mandate held trivially across > the batch — W924's stamp lands on the dict AFTER make_finding_id > hashes only *raw_parts. Stale-pending triage closed three more rows > via the W914 methodology (W221 / W354 / W367 confirmed-closed from the > W949 batch; carried into the W965 BACKLOG strike-throughs for > visibility).

Closed — Gate-1 cleanup follow-throughs (W942 / W945 / W946 / W947 / W955 / W956 — W965-CONSOLIDATE)

Closed — Source-tightening trio (W918 / W924 / W933 — W965-CONSOLIDATE)

Captured — W525 STOP-AT-INVENTORY decision + W954 regression-guard (W950 / W951 / W952 / W953 / W954 — W965-CONSOLIDATE)

Closed — Stale-pending triage (W221 / W354 / W367 — W965-CONSOLIDATE carry-from-W949)

Operational findings (W965 batch — pending fix / decision)

W977-CONSOLIDATE — cmd_alerts Pattern-2 family FULLY CLOSED + W923 test-layer consolidation + W966 audit pass

> CONSOLIDATE checkpoint = W977. ~10 closures + drive-by captures > since W965-CONSOLIDATE. Headline: cmd_alerts.py Pattern-2 family is > now FULLY CLOSED end-to-end via the W962 / W963 / W964 trifecta > (op-vocabulary validation at parse + check time; bool-coercion fix on > delta_alerts) followed by the W967 / W968 / W969 trifecta (REAL > BUG: tiny YAML parser silently disabled delta_alerts for users > without PyYAML — a scalar-vs-section detection gap; REAL BUG: > level: "fatal" would KeyError downstream — _CANONICAL_LEVELS > frozenset + _coerce_level helper at 3 sites + counts initializer > fold; drift-guard test pins _VALID_OPS == AlertThreshold.op Literal > via typing.get_type_hints). 87 focused tests pass. Two bugs > were latent (0 fixtures exercised them). This is the SECOND > consecutive Pattern-2 family fully closed in this session — the > first was W826 / W834 / W836 in early-session (silent SAFE on empty > corpus across taint / health / doctor). cmd_alerts.py now > exemplifies the W918 discipline: every silent-fallback path surfaces > via warnings_out + partial_success=true + > agent_contract.facts. W923 test-layer consolidation — > W934 delegated 24 test_<detector>_findings_visible_via_cmd_findings_count > tests to tests/_findings_helpers.py via Strategy C (shared helper, > per-detector tests retained for fixture independence); doctor's > exact-count + critique's tolerant exit-code preserved; 24/24 + > ~190 sibling tests pass; net -46 lines (-114 of actual code). > W966 audit pass — W971 confirmed the codebase was already > W966-compliant: 13 HONEST sites, 0 LYING, 2 VALIDATED. The > "don't TypedDict a boundary you don't validate" discipline existed > before W966 codified it; W933 _resolved_thresholds is the > EXEMPLAR. W975 / W976 added lock-comments at json_envelope + > _compat_profile_payload per W971's recommendations, documenting > the W966 discipline at the call site. Hash-stability mandate held > trivially across the batch.

Closed — cmd_alerts.py Pattern-2 family FULLY CLOSED (W962 / W963 / W964 + W967 / W968 / W969 — W977-CONSOLIDATE)

Closed — W923 test-layer consolidation (W934 — W977-CONSOLIDATE)

Closed — Typing audit + lock-comments (W958 / W961 / W966 / W971 / W975 / W976 — W977-CONSOLIDATE)

Operational findings (W977 batch — pending fix / decision)

W1001-CONSOLIDATE — Pattern-2 playbook propagation across 3 candidate modules + SQL ESCAPE discipline + smells_suppress YAML hardening

> CONSOLIDATE checkpoint = W1001. ~15 closures + drive-by captures > since W977-CONSOLIDATE. Headline: the W983-RESEARCH playbook > (synthesised from W977's full cmd_alerts.py Pattern-2 close) was > propagated to the three nominated candidate modules — and the > outcomes are LOAD-BEARING. W987 sealed cmd_smells.py via a > full playbook apply (closed-set --kind validation against > kind_to_confidence() + warnings_out plumbed from the suppression > loader through the CLI to the envelope). W988 was CORRECTLY CLOSED > AS NOT-APPLICABLE — the agent verified W983's premise didn't match > cmd_conventions.py (no user-supplied boundary exists) and STOPPED > instead of fabricating work. W989 sealed cmd_pr_risk.py via a > DIFFERENT real Pattern-2 gap than W983's framing assumed — > _normalise_pr_risk_level was silently flooring unknown input to > "low" per the W718 CI-safety contract; now warns + preserves the > floor; NO TypedDict added per W966 discipline (internal dict, not > user boundary). The methodology lesson: premise verification is > the FIRST step of every playbook application. A playbook that > applies mechanically without checking the premise produces fabricated > work; a playbook that applies with discipline either seals the > nominated gap, finds a different real gap, or stops cleanly. > W990 → W991 SQL ESCAPE sweep: found 10 accidental wildcard sites > + 2 already-correctly-escaped in src/roam/catalog/detectors.py; 3 > were HIGH-risk in the idiom matchers. W991 fixed 8 W990 sites + 6 > parallel-pattern drive-bys + 1 duplicate matmul fallback = 15 LIKE > escapes total; 109 focused tests pass; smoke confirms > finXinXsortedXarray is correctly excluded. W994 + W995 > smells_suppress YAML hardening: W994 found a REAL BUG — > _is_expired was silently defaulting to not_expired on > unparseable expires strings (typo 2026-13-99 → suppression > stayed active). Fix at load-time AND match-time with a new > EXPIRES_FMT constant; 8 tests. W995 surfaced malformed-entry drops > that were previously "silently skipped" per an admitted comment in > the parser — now partitioned into valid/dropped + indexed warnings > + rollup; 7 tests. W982 cmd_fan rename completed: > fan_symbol → fan-symbol (9 cmd_fan.py + ~32 test sites; SQL LIKE > 'fan_%' fixed; the Strategy A persisted-hash break is documented). > 27 focused tests pass. W978 bus_factor stale-kind test fixed via > a fixture monkeypatch — the W405 shallow-history drop of a 2-year-old > commit had made the test brittle; 18/18 pass; 3 sharp drive-bys > captured (W984 autouse conftest / W985 INFO log on the drop / W986 > "first hypothesis" checklist in CLAUDE.md). W983-RESEARCH memo > shipped (dev/CMD-ALERTS-PATTERN-2-CASE-STUDY-2026-05-15.md, > <!-- PHANTOM 2026-05-18: declared SHIPPED but memo absent from disk. Regenerate from BACKLOG/test-fixture breadcrumbs before next release. --> > 374 lines, 7 reusable patterns, 3 candidate modules) and W999 > amended it with the W988/W989 outcomes — the case-study now > explicitly codifies "premise verification is the first step of > every playbook application". W970 CLAUDE.md drive-by: > _DEFAULT_THRESHOLDS is the canonical positive counter-example > for "when TypedDict IS appropriate" — a 7-line paragraph added to > the W966 sub-section pairs the rule with its exemplar. W936 > migrated 37 query_cost string literals to QUERY_COST_* > constants in detectors.py (W939-carry-forward closed). Hash- > stability mandate held trivially across the batch — every fix > either added new validation paths (no pre-fix envelope to compare), > swept LIKE-clause inputs (no detector output bytes moved), or > tightened YAML load-time validation (no persisted finding rows > touched).

Closed — Pattern-2 playbook propagation across 3 candidate modules (W987 / W988 / W989 — W1001-CONSOLIDATE)

Closed — SQL ESCAPE discipline (W990 / W991 — W1001-CONSOLIDATE)

Closed — smells_suppress YAML hardening (W994 / W995 — W1001-CONSOLIDATE)

Closed — Source carry-forwards + drive-bys (W936 / W970 / W978 / W982 — W1001-CONSOLIDATE)

Research + memos (W983-RESEARCH / W999 — W1001-CONSOLIDATE)

Operational findings (W1001 batch — pending fix / decision)

W1015-CONSOLIDATE — Pattern-2 propagation arc continuing + disclosure-hygiene class identified + YAML loader hardening memo

> CONSOLIDATE checkpoint = W1015. ~11 waves closed since W1001-CONSOLIDATE. > Headline: Pattern-2 propagation continues with W706 (cmd_ignore_findings) > + W1009 (per-finding-suppressions audit) + W1011 (cmd_alerts section-level > audit confirmation), bringing the running Pattern-2 closure count to 3 > more loader surfaces sealed this batch; a new disclosure-hygiene class > identified — W1000 sealed the strip_list_payloads warnings_out drop > via a new _ALWAYS_PRESERVED_LIST_FIELDS allow-set, defeating the previous > half-fix where partial_success=true survived but the structured warnings > array was stripped; a shared YAML loader hardening memo shipped (W1016- > RESEARCH) recommending a roll-our-own 2-phase migration (~125 LOC net > removed at 5 of 7 callsites); W996 docs the click-vocab divergence > across 7 commands as Pattern-3b parameter-name canonicalisation gap; > W1002 + W1003 fix test discipline — relative test-date offsets + > xfail-strict pin comment that survives autouse-fixture interactions; > W1015 lands the catalog _shared.py test coverage (24 tests + new > tests/test_catalog_shared.py). W886 / W890 verified already-guarded > via the W873 canonical (is_test_path None-guard is present at the > canonical site — no work needed). W494 verified clean — > test_inter_unused_return order-sensitivity audit found taint > already deterministic. Hash-stability mandate held trivially across the > batch — every fix either added new validation paths (no pre-fix envelope > bytes), preserved through-flow of an existing field (no detector output > bytes moved), or landed in test infrastructure (no source bytes).

Closed — Pattern-2 propagation continuing (W706 / W1009 / W1011 — W1015-CONSOLIDATE)

<!-- PHANTOM 2026-05-18: declared SHIPPED but memo absent from disk. Regenerate from BACKLOG/test-fixture breadcrumbs before next release. -->

Added — Disclosure-hygiene class identified (W1000 / W996 — W1015-CONSOLIDATE)

Closed — Test discipline improvements (W1002 / W1003 / W494 — W1015-CONSOLIDATE)

Closed — Not-applicable (W886 / W890 — W1015-CONSOLIDATE)

Test infrastructure (W1015 — W1015-CONSOLIDATE)

Research memos (W1016-RESEARCH — W1015-CONSOLIDATE)

Operational findings (W1015 batch — pending fix / decision)

W1042-CONSOLIDATE — Shared YAML helper arc lands + Pattern-2 propagation continues + cargo-cult or "" family substantially clean

> CONSOLIDATE checkpoint = W1042. ~18 waves closed since W1015-CONSOLIDATE. > Headline: the W1016-RESEARCH shared YAML helper landed (W1018) and > migrated through 4 of 5 planned Phase 2 callsites (W1019a/c/d/e; W1019b > in flight), each migration revealing a real helper-contract gap that was > sealed as a discrete follow-up wave — W1035 (JSON parse-error wording), > W1040 (PyYAML strict-timestamp force_tiny_parser kwarg), W1031 (typed > overload), W1043 (WarningsOut type alias). This is the BAIL-AND-SEAL > discipline working as designed — the migration agent bailed cleanly the > moment the helper's contract was insufficient, the gap was sealed as > a separate W, then the migration re-dispatched clean. Pattern-2 > propagation continued with 4 more loader surfaces Pattern-2-fied: > W1017 (load_per_finding_suppressions_typed warnings_out plumb), > W1025 (cmd_alerts thresholds-section sibling), W1032 > (load_suppressions + load_suppressions_typed deeper close + helper > migration), W1042 (sarif._load_suppressions_typed warnings_out > plumb). Running Pattern-2 loader-site total is now ~33-34 sites > sealed end-to-end; W1039-RESEARCH evaluated whether to evolve the > envelope shape for Python 3.13+ and concluded STAY — the current > playbook is the right one. Cargo-cult or "" cleanup swept 11 > sites across the W1029 batch (10 sites + 3 helpers None-guarded) + > W1034 (1 more in causal_graph.py); combined with W1013/W1014 the > family stands at 14 cargo-cult removals across 9 files and W907 > false-hedge anti-pattern enforcement is now mostly clean. **catalog/* > __all__ discipline landed via W1033 (_shared.py) + W1037 (5 > sibling catalog modules) for a total of 6 catalog modules with > explicit __all__ declarations, deterring the same cross-module > private-name-import pattern W901 originally captured. W1026 back-fill > retroactively annotated the W1016-RESEARCH memo with the W1018 tiebreaks > observed during Phase 1 implementation. Hash-stability mandate held > trivially across the batch** — every fix either added new validation > paths (no pre-fix envelope bytes), tightened type surfaces (no runtime > behavior delta), or hoisted helper internals (no detector output bytes > moved).

Added — Shared YAML helper arc (W1018 / W1019a/c/d/e / W1031 / W1035 / W1040 / W1043 — W1042-CONSOLIDATE)

Pattern-2 propagation (W1017 / W1025 / W1032 / W1042 — W1042-CONSOLIDATE)

Changed — Cargo-cult or "" cleanup (W1029 / W1034 — W1042-CONSOLIDATE)

Added — catalog/* __all__ discipline (W1033 / W1037 — W1042-CONSOLIDATE)

Research memos (W1039-RESEARCH — W1042-CONSOLIDATE)

Memo back-fills (W1026 — W1042-CONSOLIDATE)

Operational findings (W1042 batch — pending fix / decision)

---

> TWO CRITICAL edge-attribution fixes seal the call/import edge family end-to-end (W708 + W742) + suppression family migration extended through Phase C-1 (W722 → W723 → W736 → W737 → W738) + Pattern-3a canonical-rank consolidation third axis CLOSED (W596 confidence + W631 risk + W640 alerts level_order fold + W648 zero-slipped audit + W649 alerts lowercase) + bare-except discipline structurally CLOSED (allowlist 9 → 3 via W660 / W665 / W677 / W678 / W679 / W740 + W662 / W746 _GUARDED_DIRS 4 → 12 + W707 dead-code REAL BUG seal) + wheel-bundling thread CLOSED (W664 LIVE BUG caught + W668 as_file audit + W624 importlib.resources migration + W642 / W643 fallback removals) + MCP wrapper P0 batch (W670 alias regression + W671 cold-start guard + W672 surface sync + W606 collision lint + W607 / W636 wrapper refactor + W695 --card smoke) + smell catalog reached 20 detectors (W601-W605 / W639 / W646 / W647 / W650 / W705 / W720) + hygiene drive-bys (W682 / W683 / W685 / W689 / W690 / W697 / W699 / W702) (~20-completion batch behind W755-CONSOLIDATE, 2026-05-15). > The headline is TWO systemic edge-attribution correctness fixes that together seal the family. W708 (call-edge mis-attribution): _store_symbols + _merge_existing_symbols in src/roam/index/indexer.py:551,1192 omitted line_end from all_symbol_rows. The resolver's le > 0 guard always failed → syms[0] fallback for every method ref. Repo-wide 95% mis-attribution reduction (2715 → 147 silently corrupted edges). W742 (phantom import-edge mis-attribution): _closest_symbol in src/roam/index/relations.py:488-496,848-898 was returning syms[0] for kind='import' refs that couldn't be resolved to an enclosing symbol — manufacturing 18 phantom import edges on _format_count alone plus 6 transitive side-effects. Fix: optional kind parameter on _closest_symbol; returns None for kind=='import' instead of falling through. New invariant test in tests/test_relations.py:167-225. Together, every detector that reads edges (taint / side_effects / critique / dead / smells / vibe-check / ai-rot) now consumes correct edges. Suppression family migrated through Phase C-1 (W691 schema unification → W692 dataclass src/roam/policy/suppression_v2.py 312-line _SuppressionBase + 3 variants → W722 Phase B-a smells typed companion → W723 Phase B-b finding_suppress + sarif → W736 Phase C-1a sarif _load_suppressions migrated → W737 Phase C-1b cmd_smells.load_smells_suppressions migrated → W738 Phase C-1c BAILED on cmd_triage for three malformed-input divergences but MIGRATED suppression.save_suppression internal dedup; new tests/test_w738_suppression_wire_format.py 8/8 pass). Pattern-3a third-axis close: W596 src/roam/output/confidence.py (15 sites) + W631 src/roam/output/risk.py (4-tier critical/high/medium/low + moderate→medium alias) + W640 cmd_alerts._LEVEL_ORDER folded into severity_rank() + W648 AST audit ZERO slipped tables + W649 cmd_alerts UPPER → lowercase per W547. Bare-except discipline shipped end-to-end: W660 _find_workspace_root narrowed; W661 catalog/detectors.py production loop fail-loud; W662 AST drift-guard; W665 / W677 / W678 / W679 / W740 narrowed individual sites (allowlist 9 → 3); W707 found + sealed a REAL BUG (_serialize_suppressions dead-code on the first flag); W746 extended _GUARDED_DIRS 4 → 12 to cover substrate modules. Wheel-bundling thread closed: W664 __init__.py package-data drift-guard CAUGHT A LIVE W643-class bug on first run (roam.languages.extractors was missing its __init__.py); W668 as_file() audit; W624 migrated mcp_server.py:14569 mcp --card handler to importlib.resources; W642 / W643 removed dead triple-parent fallback. MCP wrapper P0 batch: W670 P0.1 fixed roam_plan file_path alias regression by moving _wrap_with_alias_normalization BEFORE the preset filter; W671 P0.2 added _INLINE_RESPONSE_TOOLS frozenset cold-start exemption for roam_catalog; W672 P0.3 synced scripts/sync_surface_counts --write to live 238/231/224; W606 added AST lint for canonical-positional collision; W607 decomposed _wrap_with_alias_normalization into 3 helpers; W636 collapsed _sync vs _async wrapper closure duplication; W695 added --card CLI smoke test. Smell catalog reached 20 detectors: W601 switch-statement (7 findings; surfaced REAL refactor candidate _create_extractor 23-arm switch); W602 temporal-coupling (10 findings); W603 magic-numbers (495 findings); W604 boolean-parameter (0 findings); W605 comment-density TODO/FIXME/XXX/HACK; W639 cross-detector empty-corpus smoke; W646 W699 DOGFOOD refactor _format_count (cluster finding led to W708 + W742); W647 symbol-centric temporal-coupling rollup (surfaced W708 false positive); W650 block-comment TODO/FIXME C/Java/JS; W705 unified _CommentSyntax (21 languages); W720 comment-density extended to hcl + apex. Hygiene drive-bys: W682 README CLI table evidence-oscal row; W683 .gitattributes 13 → 49 lines (* text=auto eol=lf + 26 binary extensions); W685 README CLI table header auto-count assertion; W689 .editorconfig mirroring .gitattributes; W690 dev-doc note for pytest on Windows; W697 extras-gate on README CLI command-count check; W699 DOGFOOD refactor _format_count cluster finding; W702 _DEPRECATED_COMMANDS AST-literal contract test. Hash-stability mandate held 31/31 byte-identical across every source wave.

Fixed — CRITICAL edge-attribution family CLOSED (W708 call-edge mis-attribution + W742 phantom import-edge mis-attribution)

Fixed — Suppression family migrated through Phase C-1 (W722 / W723 / W736 / W737 / W738)

Changed — Pattern-3a canonical-rank consolidation third axis CLOSED + alerts surface canonicalised (W596 / W631 / W640 / W648 / W649)

Changed — Bare-except discipline structurally CLOSED (W660 / W662 / W665 / W677 / W678 / W679 / W746)

Changed — Wheel-bundling thread CLOSED (W624 / W642 / W643 / W664 / W668)

Changed — MCP wrapper P0 batch + refactor chain (W606 / W607 / W636 / W670 / W671 / W672 / W695)

Added — Smell catalog reached 20 detectors (W601-W605 / W639 / W646 / W647 / W650 / W699 / W705 / W720)

Added — Hygiene drive-bys (W682 / W683 / W685 / W689 / W690 / W697 / W702)

Changed — W755 consolidation pass

> Seventeen-completion batch folded in behind the W698-CONSOLIDATE consolidation. The headline is W708's critical silent-bug seal: Python call-edge mis-attribution was silently corrupting every detector that reads edges (taint, side_effects, critique, dead, smells, vibe-check, ai-rot). Root cause was indexer.py:551 + indexer.py:1192 omitting line_end from all_symbol_rows, which collapsed per-call resolution to per-symbol. Post-fix: _format_count non-import edges drop 78 → 0 on roam-code; repo-wide 2715 → 147 (95% reduction). Validation in flight (W709). The W647 symbol-centric temporal-coupling rollup (10 pair → 5 cluster findings; cmd_health.health clustered) is what surfaced the false positive driving the W708 fix. Suppression family phased close: W691 unified .roam/suppressions.json schema between finding_suppress + sarif readers (closing the W676-found latent bug); W692 Phase A shipped the discriminated-union dataclass at src/roam/policy/suppression_v2.py; W722 Phase B-a added the load_smells_suppressions_typed() companion (KindSymbolSuppression internal); W693 added cross-loader compat across 5 suppression substrates. W723 Phase B-b in flight; W724 Phase C queued. Comment-density smell expansion: W705 unified _CommentSyntax record taking coverage 14 → 21 languages; W720 extended to hcl + apex; W650 extended detection to / / block comments (C-family + CSS). Hygiene wave: W689 added .editorconfig (23 lines) mirroring .gitattributes EOL/charset/binary rules; W685 pinned README CLI table header to (all 231) with auto-count + test_readme_cli_command_count_matches_source; W695 added --card CLI smoke (2 tests); W697 added README CLI test extras-gate (auto-allowlist from cli._DEPRECATED_COMMANDS); W702 added _DEPRECATED_COMMANDS AST-literal contract test. Small cleanups: W642 removed triple-parent fallback from mcp --card handler (-19 LOC); W649 canonicalised cmd_alerts UPPER → lower per W547 contract; W707 removed _serialize_suppressions dead code + regression test. Hash-stability 31/31 byte-identical held across every source wave.

Fixed — CRITICAL silent call-edge mis-attribution (W708) + suppression schema unified (W691) + cmd_alerts canonical lowercase (W649)

Added — Suppression family phased close + symbol-centric temporal-coupling rollup + comment-density 14→21 languages + hygiene wave

Changed — Small cleanups (W707 dead-code + W698 consolidation)

> P0 user-flagged regression batch fixed (W670/W671/W672/W682) + bare-except discipline shipped end-to-end (W653 real bug + W661/W662 fail-loud guards + W665/W677 narrowing, allowlist 9→4) + wheel-bundling discipline COMPLETE (W664 LIVE BUG caught + W668 as_file audit + W642 triple-parent removed) + smell-suppression substrate (W658) + CRITICAL latent bug surfaced in .roam/suppressions.json (W676 → W691 in flight) + W646 eat-our-own-dogfood (W601 cleared) + W683/W685 hygiene (W670 / W671 / W672 / W682 / W683 / W685 / W642 / W646 / W653 / W661 / W662 / W664 / W665 / W668 / W658 / W676 / W677 batch, 2026-05-15). > Sixteen-completion batch folded in behind the W657-CONSOLIDATE consolidation. The headline is the P0 batch closure (4 user-flagged regressions sealed): W670 P0.1 moved _wrap_with_alias_normalization before the preset filter so roam_plan no longer drops the file_path alias on filtered presets; W671 P0.2 added a _INLINE_RESPONSE_TOOLS frozenset that exempts roam_catalog from the auto-handle wrapper so the cold-start catalog call returns inline instead of through a never-completed handle; W672 P0.3 synced 8 files to the live 238 commands · 231 canonical · 224 mcp tools counts (the auto-derive path via dev/build_readme_counts.py --apply); W682 P0.3-followup added the evidence-oscal row to the README CLI table. Bare-except discipline shipped end-to-end: W653 fixed a REAL bug in run_all_detectors — bare-except was swallowing NameError/ImportError/AttributeError/TypeError classifying-bugs as if they were per-detector failures; now they propagate as RuntimeError and only sqlite3.Error is swallowed+logged; W662 added an AST drift-guard banning bare-except in detector modules (9 sites grandfathered + 10/10 tests pass); W661 applied the fail-loud discipline to the catalog/detectors production loop (8 new tests); W665 narrowed 3 bare-except sites (allowlist 9→6); W677 narrowed 2 more (allowlist 6→4). Wheel-bundling discipline COMPLETE: W664 added a __init__.py package-data drift-guard that CAUGHT A LIVE W643-class bug on first run (roam.languages.extractors had a missing __init__.py); W668 audited as_file() callers + sealed the pattern with 4 fixes + a drift-guard; W642 removed the triple-parent fallback from the mcp --card handler (-19 LOC; W624 already migrated the resolution to importlib.resources so the fallback was dead code). Smell-suppression substrate landed (W658): 225-line module + 17 tests for .roam/smells.suppress.yml. CRITICAL latent bug surfaced (W676): suppression-parser audit found 4 parsers (not 3) with incompatible schemas in .roam/suppressions.json — two readers consume the same file with different shapes, so suppressions silently apply to one detector and not the other (W691 in flight to seal). W646 eat-our-own-dogfood: refactored _create_extractor from 105 → 17 lines via a _LANGUAGE_EXTRACTORS dispatch dict — cleared roam's own W601 finding on itself (first time the smell catalog caught a true positive on roam-code AND the refactor sealed it within the same week). W683 / W685 hygiene: .gitattributes extended 13 → 49 lines (eol=lf + 26 binary rules); README CLI table header pinned to "(all 231)" matching the 231-canonical count. Hash-stability 31/31 byte-identical held across every source wave.

Added — Smell-suppression substrate + bare-except AST drift-guard + __init__.py wheel drift-guard (W658/W662/W664 batch)

Fixed — P0 user-flagged regression batch (W670/W671/W672/W682) + bare-except real bug (W653) + wheel-bundling LIVE bug (W664 finding)

Research/added — Suppression-parser audit (W676) + W646 dogfood refactor + W683/W685 hygiene

Changed — (ADD) W698 consolidation pass

> Pattern-3a vocabulary cluster GENUINELY STRUCTURALLY CLOSED across ALL THREE rank axes (severity + confidence + risk) + smell-detector catalog reached 20 detectors (was 15) + _wrap_with_alias_normalization refactor+dedup chain + cross-detector empty-corpus smoke (W607 / W624 / W631 / W601 / W602 / W640 / W605 / W648 / W639 / W636 batch, 2026-05-15). > Nine-completion batch folded in behind the W635-CONSOLIDATE consolidation. The headline is Pattern-3a GENUINELY closed end-to-end: W631 introduces the third canonical axis src/roam/output/risk.py::risk_rank() and migrates 2 sites (cmd_migration_plan + cmd_path_coverage), pairing with W564 severity-rank + W596 confidence-rank to canonicalize ALL THREE rank axes (severity + confidence + risk); W648 AST audit returned ZERO slipped rank tables across the entire src tree — Pattern-3a is structurally closed for real, not just-in-name; W640 folded cmd_alerts._LEVEL_ORDER into severity_rank() via -severity_rank(lowercase) and broadened the drift-guard regex /sever/ → /sever|level_order/. Smell catalog reached 20 detectors (was 15 at session start): W370c 5-smell expansion COMPLETE — W601 (switch-statement, 7 findings; surfaced REAL refactor candidate _create_extractor 23-arm switch), W602 (temporal-coupling, 10 findings; surfaced cli↔_run_roam_inprocess 34-commit top coupling), W603 (magic-numbers), W604 (boolean-parameter), W605 (comment-density TODO/FIXME/XXX/HACK; roam-code CLEAN at max 0.49% rate). LAW-4 anchor sets bumped 92→93 / 109→110 to accommodate comment-density terminals. _wrap_with_alias_normalization refactor + dedup chain complete: W607 decomposed the 130-line _wrap_with_alias_normalization into 3 helpers (_collect_alias_candidates, _build_merged_signature, _build_merged_annotations; 130→50 lines + 7 unit tests; 2960 focused tests pass); W636 collapsed the sync/async wrapper closure duplication via shared _prepare_kwargs helper + branched closure (33→28 lines + duplicate-body anti-pattern eliminated). Pairs with W595 (param-ordering seal) + W606 (canonical-positional collision lint) to give the wrapper its end-to-end discipline. Cross-detector empty-corpus smoke (W639) guards 54 detectors (20 smells + 34 algo + 2 floor counts; 56+115+17+31 = 219 tests) against silent import errors after concurrent merges — catches the W601/W602-style regression class at PR time. W624 migrated the mcp --card handler at mcp_server.py:14593-14624 to importlib.resources.files("roam") / "mcp-server-card.json" with as_file() — completes the importlib.resources discipline thread (10+31+140 tests pass). Hash-stability 31/31 byte-identical held across every source wave.

Added — Risk-rank canonical helper + 5 new smell detectors + cross-detector empty-corpus smoke (W631/W601/W602/W603/W604/W605/W639 batch)

Changed — _wrap_with_alias_normalization refactor + dedup + importlib.resources migration + alerts level_order fold (W607/W636/W624/W640/W648 batch)

> Pattern-3a vocabulary cluster STRUCTURALLY CLOSED across BOTH rank axes (severity + confidence) + smell-detector catalog reached ZERO placeholder stubs + wheel-bundling discipline complete + fragile-path sweep + AST drift-guards across the board (W596 / W594 / W588 / W577 / W570 / W564 / W515 / W370c batch, 2026-05-15). > Sixteen-completion batch folded in behind the W600-CONSOLIDATE consolidation. The headline is structural across BOTH rank axes: W596 completes Pattern-3a confidence-rank consolidation by migrating 15 sites to the canonical src/roam/output/confidence.py::confidence_level_rank() helper (561 tests pass), pairing with W564's prior 10-site severity-rank migration to close the Pattern-3a vocabulary cluster end-to-end. Combined with W547 (severity vocab) + W518 (control-mapping vocab) + W512 (edge-kinds) + W565+W566 (severity helpers), drift-guard discipline now canonicalizes through 6 modules + 6 AST lint suites — every Pattern-3a vocabulary cluster surfaced in the dogfood corpus flows through canonical modules with AST drift-guards. Third rank axis (risk) flagged as W631 follow-up. Smell detector catalog reached ZERO placeholder stubs (W370c): shipped 2 detectors (refused-bequest 2 findings + primitive-obsession 144 findings) and scoped the remaining stubs into 5 W370c-followup waves (W601-W605) for new smell kinds. Fragile-path harness gotcha closed end-to-end: W587 (10 sites) + W594 (18 sites, 47 → 29 remaining) swept 28 of 57 fragile-path test sites to the canonical tests/_helpers/repo_root.py helper; W588 added an AST drift-guard for the Path(__file__).parents[N] pattern with fail-loud _PRE_W594_PENDING allowlist (47 entries — corrected upward from the 27 estimate by the W588 inventory pass); W606 added an AST lint for canonical-positional collision catching the pre-W595 crash class at PR time (4 new tests). Wheel-bundling discipline COMPLETE: W554 customer SHIPPING BUG fixed + W570 drift-guard + W577 CI wheel-smoke job (3 steps: build wheel + install fresh venv + run drift-guard from /tmp) + W610 extended to taint_rules + languages.extractors + mcp-server-card (3 new test classes, 6 new tests) — closes prior 2 silent-empty bugs (12.12.1 taint rules + 12.12.2 Jenkinsfile) across 5 package-data surfaces. Pattern 1 variant D family CLOSED at the CLI boundary (W573 NO-OP investigation confirmed only 1 production call site for ChangeEvidence.from_canonical_json exists). Leasing-system parity completed: W447 + W448 (bundled) added the pr-replay info marker on missing leases dir + read_lease(warnings_out=...) kwarg. Severity helpers landed: W565 + W566 (bundled) added severity_to_confidence_level() + severity_breakdown() helpers (5 call-sites migrated, 248 tests). Drift-guard parsing seal: W515 parses python-version from the live workflow before drift compare, sealing the false-positive class on CI version bumps (139 tests). Doc sweep: W569 swept 9 stale templates/audit-report/ path refs across 8 src/dev files + 1 test docstring (111 tests). Small cleanups: W591-bundle W584 / W497 / W500 bailed as already-done; W501 audit comments added to 4 test files. W573 NO-OP investigation: only 1 production call site for ChangeEvidence.from_canonical_json exists — Pattern 1 variant D family fully sealed at CLI boundary. Hash-stability 31/31 byte-identical held across every source wave.

Research/added — Pattern-3a confidence-rank canonicalization + smell detectors + AST drift-guards + wheel-smoke CI (W370c/W515/W564/W577/W588/W596/W606/W610 batch)

Fixed — Fragile-path sweep continues + Pattern 1 variant D CLI boundary closed + leasing parity (W447/W448/W573/W587/W594 batch)

Changed — Severity helpers + doc sweep + small cleanups (W565/W566/W569/W591-bundle batch — carry from W591 batch)

> Pattern-3a severity-rank consolidation STRUCTURALLY CLOSED + fragile-path sweep + leasing parity + git-helper consolidation + Pattern 1 variant D CLI-boundary close (W540-W591 batch, 2026-05-15). > Nine-completion batch folded in behind the W578-CONSOLIDATE consolidation. The headline is structural: W564 completes the Pattern-3a severity-rank consolidation by migrating 10 sites to the canonical severity_rank() helper alongside W512 (edge-kinds) + W518 (control-mapping vocab) + W547 (severity vocab) + W565+W566 (severity helpers) — every Pattern-3a vocabulary cluster surfaced in the dogfood corpus now flows through canonical modules with AST drift-guards. 14 confidence-rank tables flagged as the next Pattern-3a target (W596 queued). 460 + 31 tests pass. Pattern 1 variant D CLI boundary CLOSED: W573 investigation confirmed only 1 production call site for ChangeEvidence.from_canonical_json* exists (the one W561 already migrated) — the variant D family is fully sealed at the CLI boundary. Leasing-system parity completed: W447 + W448 (bundled) added the pr-replay info marker on missing leases dir under migration / autonomous_pr modes + the roam.leases.store.read_lease(warnings_out=...) kwarg — Pattern-2 always-emit discipline now covers list_leases (W425) + read_lease (W448) + pr-replay info-marker (W447) end-to-end. 137 + 31 tests pass. Git-helper subprocess discipline: W540 consolidated _git_fingerprint + _git_commit_sha helpers; pr-bundle init now shells out to git rev-parse HEAD ONCE instead of TWICE per invocation. 105 + 31 tests pass. Severity helpers landed: W565 + W566 (bundled) added severity_to_confidence_level() + severity_breakdown() to _severity.py with 5 call-sites migrated. 248 tests pass. Fragile-path sweep (W587): 10 test sites migrated to the new tests/_helpers/repo_root.py helper — 37 → 27 fragile-path sites remain (W594 queued for the remainder). Surfaced a real bug: _wrap_with_alias_normalization param-ordering breaks test_surface_consistency (W595 in flight). Small cleanups (W591-bundle): W584 / W497 / W500 bailed as already-done; W501 audit comments added to 4 test files. 81 tests pass. Doc sweep (W569): 9 stale templates/audit-report/ path refs swept across 8 src/dev files + 1 test docstring + 1 fixture-regen command. 111 tests pass. Hash-stability 31/31 byte-identical held across every source wave.

Changed — Pattern-3a severity-rank canonicalization + severity helpers + git-helper consolidation (W540/W564/W565/W566 batch)

Fixed — Leasing parity + Pattern 1 variant D CLI boundary closed (W447/W448/W573 batch)

Changed — Fragile-path sweep + small cleanups + doc sweep (W569/W587/W591 batch)

> SHIPPING BUG FIXED + Pattern 1 variant D disclosure + canonical severity vocab + ChangeEvidence round-trip pipeline + OSCAL persistent artifacts + package-data drift-guard (W520-W570, 2026-05-15). > Ten-completion batch folded in behind the W549 consolidation. The headline is a customer-facing shipping bug fix: W554 moved templates/audit-report/control-mapping.yaml into src/roam/templates/audit_report/ + added the pyproject.toml package-data entry — pip install roam-code users could not previously run roam ci-setup --with-oscal or roam evidence-oscal against their own projects because the control-mapping YAML was not bundled in the wheel. Lookup migrated to importlib.resources. Verified end-to-end via fresh tmp venv wheel install (109 tests pass). Pattern 1 variant D dropped_enum_rows disclosure lands across the AR envelope: W534 introduced ChangeEvidence.from_canonical_json(text, *, strict=False) with closed-enum validation — 31 golden fixtures round-trip BYTE-IDENTICAL with content hashes preserved (forgiving projection mode); W561 added from_canonical_json_with_drops() classmethod that surfaces dropped enum rows + partial_success: true on the envelope (LAW-4 anchored on rows terminal); W559 wired from_canonical_json into the cmd_evidence_oscal AR path with a --strict flag (hybrid Mapping|ChangeEvidence signature). Forgiving-projection AND fail-loud discipline now both available end-to-end (W465 golden fixture stays byte-identical). Canonical severity vocabulary in src/roam/output/_severity.py (W547 + W548 bundled): SEVERITY_LEVELS / SEVERITY_ALIASES / normalize_severity / to_sarif_level / validate_severity + AST drift-guard — closes the Pattern 3a severity-vocabulary divergence across SARIF emitters. OSCAL persistent artifacts (W535): roam ci-setup --with-oscal now materializes .roam/oscal/control-mapping.json + stub-assessment-plan.json with deterministic UUIDv5 + SHA-256-seeded timestamps — the FedRAMP continuous-assessment evidence pattern. SLSA SRC-L3 commit_sha chain CLOSED: W520 added the cga-sibling emit_cga_vsa_sibling commit_sha fallback — belt-and-suspenders complement to W509 — completing the producer-W521 + collector-W509 + cga-sibling-W520 three-path chain (all three fall back to git rev-parse HEAD). Package-data wheel-bundling discipline: W570 added tests/test_package_data_wheel_drift.py drift-guard pinning roam.templates.audit_report + roam.templates.ci package-data entries. Closes the recurring "feature works in src but broken on pip install" surface (the W554-class bug). Version-skew + hash-stability hygiene: W557 rolled server.json + mcp-server-card.json 12.50→13.0 via dev/build_readme_counts.py --apply; W563 normalizes auto-derived fields before hashing in the card-hash test so count/version bumps stay invisible while preserving the R17 tampering guard for other fields. Hash-stability 31/31 byte-identical held across every source wave.

Research/added — ChangeEvidence round-trip + canonical severity + OSCAL persistence + cga commit_sha (W520/W534/W535/W547/W548/W559/W561/W563/W570 batch)

Fixed — SHIPPING BUG closed (W554) + version skew (W557)

> W493 BUG FAMILY STRUCTURALLY CLOSED + THREE more silent no-ops sealed + OSCAL pipeline end-to-end + OWASP labels integrity + SLSA SRC-L3 commit_sha parity (W506-W533, 2026-05-15). > Ten-wave batch closed behind the W516 docs consolidation. The headline is structural: W512 introduces src/roam/db/edge_kinds.py + a 16-test drift-guard lint that migrates 12 read-sites to canonical helpers and structurally seals the W493/W499/W511/W524 edge-kind bug family — future inline kind IN queries fail at lint time. Three more long-latent silent no-ops sealed this batch: (1) W511 fixed side_effects.py:497 edge-kind union (production impact 13/14,949 → 14,949/14,949 edges matched — the FOURTH silent no-op in the W493 family); (2) W524-bundle hunt found 7,534 missing import edges in cmd_hover.py (the largest single edge-kind no-op in the family by 3 orders of magnitude — hover output had been blind to imports since launch), plus +13 references in cmd_risk.py and defensive plumbing in cmd_patterns.py; (3) W531 caught SARIF severity=error silently downgrading to "note" since launch — GitHub Code Scanning + Microsoft Defender were not flagging taint findings as errors for any consumer that ingested roam SARIF, ever. OSCAL pipeline fully shipped end-to-end: W465 added Assessment Results emission via roam evidence-oscal --kind assessment-results (with auto-synthesized stub Assessment Plan per the FedRAMP continuous-assessment pattern). With W464 already in flight, roam evidence-oscal now covers both v1.2 models. Claim-integrity batch on OWASP labels: W533-bundle (W530+W531+W532) corrected the OWASP A05 → A03 mislabel on java_sqli + python_ssti and brought owasp_top10 coverage from 3/22 → 22/22 rules, plumbed via W492/W453 into TaintRule / TaintFinding / findings.evidence_json / SARIF tags[]. SLSA SRC-L3 commit_sha parity completed: W509 added the emit-time git rev-parse HEAD fallback (restoring cga sibling parity surfaced by W498), and W521 stamped commit_sha producer-side at pr-bundle init so the W509 fallback becomes belt-and-suspenders. Framework-vocab consolidation: W518 collapsed scattered allowlists into src/roam/evidence/control_mapping_vocab.py (9 framework slugs + 9 titles + 3 pass-conditions + 7 surfaces) with drift-guard. SLSA control-map entries shipped: W506 landed the 3 missing SRC-L2/L3 entries + iso_42001 → iso_iec_42001 rename across 5 files in lockstep — claim-integrity hygiene now matches the W451/W471/W472 SRC-L3 pipeline. Hash-stability 31/31 byte-identical held across every source wave.

Fixed — THREE long-latent silent no-ops + W493 family structurally sealed (W506-W533 batch)

Added — OSCAL Assessment Results + OWASP plumbing + framework-vocab module + SLSA entries + producer-side commit_sha (W506-W533 batch)

> TWO long-latent silent no-ops sealed + SLSA SRC-L3 evidence-pipeline polish + closed-enum lints + taint trio closure (W375-W515, 2026-05-15). > The wave between the W491 consolidation and this one shipped twelve > threads in parallel. The headline is two critical-correctness > fixes that landed back-to-back: (1) W493 fixed > propagate_taint's kind='calls' query against writers that emit > kind='call' — the taint DFS had been a NO-OP since inception, all > 76 production findings stuck at chain_length=1. Three read-side > sites repaired (taint.py:491, cmd_dead.py:1565, dataflow.py:329); > 4 stale tests that asserted the no-op behavior flipped to assert > the real contract; 31/31 byte-identical golden hashes hold, 292 > tests pass; W441's 607-finding projection now stands for the > production roam-code corpus. (2) W499 fixed > critique/checks.py:399 — the impact gate was matching 0/14,949 > caller edges (COMPLETE NO-OP); post-fix surfaces 5 high-severity > findings on roam-code itself. PRs touching open_db / > json_envelope / to_json / invoke_cli / path now correctly > exit-5 in --ci mode. (3) W375 closed the W372-research first-ship > taint-rule trio (after W373 python-ssti + W374 java-sqli): > java-deserialization rule pack at > src/roam/security/taint_rules/java_deserialization.yaml > (T-X04 / CWE-502 / A08:2021; 15 sources / 12 sinks / 13 sanitizers, > qualified_only: true). (4) W486 extracted the shared > src/roam/attest/emit_vsa.py helper (339 lines); cmd_pr_bundle > and cmd_cga collapse to 9-line + 24-line delegations > respectively. 143/143 tests pass. (5) W498 added the > end-to-end VSA parity test in tests/test_attest_vsa.py:661+ > (TestVsaCliParity) — found real drift: pr-bundle drops > commit_sha when --no-auto-collect; cga falls back to > git rev-parse HEAD. Spawned W509 fix (now in flight). > (6) W428 shipped the 5 W360-research crosswalk YAML entries > (NIST AI 600-1 + SP 800-218A): AI600_VALUE_CHAIN_PROVENANCE, > AI600_STOP_BUILD_AUTHORITY, SSDF218A_CODE_PROVENANCE, > SSDF218A_CODE_REVIEW_AI_OUTPUT, SSDF218A_DEVELOPER_AUTHORIZATION. > CAISI held to H2 2026. W506 in flight to add the missing SLSA > entries — claim-integrity hygiene per the agentic-assurance > "supports evidence for" lint. (7) W505-bundle shipped 3 > closed-enum lints (W502 source_framework / W503 pass_condition / > W504 surface); 19+31 tests pass. (8) W482 added a roam > doctor advisory check that compares the local > .github/workflows/roam.yml against the canonical CI template; > chose advisory-check over a standalone command for low-friction > surfacing. Real-world signal: roam-code's own roam.yml has > drifted from template (26 vs 28 lines) — surfaced on the > dogfooded doctor run. 9 new tests + 137/137 focused pass. > (9) W485 verdict was MEASUREMENT DRIFT, not regression — > the W408 baseline was a 17k-symbol corpus; current roam-code is > 23.6k symbols / 29.9k edges / 3.8k files (+39% / +76% / 7x). > Effects_taint scaled 67.6s → 87.4s; relative dominance held > 48% → 50.5%. (10) W488 auditing pass: the rest of the > test_taint_*.py corpus for stale bare-name assertions came up > CLEAN — W479 caught the only offender; 128+31 tests pass. > (11) W441 BAILED with a high-impact find — it was the > investigation that surfaced the W493 kind='calls' vs kind='call' > typo (real wallclock when fed correct data: 0.06s; W433-research's > 35s prediction was based on stale code). Spawned the critical > W493 fix. (12) W491-CONSOLIDATE — itself (folded inline in > the previous Unreleased entry).

Added — taint trio close + SLSA polish + crosswalk + closed-enum lints + advisory check (W515 batch)

Fixed — TWO long-latent silent no-ops (W493 + W499)

Changed — perf-measurement reframe (W485) + investigation closures (W488)

Research / planning — W515 batch

> SLSA SRC-L3 evidence pipeline + Pattern-3b consolidation + taint precision discipline + perf ground-truth (W430-W491, 2026-05-15). > The wave between the W466 consolidation and this one shipped eight > threads in parallel. (1) SLSA SRC-L3 evidence pipeline end-to-end > — W451 wired the SRC-L3 lift through new src/roam/attest/vsa.py > (369 lines) and pr-bundle emit --slsa-l3 --sign --keyless; > cosign_sign_statement was already predicate-agnostic so no engine > changes were needed. W471 auto-triggered the SRC-L3 VSA emit in CI > via new template src/roam/templates/ci/slsa-src-l3.yml and the > --with-slsa-l3 flag on cmd_ci_setup, closing Gap A from W358-research. > W472 added roam cga emit --also-vsa (110-line _emit_vsa_sibling > helper) threading --sign --keyless. 23+144 / 15+31+23 / 3+43+26+43+31 > tests pass across the trio. (2) Pattern-3b consolidation closes — > W430 renamed targetsymbol on 9 MCP wrappers (prepare_change, > trace, affected_tests, annotate_symbol, get_annotations, generate_plan, > get_invariants, why_fail, metrics); _PRE_W332_EXEMPT dropped 14 → 5. > Legacy target still resolves via alias with summary.alias_warnings > for back-compat. 3014 tests pass. (3) Taint engine precision > discipline reinforcedW467 fixed the W454 qualified_only bug > (root cause was a compound A+C: bare names matched via exact > qualified_name = ? on Python top-level AND via suffix LIKE '%.{name}' > on Java wrappers; fix: bare names become no-ops under qualified_only=true). > java-sqli YAML scrubbed. 125+31 tests pass. W479 audited the > remaining 22 taint YAMLs — zero offending rules — added a load-time > warnings.warn lint + 7-test hygiene guard, and drive-by-fixed an NTFS > case-collision bug (closes the open W468 + W477 items). (4) Perf > optimization ground-truthW440 shipped the Phase 2 → Phase 5 > source-cache handoff: effects_taint moved from 91.0s → 84.7s = 7% > reduction (modest vs the 15-30s predicted by W433-research). 216 tests > pass. W441 + W485 follow-ons queued. (5) Detector FP-rate methodology > researchW470-research (dev/DETECTOR-FP-RATE-METHODOLOGY-2026-05-15.md) <!-- PHANTOM 2026-05-18: declared SHIPPED but memo absent from disk. Regenerate from BACKLOG/test-fixture breadcrumbs before next release. --> > scoped FP-rate measurement for 3 first-to-measure detectors (smells 3047 > findings, vibe-check 831, taint). Surprise finding: OWASP Benchmark is > community-rejected — task-specific real-codebase corpora are now > preferred. Docs-only consolidation in this batch (W491); hash-stability > mandate held across all source waves.

Added — SLSA SRC-L3 wire-up + 9-wrapper rename + CI auto-trigger + cga --also-vsa (W491 batch)

Fixed — W491 batch

Research / planning — W491 batch

> Standards crosswalk research + taint rule pack v1 + shallow git default + auto-generated MCP tool table + qualified-name rule flag (W405-W466, 2026-05-15). > The wave between the W436 consolidation and this one ran twelve > threads in parallel across five families. (1) Standards > crosswalk research trilogyW358-research (SLSA v1.2 > Source Track positioning, dev/SLSA-V12-POSITIONING-2026-05-15.md) <!-- PHANTOM 2026-05-18: declared SHIPPED but memo absent from disk. Regenerate from BACKLOG/test-fixture breadcrumbs before next release. --> > found that roam de-facto covers SRC-L2 today, and the surprise > finding is that SRC-L3 lift is one wavecosign_sign_statement() > at attest/cga.py:495-594 is already implemented; new wave W451 > queued. W359-research (OSCAL v1.2 Control Mapping, > dev/OSCAL-V12-CONTROL-MAPPING-2026-05-15.md) found that OSCAL v1.2 > <!-- PHANTOM 2026-05-18: declared SHIPPED but memo absent from disk. Regenerate from BACKLOG/test-fixture breadcrumbs before next release. --> > shipped a 7th model (Control Mapping) which is the zero-prereq > first emission for per-run evidence; new waves W464/W465 queued. > W360-research (already landed in W436 batch) feeds W428. > (2) Taint rule pack v1W373 (python-ssti, T-X01, CWE-94; > engine already supports qualified-name matching; 7 new + 45+39 > existing tests pass) + W374 (java-sqli, CWE-89; same recall-limited > precision profile as java-fileupload because engine lacks Java > qualified-name resolution; 7 new + 44+31 existing tests pass) + > W454 (per-rule qualified_only flag for taint engine; java-sqli > opts in; 29+60 focused tests pass). Drive-bys W452-W463 queued. > (3) Perf — shallow git default on first indexW405 shipped > the 365-day shallow window via _DEFAULT_SINCE in git_stats.py + > --full-history opt-out + ROAM_GIT_SINCE env var; _first_index() > gate preserves existing deep indexes; 30+31+115 focused tests pass. > Drive-bys W437/W438/W439 queued. > (4) Documentation count drift sealedW443 added README > coverage for 4 untracked CLI commands (evidence-diff, > evidence-doctor, llm-smells, findings); the > test_readme_covers_all_canonical_cli_commands drift guard now > passes. W449 auto-generated the README MCP tool table via a > new surface_counts.mcp_tool_descriptions() helper — 74 missing > tools added and the core preset count corrected (25 → 57). 4/4 + > 16/16 + 8/8 + 31/31 test suites pass. Drive-bys W449-W463 queued. > (5) Dedup + small-cleanup bundleW432 removed five > oracle wrappers that W306 had already added (symbol_exists, > route_exists, is_test_only, is_reachable_from_entry, is_clone_of); > 228 → 223 decorations now match the CLAUDE.md headline. New AST > duplicate-name CI lint via surface_counts.mcp_tool_decorations() > helper. W429 packaged the W422 deprecate-permit-wrapper + > W425 lease warnings_out + W426 constitution-unparseable warning > as a single small-cleanup bundle; 204/204 tests pass; 31/31 hash > stability byte-identical. Drive-bys W443/W444/W445 and W446/W447/W448 > queued. (6) Perf research scopingW433-research > (dev/EFFECTS-TAINT-PERF-RESEARCH-2026-05-15.md) scoped three > <!-- PHANTOM 2026-05-18: declared SHIPPED but memo absent from disk. Regenerate from BACKLOG/test-fixture breadcrumbs before next release. --> > optimization candidates for the W408 finding: (C) double-parse I/O > elimination 15-30s zero risk; (B) function-summary memoization > 35→5s; (A) file-signature cache warm-reindex 0s. Surprise > finding: roam has TWO independent taint engines — > analysis/taint.py for Phase 5 (indexer-side) vs > security/taint_engine.py for the roam taint command — > consolidating them is a deeper structural play. Docs-only > consolidation in this batch (W466); hash-stability mandate held > across all source waves.

> Permit unification + Pattern-3b extension + llm-smells v1.1 + phase-timing reality check (W347-W436, 2026-05-15). > The wave between the W418 consolidation and this one ran nine > threads in parallel. W377-batch closed six permit-persist > red-team gaps (W377-W382) surfaced by W349; 31/31 golden hashes > remained byte-identical and 163 focused tests pass. W383 > unified pr-bundle and pr-replay permit readers behind a > single canonical roam.permits.store.load_permits_from_disk > reader, with two drive-bys captured as W421/W422. W347 extended > the Pattern-3b parameter-alias normalization to add file_path → > path (the prefix-pattern cluster was deliberately bailed on; 3 > drive-bys queued as W430/W431/W432). W415b shipped llm-smells > v1.1.0 — five new CHEAP detectors (missing_timeout, > missing_max_retries, no_system_message, no_retry_backoff, > call_in_loop); 36/36 pass; package version bumped 1.0.0 → 1.1.0; > 3 drive-bys queued as W415c/W415d/W427. W408 instrumented > per-phase timing in roam doctor and the real-data finding is the > headline of this wave: effects_taint consumes 48% of indexer > wallclock (67.6s of 139.6s on roam-code itself), which > invalidates the PageRank-first ranking in the W395-followup > perf memo; new wave W433 is queued to target effects_taint > first (drive-bys W434/W435 follow). W421 investigation bailed > after finding constitution + lease gatherers already delegate to > canonical readers (119/119 baseline tests pass; 2 drive-bys as > W425/W426). Research-only artifacts: W372-research OWASP 2026 > taint rule pack (3 first-ship rules W373/W374/W375), > W395-followup Phase 4-7 perf research (W407 reclassified to > VALIDATE — Louvain cache already implemented; top 3 new perf waves > W423/W424 + W433), and W360-research standards crosswalk > additions (5 NIST AI 600-1 + SP 800-218A YAML entries; CAISI held > until H2 2026; implementation as W428).

> MCP wrapper backfill near-complete + detector strengthening Round 2 + perf research + llm-smells design (W303-W418, 2026-05-15). > The wave between the W398 consolidation and this one moved Wave29 > wrapper backfill from 38 → 16 missing through three consecutive > sub-waves (W303 test-surface +5, W304 agent-OS daily flow +10, > W305 reports/audit +11) — 26 wrappers added in total; W306 will > drop the remaining count to ~3. Detector strengthening Round 2 > landed against the W368 BEHIND list: W370 smells empty-catch > (469 findings), W370b duplicate-conditionals (149 findings), and > W371 vibe-check modular-mirage + boilerplate-inflation > (163 + 499 findings, informational and score-preserving) — 1,280 > new findings on roam-code itself. The pitch refresh trilogy > sharpened the top-of-funnel surfaces: W390 (README + landing + > docs hero), W393 (11 secondary surfaces), W396 > (src/roam/mcp-server-card.json mirror; hash-pin updated). > Pattern-1 family Round 3 sealed cmd_owner (W362) as the third > CLI-side "exit-0 + structured envelope" fix after W327 and W324. > Permit red-team added 19 W198-edge-case tests (W349) with 6 > drive-by gaps queued as W377-W382. Structural cleanups: W346 > module-scope fixture cut test_json_contracts.py runtime ~28x; > W364 extracted _redact_secrets to src/roam/security/redact.py > (load-bearing for W363); W345 finished the W198 doc > cross-reference sweep; W319 / W348 / W352 / W403 / W412 closed > count-convention + warning-hygiene + Python-version drift + > asyncio config + stale-3.9-comment cleanup gaps; W367 refreshed > the TEAM-MCP-AUTHORITY-PRODUCT facade. Two new sonnet+web research > artifacts framed the next strategic axes: W395 perf benchmarking > (roam positioned MEDIUM — 5-20x faster than CodeQL with comparable > depth; 5 optimization sub-waves W405-W408 plus W404 scheduled) and > W402-research llm-smells pattern catalog (14 patterns; v1 = 11 > CHEAP+MODERATE — the first production-grade multi-provider > linter for openai/anthropic/google/litellm/langchain anti-patterns).

> Pitch sharpening + Pattern 1 family Round 3 + detector strengthening + permit red-teaming (W303-W398, 2026-05-15). > Four threads landed in parallel between the W375 consolidation and > the W398 one. (1) Pitch surfaces refreshed to lead with "pre-change > gates + post-change evidence": W390 swept README + landing index + > docs index hero copy, W393 extended the sweep across 11 other > surfaces (pricing / press / trust / governance / etc.). (2) > Pattern-1 family Round 3: W362 fixed cmd_owner to emit a > structured envelope on exit 0 instead of empty stdout — the third > CLI-side "exit-0 + structured envelope" fix after W327 and W324. > (3) Detector stub fills landed against the W368 BEHIND list: W370 > shipped smells empty-catch (469 findings on roam-code itself) and > W370b shipped duplicate-conditionals (149 findings; long-tail > distribution). (4) Permit red-team test surface added at W349 (19 > permit-persist tests) with 6 drive-by gaps queued as W377-W382. > Wave29 MCP wrapper backfill continued: W303 closed the test-surface > cluster (5 wrappers, 38 → 33). Structural support: W345 finished > the W198 doc cross-reference sweep, W364 extracted _redact_secrets > to a shared module (load-bearing for W363). One new sonnet+web > research artifact: W385 ecosystem positioning audit (7 tools > surveyed; 5 COMPLEMENTARY / 2 COMPETITIVE / 0 SUBSTITUTE on > agentic-assurance).

> No silent gaps milestone (W256-W261, 2026-05-14). The pr-replay > pipeline on the roam-code workspace itself now reports 7 complete > + 1 partial + 0 missing out of 8 evidence questions. Q8 > (accepted_risks / approvals) is the last open question and it is > now an explicit producer_not_available redaction-marker entry on > the packet, not a silent absence. The honest-banner classifier > (STRONG / PARTIAL / INSUFFICIENT) consumes this in the PR Replay > Markdown + JSON output, so the assurance surface can no longer > overclaim coverage it does not have.

> Pattern-1 family A/B/C/D + MCP wrapper backfill (W296-W302, 2026-05-15). > Variants A/B/C of the empty-stdout / structured-failure / hang > family are now codified as a canonical CLAUDE.md spec with 5 > invariants and external citations; Variant D (silent success on > degraded resolution) was added after W324 surfaced the gap. The > Wave29 MCP-wrapper backfill closed four clusters in four > consecutive sub-waves — exploration W299 (+9), architecture W300 > (+10), health W301 (+10), refactoring W302 (+9) — moving the > missing-wrapper count 75 → 67 → 57 → 47 → 38. Five sonnet+web > research planning artifacts landed alongside the implementation > work (Pattern-1 family audit, Pattern 3+6 audit, MCP > state-mutating patterns, standards currency audit, detector > competitive audit) — each is a forward-looking roadmap, not > shipped code.

Added

> Three closures in three waves milestone (W261 + W266 + W268, > 2026-05-14). Three of the W252 producer-coverage matrix's > top-three under-served fields closed in three consecutive waves: > Q8 silent-gap → explicit producer_not_available redaction > marker (W261); environment axis 1 producer → N (W266); > authority axis 1 producer → 5 ref kinds (W268). All three > followed the same discipline — Pattern-2 always-emit + > delegate-not-move — so existing v0/v1 content-hash contracts > survived intact, and consumers see the new evidence the moment > a producer ships without a schema migration or flag flip.

> W252 producer coverage matrix closure cycle complete > (2026-05-14). Four waves collapsed the matrix's top-three > under-served fields in a single cycle: environment (W266), > authority (W268), policy (W267), and synth-bundle parity (W272). > Real-world roam pr-replay HEAD~5..HEAD on the roam-code > workspace itself now carries: 3 actor_refs / 3 authority_refs / > 3 environment_refs / 6 policy_decisions / 492 context_refs / > 11 artifacts (audit-trail manifest + 10 CGA predicates); the > executable 8-question audit reports complete=7 / partial=1 / > missing=0 (W261 forward-compatible no-silent-gaps shape). The > last open producer-side gap (Q8 accepted_risks / approvals) > is the only path to lift the partial → complete and is queued > P1 as W247.

> Provenance trilogy closure (W290 + W292 + W293, 2026-05-15). > Every evidence dimension (actor_refs, authority_refs, > policy_decisions, approvals) now carries > extra["provenance"] stamped at ingestion sites via > provenance_label() from the W282 closed vocabulary (10 > sources, no new strings). Ingestion-point stamping discipline > preserves dataclass schema cleanliness — every wave landed > through call-site additions, never through default-value > changes. W294 stabilized the authority axis by populating > AuthorityRef.source distinctly per category and wiring > writer-side run-ledger fields (mode_to/mode_from/ > permit_id/lease_id/approval_id/rule_id) so the W292 > harvester finds real corroboration instead of only > run-meta.mode. W288-followup added a per-artifact advisory > warning to complement W280's enforced packet-level budget. 31/31 > golden content_hashes byte-identical across the trilogy + > stabilization waves.

Changed

> Evidence pipeline hardening batch (W279-W287 + W247a, > 2026-05-14). Typed PolicyDecision drift detection + packet > size budget + trust-tier surface + corroboration-based > promotion + provenance vocab + generated limitations + > producer-site version stamping + GitHub review parser. > Real-world roam evidence doctor on the roam-code workspace > itself: VERDICT PASS (was WARN via 3 unknown-tier > pseudo-actors; W285 promoted them to local_env via real > HMAC-verified run-ledger corroboration, with W286 confirming > the bracket against an insufficient-banner fixture). Real > packets sit at ~96 KB against a 256 KiB budget. Provenance > vocabulary landed clean; producer-side wiring is deferred to > W290+ so the call-site churn is decoupled from the vocab > freeze.

Research / planning

Added — taint rule pack v1 + shallow git default + auto-generated MCP table + oracle wrapper dedup + qualified-name flag (W466 batch)

Research / planning — W466 batch

Added — permit unification + Pattern-3b extension + llm-smells v1.1 + phase timing

Bailed

Fixed

Security

Performance

Deprecated

[13.0] — 2026-05-13

Deprecated

Substrate evolution

Mode enforcement (staged rollout, opt-in)

Real-world feedback fixes

New commands

Drift-guard infrastructure

Renamed

Schema

Round 10/11 — 9-agent parallel hardening pass

Nine Opus agents in two parallel waves closed the gaps surfaced by the R9 cross-rechecks. All landed in the working tree; none lost when the host PC died mid-flight.

R10 wave — 5 agents

R11 wave — 4 agents

Round verification: 624 targeted tests across the 15 touched test files pass on first run after the crash recovery — zero seam issues across the 3 files that received edits from 2 agents (mcp_server.py, test_validate_plan.py, test_n1_fixes.py).

Round 9 — code work + 5-pass cross-recheck

Recheck-driven fixes (R9 cross-pass)

Correctness

Performance

Agent / MCP DX

Onboarding (roam init)

roam doctor

CLI surface

Security tier-2

Architecture substrate

Tests

Conventions

---

Earlier in the [Unreleased] window

Site / docs

Fixed

Tests

Internal

[12.50] - 2026-05-09

Release notes

12.50 is the first PyPI release after a stretch of locally-bumped versions (12.48, 12.49) that never published. The wheel ships every change from the [12.48] and [12.49] entries below plus the new work described here. Going forward, releases are deliberate weekly / bi-weekly cuts (see CONTRIBUTING.md); [Unreleased] accumulates work between cuts instead of triggering a version bump per change.

Build / packaging

Site / trust

Action.yml

Documentation site consolidated to roam-code.com only

GitHub Pages disabled on the repo on 2026-05-08. Previously the docs were dual-hosted at cranot.github.io/roam-code/* (GitHub Pages serving docs/site/) and at roam-code.com/docs/ (Cloudflare Pages serving templates/distribution/landing-page/docs/). Drift between the two copies was a persistent source of count / content inconsistencies.

Net effect: one canonical docs surface (roam-code.com/docs/), no silent drift between two hosts, simpler CI.

stale-refs — operations-grade upgrades

Adds repo config, in-toto attestations, LSP code actions, LSP cross-file rename, roam audit integration, monorepo support, and a wider external-link-check option set.

External link checking — five new flags

LSP server — Quick Fix + workspace-wide rename + watcher

LLM enricher — ranked candidates + observability

Repo config + init helper

CI / supply-chain

Auto-fix — opt-in MEDIUM tier + Windows lock awareness

Discoverability

Watch mode — micro-optimisation

[12.49] - 2026-05-08

stale-refs — five major capability additions

A single release that pushes the v12.48 stale-refs intelligence layer from "audit tool" to "always-on safety net" across five orthogonal delivery channels: agent (LLM enrichment), live editor (LSP), live terminal (watch), persistent CI gate (baseline), and external web links (HTTP check).

Phase 1 — enrich_with_llm (MCP)

The roam_stale_refs MCP tool now accepts enrich_with_llm=True. When set AND ROAM_AI_ENABLED=1 AND the client supports MCP sampling, unresolved findings (NONE / LOW confidence) are batched into one Context.sample call. The agent's own LLM suggests the most likely intended path from the candidate set; suggestions return as confidence=MEDIUM with source="llm-sampling" and never auto-fix. This closes the deterministic-providers coverage gap where restructure-class drift (docs/cold-outreach.mddocs/sales/outreach-templates.md) loses character similarity but is trivial for an LLM. New CLI flag --with-candidates exposes summary.repo_paths_sample so the enricher can give the LLM context. New summary.llm_hints_added reports the count.

Phase 2 — --watch continuous mode

roam stale-refs --watch runs an initial scan, then polls the repo for file changes (mtime-based, no watchdog dep). On each cycle prints only newly-introduced and newly-resolved findings as a delta block timestamped with the current time. Composes with all other flags (--ignore, --diff, --no-anchors, etc.) so a long-running session in one terminal pane shows exactly the doc breakage you're about to commit. --watch-interval controls poll cadence (default 1.5s with ~30% debounce on detected change).

Phase 3 — persistent baseline (--baseline-save / --baseline-from)

Save a deterministic JSON snapshot of current findings via --baseline-save FILE; on subsequent scans --baseline-from FILE filters to only new findings since that snapshot. Different from --diff (git-based) and --ignore (glob-based) — the baseline is a frozen finding-set acknowledgment. Schema: roam-stale-refs-baseline-v1 with sorted records of "<target>|<file>:<line>:<kind>". Composes with --gate so CI fails ONLY on regression, never on legacy debt. Summary fields: baseline_size, baseline_filtered_out, baseline_saved_to.

Phase 4 — --check-external HTTP link checker

roam stale-refs --check-external extends the scan to http(s):// URLs via concurrent HEAD/GET requests (stdlib urllib — no extra dependency). Findings surface with kind=external alongside the local ones; SARIF gets a new stale-refs/external rule. Configurable via --external-timeout (default 5s) and --external-concurrency (default 8, capped at 32). Off by default to keep the scan local + offline; opt-in by users who want full link hygiene. Tries HEAD first, falls back to GET on 4xx (some CDNs reject HEAD).

Phase 5 — roam lsp editor integration

A minimal Language Server Protocol implementation, hand-rolled over JSON-RPC stdio (no extra dep). Handles initialize, textDocument/{ didOpen, didChange, didSave }, shutdown, exit, and publishes textDocument/publishDiagnostics with proper range / severity / source. Wire into VS Code, Neovim, JetBrains, Helix, Sublime as a custom LSP server pointing at roam lsp. Squiggly underlines on dangling links and missing anchors appear as you type. The server walks the project once at startup to populate basename_idx and anchor_cache, then per-keystroke scans cost only the regex pass on the buffer's content. didSave refreshes the workspace index in case the saved file added/removed referenceable paths.

Surface count

Tests

tests/test_stale_refs.py grows to 126 tests (+33 from v12.48). New classes: TestStaleRefsWithCandidates, TestLlmEnrichParser (5 robustness cases for LLM response parsing), TestStaleRefsWatchHelpers (3 watch-loop unit tests), TestStaleRefsBaseline (3 save/filter/gate flow tests), TestStaleRefsCheckExternal (3 URL extraction + classification tests), TestStaleRefsLsp (4 protocol handshake + URI-conversion tests including a real subprocess spawn). Round-1 and round-2 hardening passes added: TestStaleRefsDomainThrottle (2), TestStaleRefsLlmHintValidation (3), TestStaleRefsLspIntegration (2 full-handshake tests), TestStaleRefsBaselineLineTolerant (3 line-shift + v1 backwards-compat tests), TestStaleRefsExternalDedup (1), TestStaleRefsWatchHelpersComposition (1), TestStaleRefsLspIncrementalFlow (1).

Round-1 + round-2 hardening

After the initial five-phase ship, two further audit passes surfaced real correctness issues:

Round-3 — comprehensive multi-angle test coverage (+17 tests)

A third audit pass added 17 dedicated tests covering previously under-tested angles. No new bugs surfaced — but several class-of- behavior contracts now have explicit guarantees:

Total stale-refs test count: 143 (76 → 88 → 93 → 113 → 126 → 143 across the polish iterations). All green; ruff clean.

Round-4 — composition guards, debugability, discoverability (+17 tests)

The fourth round took a wide-and-deep audit pass focused on what the user experiences when things compose — not features in isolation.

Real bug fixed:

UsageErrors added (foot-guns prevented):

Debuggability:

Discoverability:

Test coverage: 17 more tests covering the composition guards (3), the 6 LLM skip-reason paths (7), dynamic LSP version (2), and recipe followup contracts (5).

Total stale-refs test count: 160 (76 → 88 → 93 → 113 → 126 → 143 → 160 across all polish iterations). 248 across all touched suites. All green; ruff clean.

[12.48] - 2026-05-08

roam stale-refs — dangling file-reference scanner

Index-free scanner that finds markdown links, HTML href/src attributes, and backtick file paths whose target no longer exists on disk. Closes the gap between symbol-graph commands (uses, impact, refs) — which only see indexed call/import edges — and prose mentions of file paths in docs, READMEs, and YAML/JSON configs. Pure filesystem operation; runs in any git directory regardless of whether roam index has been built.

Detection surface

False-positive filters

Reporting

Discoverability

Internals

Surface count

Polish iterations after the initial v12.48 ship

roam pr-replay — productised PR Replay report

Wraps roam postmortem with tier-aware buyer-facing framing, an aggregated detector-class breakdown, and a markdown narrative ready to hand to a prospect. The productised version of "would Roam have caught my last 30 incidents?" — the qualifier on the path to a Roam Review subscription.

Three tiers, one engine

Report shape

Tooling

Landing-page integration

[12.47] - 2026-05-08

Documentation cleanup + anti-drift CI gates

A maintenance release that aligns documentation across surfaces, scrubs shorthand from source comments and template files, renames a deliverable, and lands four CI gates that prevent regression.

Anti-drift CI gates (new)

Source comments + tests

Documentation + product naming

History rewrites

pyproject.toml

[12.46] - 2026-05-07

CI fix — ruff lint cleanup

Hotfix after 12.45. The ruff format-check passed in 12.45 but the ruff LINT pass (separate) flagged 7 errors across the new files:

Applied ruff check --fix --unsafe-fixes. Whitespace + dead-code removal only; all tests still green.

[12.45] - 2026-05-07

CI fix — ruff format on newly-added files

Hotfix after 12.44. The 9 net-new files added in 12.43-12.44 (capability.py, cmd_compare.py, cmd_skill_generate.py, sarif.py edits, plus 4 test files) were not run through ruff format before commit. CI's lint job ran ruff format --check and rejected.

Per the project's known-learning ("Ruff format check in CI: Always run ruff format on new files before committing"), this should have been caught locally. The hotfix runs the formatter and lands the whitespace-only changes. No behavior change.

[12.44] - 2026-05-07

CI fix — register the two new detectors in the catalog

Hotfix after 12.43. The two new async detectors (async-fire-and-forget-task, async-nested-run) were registered in the detector dispatch table but missing from the catalog/tasks.py CATALOG dict. test_math.py::test_detector_registry_covers_catalog caught the mismatch on all 5 Python versions.

Adds full catalog entries for both new tasks: name, category, kind, and the two-way ranked-solutions list that the rest of the algo infrastructure expects. Bumps test_math.py's expected-task count 32 -> 34. No behavior change to the detectors themselves.

[12.43] - 2026-05-07

Major: Capability Registry + 4 new commands + landing-page launch

This release lands Capability Registry and bundles a substantial polish round. Companion to the launch of the new commercial landing page at https://roam-code.com.

New commands (4)

New detectors (2)

SARIF output enrichment

Rule packs

Documentation

Landing page (https://roam-code.com)

Major rework over 5 audit-and-fix passes after the domain went live on 2026-05-07:

Surface counts

Deferred to follow-up releases: Dart Tier-1 extractor, parallel parse for monorepos, LLM-augmented MCP tool, why-slow CLI via runtime traces, open-issues sweep, GraphQL bridge, incremental MCP hot-reload.

[12.42] - 2026-05-06

CI fix — landscape.json self-row version stamp

Hotfix after 12.41. The 12.41 release bumped pyproject + MCP cards + competitor_site_data but missed docs/site/data/landscape.json's self-row, which tests/test_doc_consistency.py::test_landscape_json_self_row_version_matches guards (major.minor must match pyproject). Bumped 12.40 → 12.42 in that file. No behavior change.

[12.41] - 2026-05-06

CI fix — README surface consistency for Phase 0 commands

Hotfix release after 12.40. The README's command listing did not yet include permit, postmortem, and article-12-check, which broke tests/test_readme_surface_consistency.py::test_readme_covers_all_canonical_cli_commands on all 5 Python versions in the matrix. Added one-line entries for each of the three new commands in the canonical command table. No behavior change; documentation-only fix to restore CI green.

[12.40] - 2026-05-06

New commands + commercial landing page

After 8 CI iterations restoring the matrix to green (12.31 → 12.39), this release lands three new CLI commands and a starter landing page for the hosted product surface.

New commands

Commercial landing page (starter)

New directory templates/distribution/landing-page/ with:

the new domain recommendation.

Surface counts

Tests

[12.39] - 2026-05-06

Polish — exhaustive bugbear sweep + B904 cleanup

After the 12.38 PyYAML pin landed CI green for the first time in 8 iterations, ran an exhaustive bug-hunt sweep across 8 categories (fallback parsers, hardcoded counts, B033/B023, future-annotations, test-side YAML deps, module-level state survival, lazy-import asserts, B904 missing-from-clauses).

Findings + fixes

Findings without fixes (low risk, future-watch)

[12.38] - 2026-05-06

Clean fix — PyYAML pinned in [dev] extras (kills the recurring Python 3.9 CI red)

12.31 → 12.37 was seven consecutive bugfix releases — every one caused by some divergence between PyYAML and the in-tree _parse_simple_yaml / _emit_simple_yaml fallback on Python 3.9. Root cause: fastmcp (which transitively pulls in PyYAML) is gated on python_version >= '3.10' in [dev] extras, so Python 3.9 CI ran without PyYAML. Every test asserting PyYAML-equivalent behaviour surfaced a missing capability in the fallback. Each fix narrowed the gap; the gap kept reappearing.

This release pins PyYAML in [dev] so the test matrix has a consistent reference parser on every Python version. The fallback parser/emitter stays in tree (production users without PyYAML still get a working roam, just with the documented-shape coverage we built across 12.33-12.37).

Why not make PyYAML a hard dep?

Considered. PyYAML is one of the most-installed Python packages globally so the cost would be small. But:

So: hard dep stays a future option; for now the test-matrix fix is sufficient.

What this changes

[12.37] - 2026-05-06

Bugfix release — roam rules-validate --fix write-back works without PyYAML (Python 3.9 CI red, seventh iteration)

12.36 fixed the parser. 12.37 fixes the round-trip: roam rules-validate --fix rewrites severity: blockseverity: BLOCK in memory, then calls yaml.safe_dump to write back. On Python 3.9 (no PyYAML) the import raised, the write-back was skipped, and the file stayed at block — failing test_cli_fix_mode_writes_back_to_file which asserts 'BLOCK' in file_contents.

Bugfix

Sweep status

The pattern's been: every test that touches the parse OR emit YAML path on Python 3.9 surfaces a different missing capability in the fallback. Now we have parse + emit fully round-trippable without PyYAML for the documented rules.yml shape.

[12.36] - 2026-05-06

Bugfix release — bracket-balance check ignores quoted strings (Python 3.9 CI red, sixth iteration)

12.35's bracket-balance malformed-YAML check counted brackets inside quoted strings. Community rule files (e.g. rules/community/dataflow/DF-005-php-cross-fn-sqli.yaml) have legitimate sources: ["$_GET[", "$_POST[", "$_REQUEST["] shapes — each string contains a [ with no matching ], but PyYAML happily parses them. The fallback's naive s.count("[") flagged this as malformed and test_rules_community_pack.py::test_community_pack_has_1000_plus_valid_rules went red on Python 3.9 (PyYAML missing).

Bugfix

Sweep status

Each fix surfaces another edge case in the same _parse_simple_yaml path — the cost of having a fallback parser that diverges from PyYAML on shapes the test suite exercises. Long-term: the right move is to either ship PyYAML as a hard dependency for the rules-engine subsystem or to import a tiny vendored YAML parser. For now: targeted shape-by- shape fixes, validated by CI matrix on Python 3.9.

Also lands

[12.35] - 2026-05-06

Bugfix release — _parse_simple_yaml malformed-input + top-level-list (Python 3.9 CI red)

Fifth iteration on the same CI matrix. 12.34's list-of-dicts fix made the fallback parser TOO permissive — tests/test_pr_analyze_edge_cases.py::test_load_rules_yaml_handles_non_yaml_file expects malformed YAML to surface a warning, but the fallback parsed "this is not: valid: yaml: at all: [" as a non-failing dict and no warning was emitted. Same for test_load_rules_yaml_top_level_not_dict where PyYAML returns a list and the loader warns "must be a mapping", but the fallback returned {} silently.

Bugfixes (_parse_simple_yaml)

Sweep done while waiting on CI

While CI 12.34 was running, swept all 25 yaml.safe_load call sites across 7 files — every one already has an except ImportError fallback. So the parser bug surfaced in just one place (cmd_pr_analyze._parse_rules_data) but the fix lands in the shared roam.rules.engine._parse_simple_yaml so all callers benefit.

[12.34] - 2026-05-06

Bugfix release — _parse_simple_yaml list-of-dicts (Python 3.9 CI red)

12.33 fixed three test files but missed a fourth red on Python 3.9: tests/test_pr_analyze.py::test_load_rules_yaml_simple. The test's fixture YAML is a list-of-dicts (rules: [- id: ...]) — the canonical shape of .roam/rules.yml. Without PyYAML, the fallback parser at roam.rules.engine._parse_simple_yaml only handled flat key-value shapes and inline lists, so the result on 3.9 was an empty single-dict and the assertion len(rules) == 1 failed.

Bugfix

How this slipped through (running tally)

The pattern: each fix covered the reported failure but didn't sweep for siblings. Added a release-checklist note in 12.33 about the triple-grep for _CORE_TOOLS. Adding now: also grep "yaml.safe_load" src/ to spot every fallback path that needs _parse_simple_yaml coverage of advanced YAML shapes.

[12.33] - 2026-05-06

Bugfix release — third stale assertion + bugbear lint sweep

12.32 fixed two stale _CORE_TOOLS == ... assertions but missed a third one in the same file. CI on 12.32 stayed red. Fixed here, plus a bugbear-lint sweep (B033 duplicate set items, B023 closure-over- loop-variable) that surfaced four real micro-bugs.

Bugfixes

How this slipped through

The _CORE_TOOLS count appears in three assertions across two test files. 12.30 updated one. 12.32 caught the second on CI red. 12.33 caught the third on CI red. Lesson: a grep -rn "tool_count" sweep at every surface bump would have caught all three at once.

[12.32] - 2026-05-06

Bugfix release — CI green-bar restore + Z-phase polish

12.31 went out with two stale tests (drift from the hosted-product core-tools list landed in 12.27/12.28) plus a Python-3.9 environment gap. Both fixed here.

Bugfixes

New detectors (3)

Smarter outputs

Catalog

[12.31] - 2026-05-06

Major release — 90-phase polish + smarter pass

This release lands a multi-session polish run touching almost every detector and command in the codebase. Headline gains: 2.7× faster roam math (5.5s → 2.07s on roam-code itself), 3.2× faster roam --help (1.24s → 0.39s warm), 2.3× faster roam health (3.3s → 1.45s warm), +6 new algorithm detectors, +3 framework profiles, 69% of findings now carry structured matched_patterns explainability blocks, and a 40-entry regression-FP corpus so the wins can't quietly come back.

New detectors (6)

New framework profiles (3) + auto-detection

roam math --framework FRAMEWORK now bundles five profiles. New ones:

Auto-detection (autodetect_framework_profile) sniffs: requirements.txt / pyproject.toml for django, Gemfile for rails, package.json @nestjs/core for nestjs (alongside the existing vue3 + laravel cases). The (auto) tag in the verdict line surfaces when a profile was auto-selected so it isn't invisible.

Performance

Smarter outputs / explainability

Smarter classification (FP fixes)

Cleaner

Tests + corpus

Schema

[12.30] - 2026-05-06

Detector quality audit follow-ups

A second dogfood pass of roam math / weather / auth-gaps / migration-safety / over-fetch against the a Vue 3 + Laravel app Vue 3 + Laravel multi-tenant codebase surfaced five fresh false-positive classes that the 12.28/12.29 rounds didn't catch. All five are fixed here, each with regression-corpus fixtures so they can't quietly come back. Web search confirmed the patterns we're recognising are the canonical Laravel + TypeScript idioms (parent-controller $this->middleware('auth') is the pre-Laravel base-class auth pattern; PostgreSQL SQLSTATE 42P07 / MySQL 1050 are the standard "table already exists" idempotency codes in stancl/tenancy multi-tenant migrations).

roam weather / hotspot ranking (E1)

roam math / I/O-in-loop walker (E3)

roam auth-gaps / base-class inheritance (E2)

roam migration-safety / Schema::create messaging (E4)

roam over-fetch / config-shaping wrappers (E5)

Tests

Deferred

[12.29] - 2026-05-06

Detector quality deferred items

The 12.28 round shipped 14 FP fixes plus a suppression mechanism. Customer feedback flagged seven gaps the rushed round didn't cover; this release ships them as a coherent batch.

Math / IO / N+1 detector

Tests / regression discipline

PR comment renderer

Suppression workflow

Refactor

[12.28] - 2026-05-06

Detector quality round () — false-positive fixes

User feedback after running roam math / over-fetch / missing-index / auth-gaps on a multi-tenant Laravel + Vue 3 codebase surfaced systematic FP patterns. This release ships fixes for all of them.

Math (roam math / algo)

Missing-index (roam missing-index)

Auth-gaps (roam auth-gaps)

Over-fetch (roam over-fetch)

Suppression mechanism (M7)

Three layered paths for marking a finding as a known FP:

Suppressed findings stay in the JSON envelope under finding["suppressed"] = {source, reason} instead of being silently dropped — consumers can detect over-suppression. Text output filters them by default. Verdict line (M14) now reflects "N unsuppressed candidates surfaced; M suppressed via …" when any suppression fires.

Added — new command

Surface counts

Tests

[12.27] - 2026-05-06

Added — round-5 polish + dogfood

15 small-to-medium improvements driven by the round-5 task capture. No new top-level commands; all flags + helpers + content additions.

Fixed

Internal

Surface counts

Tests

[12.26.1] - 2026-05-06

Added

Fixed

Internal

[12.26] - 2026-05-06

Added — Roam Agent Review + Cloud Lite engines (hosted-product layer)

8 new commands ship the Roam Agent Review and Roam Cloud Lite product engines plus the EU AI Act Article 12 audit-trail toolkit.

Added — pr-analyze hardening

Added — MCP tool surface

Added — distribution surface

Added — shared helpers

Changed

Internal — cognitive complexity reductions

Self-dogfood with roam complexity surfaced 2 CRITICAL functions (cc ≥ 99) and 3 HIGH functions in v2 modules. All five refactored:

Surface counts

Tests

[12.25] - 2026-05-05

CI fix: backport QueryCursor for tree-sitter < 0.24 (Python 3.9 lane). The 12.24 narrowing got past the install layer; the next breakage was an unconditional from tree_sitter import QueryCursor in roam/languages/query_engine.py. QueryCursor was added to the Python bindings in tree-sitter 0.24, but Python 3.9 pins to tree-sitter 0.23.x (newer versions require ≥ 3.10).

This was also a real runtime bug — any Python 3.9 user installing roam-code from PyPI would have hit ImportError the first time the indexer hit a YAML-extractor language.

Fix: try: from tree_sitter import QueryCursor; except ImportError: falls back to a thin shim that delegates .matches() and .captures() to the underlying Query object — the old tree-sitter 0.23 API exposes the same methods on Query directly.

[12.24] - 2026-05-05

CI fix: narrow the fastmcp dev-dep marker so Python 3.9 stops failing to install. fastmcp >= 2.0 requires Python >= 3.10, which means the unconditional "fastmcp>=2.0" shipped in 12.23 broke the 3.9 lane:

` ERROR: Could not find a version that satisfies the requirement fastmcp>=2.0; extra == "dev" (from roam-code[dev]) `

Marker is now "fastmcp>=2.0; python_version >= '3.10'" so 3.9 skips the install entirely. The MCP-runtime test already guards on _HAS_FASTMCP (12.23) so 3.9 simply skips that single assertion.

[12.23] - 2026-05-05

CI bring-up: surface fastmcp dependency for the MCP-runtime tests.

After 12.22 fixed the indexer-order bug, CI exposed the next layer of the saga: test_pass93_mcp_wrappers_registered asserted "roam_why_fail" in _TOOL_METADATA but CI installed only the [dev] extras (no fastmcp). Without fastmcp the @_tool(...) decorator becomes a no-op and _TOOL_METADATA stays empty — the test had been masked by the earlier blockers since 12.17.

Fix:

1. Add fastmcp>=2.0 to the [dev] extra so CI exercises the actual MCP registration path. 2. Defense-in-depth: test_pass93_mcp_wrappers_registered is now @pytest.mark.skipif(not _HAS_FASTMCP, ...) so it stops gating environments that intentionally skip the optional extra.

[12.22] - 2026-05-05

Indexer pipeline ordering fix + two CI test-isolation fixes.

Indexer ordering — late-edge resolvers now run BEFORE graph metrics

cached build_symbol_graph(conn) keyed on id(conn). The indexer pipeline ran graph metrics first, then the django-post, pytest-fixture, and registry-dispatch resolvers — which add edges to the DB AFTER the graph was already cached. When a follow-up command opened a new readonly connection that happened to be assigned the same id() (Python reuses freed addresses), the cache returned the stale graph from before those late edges existed.

The user-visible symptom: roam impact <fixture> showed "no dependents" when there were transitively-depending tests, because the pytest_fixture_dep edges weren't in the cached graph the impact command read.

Fix:

1. Reorder the indexer to run all late-edge resolvers BEFORE _compute_graph_metrics. The graph metrics now reflect every edge, not a stale subset. 2. Clear the graph cache at the end of Indexer().run() so any subsequent reader builds fresh — belt-and-suspenders against future late-resolver additions.

CI test isolation

[12.21] - 2026-05-05

Ten quality + reliability passes (rounds 111-120). Three real CI bugs fixed (CI has been red since 12.17), three more cognitive-complexity splits, a new audit-report template, and a latent graph-cache leak fix from .

cmd_impact JSON contract

CI failure at 3.9 + 3.12. When roam impact finds the symbol in the index but NOT in the dependency graph, the path emitted plain text on stdout, breaking --json consumers. Wrapped in a proper envelope (summary.in_graph: False) with the same hint surfaced in the tip field.

health --gate exit code

CI failure at 3.13. The test asserted health_min: 100 is unreachably high but a tiny fixture project scores exactly 100, and the comparison is score >= h_min so 100 ≥ 100 passes. Switched the test to health_min: 999 to make the threshold genuinely unreachable.

MCP sampling test

CI failure at 3.11. added the ROAM_AI_ENABLED opt-in gate; the existing test never set the env var, so sampling returned None on CI. Updated the success-path test to set ROAM_AI_ENABLED=1 and added a default-OFF assertion test.

_compute_reachability split

cc 150 (deepest nesting in repo at depth 8) → ~10. Decomposed into _node_match_keys, _matches_dep, _trace_entry_reach, _build_norm_lookup, _record_match. Orchestrator stays under 10 LOC of branching.

poll_loop split

cc 154 with 17 params at cmd_watch.py:457. Pulled per-event helpers (_need_force, _scan_disk_changes, _label_webhook_events, _refresh_tracked_after_reindex, _run_guardian_step) keeping the public signature stable so callers and tests are unaffected.

tests for 5 untested commands

Added behavioural tests for py-modern (had 0 references), graph-stats, mcp-status, pre-commit, exit-codes (each had 1 registration-only reference). 9 new tests.

ROAM_QUERY_TIMEOUT_S coverage

shipped an opt-in SQLite progress handler. Zero test coverage existed. Added 4 tests exercising no-env / invalid / zero / and a tiny-budget interrupt that should fire OperationalError.

format_table budget threading (cmd_context)

20 format_table() calls across 5 files lacked budget=. Added _table_budget(data) helper and threaded the global --budget through cmd_context's data dict. Wired into the two highest-volume call sites (callers + callees lists).

audit-report Markdown template

P1.2 strategic blocker. Built a 9-section, 185-line template at docs/audit_report_template.md with placeholders for every roam audit --json field. Bridges the gap between the engine and the deliverable artifact paying customers see.

_build_agent_descriptors split + graph-cache fix

Top remaining complexity offender: _build_agent_descriptors cc=161 in graph/partition.py. Decomposed into 6 small helpers (_node_partition_index, _fetch_node_metadata, _file_majority_owners, _read_only_files_for, _boundary_contracts, _cluster_label_for).

Also fixed a latent state-leak bug from 's graph-builder memoization: the cache was keyed on id(conn) and Python reuses id values across short-lived objects, so partition tests running after orchestrate tests in the same process saw a stale graph from a closed connection. Added an autouse fixture in conftest.py that calls clear_graph_cache() between tests.

Surface counts unchanged: 178 CLI commands, 128 MCP tools, 41 core.

[12.20] - 2026-05-05

Ten quality-focused passes (rounds 101-110). No new commands; this round is pure cleanup and hardening based on what roam debt, roam health, and roam complexity reported about the codebase itself.

QueryEngine._extract_symbols_from_pattern cc 198 → ~10

Single most-complex function in the codebase. Decomposed into four small helpers (_find_name_node, _decode_capture, _resolve_kotlin_class_kind, _build_symbol_from_def) leaving the orchestrator at ~10 cognitive complexity. All 194 extractor tests pass.

_render_single_text cc 189 → smaller orchestrator

Pulled the per-symbol header rendering (async badge, idiom badge, paren-aware decorators block) out of cmd_context._render_single_text into _render_async_badge / _render_idiom_badge / _render_decorators_block. The paren-aware split now correctly handles parametrize("a,b", [...]) decorators that previously got mangled by naive comma-splitting.

delete 4 truly-dead exports

roam dead aggregated 78 SAFE entries but most are decorator- registered MCP tools (false positives the analyzer can't see through). Of the 16 non-decorator candidates, 4 had only self- references and were genuinely dead: removed write_site_payload (competitor_site_data), detect_string_format_old (python_idioms — disabled by return findings on first iteration), structured_click_exception (output/errors).

break the cli ↔ cmd_doctor cycle

roam health flagged exactly one actionable cycle: cmd_doctor imported _COMMANDS from cli, while cli's command registry referenced cmd_doctor. Static graph saw it as a 2-edge cycle. Replaced from roam.cli import _COMMANDS with importlib.import_module("roam.cli") so the only edge is runtime-only — cycle eliminated, doctor still validates every registered command.

health 80 → 88 via utility-path classifier fix

The god-component classifier was labeling architectural hubs (cli Click root, _run_roam MCP dispatch, build_symbol_graph) as actionable when they're SUPPOSED to have high fan-in. Added graph/ mcp_extras/ languages/ to _UTILITY_PATH_PATTERNS and cli.py mcp_server.py file_roles.py to _UTILITY_FILE_PATTERNS. Health score jumped 80 → 88 (+8 pts).

_analyze_dataflow_dead cc 160 → ~10

Top of the danger-zone list (cmd_dead.py: 3362 churn × cc=24.6 × fan-in=8 = score 1.68). The 200-line _analyze_dataflow_dead mega-function split into _table_exists, _read_caller_line, _is_return_captured, _detect_unused_returns, _parse_param_names, _detect_dead_param_chains, _detect_side_effect_only. Orchestrator stays under 10. All 48 dead-code tests pass.

observability hook extended

covered cmd_metrics + cmd_describe (20 sites). adds cmd_understand (4 sites), metrics_history (9 sites), and the remaining nested patterns. ROAM_VERBOSE=1 now surfaces 31 swallow points; remaining ~40 are in less-touched commands and will land incrementally.

second --json bypass sweep

Probed every command with an unknown-symbol input. Caught one new bypass: roam test-map UnknownXYZ printed plain text "Not found: ..." instead of a JSON envelope. Fixed.

TODO/FIXME audit (no real debt)

22 markers in source; all 22 are intentional — cmd_test_scaffold.py writes "TODO" strings as scaffold output (17 sites) and cmd_vibe_check.py detects TODO patterns in user code (5 sites). No actual debt. Decision logged here.

orphan-imports false-positive sweep

orphan-imports was flagging roam.telemetry and roam.observability as internal_typo because the indexed file table was older than these modules. _indexed_python_modules now also walks src/ directly so modules added between index runs aren't false-flagged. 30 false internal-typo entries eliminated; total orphan count 164 → 143.

[12.19] - 2026-05-05

Ten quality-focused passes (rounds 91-100). Net new surface: 1 CLI command (audit — Priority 1 strategic blocker), 5 MCP wrappers (roam_alerts, roam_timeline, roam_test_impact, roam_disambiguate, roam_why_fail), cross-language orphan-imports (JS/TS/Go), auto-generated complete-reference appendix in the docs site, MCP error-storm rate-limiter, agent-export --brief mode, observability hook for swallowed exceptions, and registry-dispatch detection in roam impact.

--json empty-state sweep

Same class of bug as the 12.18.1 safe-zones hotfix. Fixed three real bypasses uncovered by JSON-parse probes: cmd_complexity (3 sites: empty data, no matches, no bumpy roads), cmd_coverage_gaps (missing-filter usage error), and cmd_config where a flag-default mismatch made roam --json config silently produce empty output.

silent except: pass observability hook

84 except Exception: pass blocks across 40 files masked real failures (missing schema columns, optional dependencies, sqlite errors). Added roam.observability.log_swallowed which is a no-op unless ROAM_VERBOSE=1 (or ROAM_OBSERVABILITY=1) is set. Applied to the heaviest offenders: cmd_metrics (12 sites) and cmd_describe (8 sites). Rate-limited to 5 reports per scope per process.

five MCP wrappers

Wired up agent-actionable signals that were CLI-only: roam_alerts, roam_timeline, roam_test_impact, roam_disambiguate, roam_why_fail. All five added to the core preset (35 → 41 core tools).

N+1 SQL batching

Replaced per-symbol conn.execute loops in cmd_adversarial (orphaned-symbols + high-fan-out checks) with a single batched_in() query. On a 14k-symbol repo, roam adversarial previously made thousands of round-trips; now one batch per check. Same pattern for cmd_affected (start-symbol collection).

auto-regenerated command reference

Hand-curated workflow sections in docs/site/command-reference.html now have a complete auto-generated appendix listing every command + short help line organised by category, between <!-- BEGIN auto-reference --> markers. Regenerate with python dev/build_command_reference.py. Coverage went from 73 to 185 commands documented.

cross-language orphan-imports

was Python-only. Extended to JS/TS (path-rewrite resolution + bare-specifier detection) and Go (stdlib + hostname-shaped import path heuristic). New --lang flag (all / python / javascript / go).

roam audit

One-shot codebase audit meta-command. Chains health → debt → dead → test-pyramid → api → stats → hotspots --danger into one envelope with a top-level summary (verdict, health_score, debt_total, danger_zone_count, api_surface, etc.). Pass --brief to drop per-section detail.

AI-on-client-code default OFF

Sampling/LLM hook in mcp_extras/sampling.py now requires ROAM_AI_ENABLED=1 (or =true) to dispatch payloads to the client's LLM. Without the env var, the hook returns None and callers fall back to the raw envelope. GDPR / EU AI Act credibility blocker for the first paid audit.

roam impact dispatch-via-registry

Dogfood #189 — the call graph misses consumers that route through string-lookup tables (cli _COMMANDS, ask recipes, plugin entry points). New indirect_refs field in the impact envelope scans source files for string literals matching the symbol's name/qname. Surfaces 43 sites for health that the static graph misses.

agent-export --brief

roam --json agent-export previously emitted ~6 KB of nested JSON (directory layout, key files, hotspots, layers, clusters). New --brief flag drops the verbose payload and keeps only the top-level summary — 6197 → 608 bytes (10× reduction). Useful for CI / fleet workflows that just need project metadata.

[12.18.1] - 2026-05-05

Hotfix for a CI failure spotted in the 12.18 release run. roam safe-zones --json <missing-symbol> printed a plain-text "Target symbol(s) not found in the dependency graph." line when the target wasn't in the graph, which broke json.loads consumers. The empty-result branch now emits a proper envelope with summary.verdict, internal_size=0, and boundary_size=0.

The bug pre-dated this batch — it surfaced because CI runs in 3.12 / 3.13 environments where the test fixture happened to seed a name that wasn't in the test-project graph. Local Python 3.11 runs didn't trip it.

[12.18] - 2026-05-05

Ten more deep passes (rounds 81-90), shipped as a focused follow-up to 12.17. Net new surface: 5 CLI commands (disambiguate, pre-commit, mcp-status, test-impact, recipes), 1 new flag (map --seed/--depth), 1 new env-var override family (ROAM_RERANK_*), MCP error-storm rate-limiter that drops verbose envelope on repeated failures, and a recheck-driven shipping pipeline that caught residual stale counts left over from the 12.17 ship.

roam disambiguate <name>

Lists every symbol matching the name with file/line/kind/ signature/docstring snippet + PageRank tiebreaker. Saves agents from picking the wrong overload when names collide.

roam pre-commit

Generates a git pre-commit hook that runs git diff --cached | roam critique on staged changes. Idempotent installer (--install); preview-only by default (--print). ROAM_PRECOMMIT_SKIP=1 to bypass.

roam mcp-status

Companion to roam doctor for the MCP transport: preset, registered tool count, backpressure limits (max_concurrent, in_flight, busy_responses_total), result-cache size, watcher state.

roam test-impact <range>

Sharper than affected-tests. Walks BFS over the reverse call graph from each changed symbol; ranks tests by the number of changed symbols that reach them.

rerank weights via env vars

ROAM_RERANK_ALPHA / BETA / GAMMA / DELTA / EPSILON / ZETA override [retrieve] config without touching config.toml. Useful for quick weight-tuning loops.

roam fitness --explain

Confirmed already shipped. Verified the existing flag covers the per-violation rule citation requirement.

MCP error storm rate-limit

When the same error_code fires ≥ 3× in a row, the MCP error envelope drops the verbose fields (hint, suggested_action, doc_link, severity) and replaces them with a tight {error_code, repeat_count, trimmed: True} shape. Reduces token bloat in agent retry loops. Counter resets when a different error_code fires.

roam recipes

Sugar over roam ask --list for discoverability. Lists every recipe with intent + example queries + commands. JSON envelope includes the full recipe metadata.

roam why --json audit

Verified that the existing why --json payload already returns structured per-symbol fields (role, fan_in, fan_out, pagerank, reach, cluster). No work needed — the explanation is already structured.

roam map --seed --depth

Restricts the project map's top-symbols list to symbols reachable from a seed file within N hops. For monorepo navigation where the full map is overwhelming.

[12.17] - 2026-05-05

Sixty deep passes (rounds 21-80), shipped together. Net new surface: 18 CLI commands (plugins, test-pyramid, index-stats, telemetry, orphan-imports, changelog, graph-export, help-search, timeline, pr-prep, stats, why-fail, graph-stats, recommend, api, exit-codes, version, plus the oracle batch subcommand), 1 MCP tool (roam_catalog), 2 doctor checks, 11 new ask recipes, many new flags (--explain, --danger, --env, --batch, --quality, --scope, --check, --quick, --hops, --mode, --since-tag, --focus, --inline, --by-file, --weights, --recent, --dry-run, --next), 2 opt-in indexing structured error doc_link + severity field, ask-classifier auto-routing for unknown commands, opt-in local telemetry, richer roam_catalog metadata (when_to_use + examples), graceful Ctrl-C handling, MCP roam_health payload trimming when noisy, graph-builder memoization, and a deprecation registry hook.

roam why-fail <test>

Triage helper: traces from a failing test (or symbol) back to recently-changed symbols it transitively reaches. Sorted by recency × hop distance × PageRank.

roam graph-stats

Graph-level invariants: density, weak components, non-trivial cycles, average degree, top-inbound symbols. Single overview number for "how dense / connected is this codebase".

roam recommend <symbol>

Surfaces related symbols using three signals — call-graph neighbours, git co-change, persisted clone siblings — combined with normalised contribution scoring.

roam diff --since-tag

Auto-fills the commit range with <last-tag>..HEAD via git describe --tags --abbrev=0.

roam tour --focus <module>

Constrains the tour (top symbols, reading order, entry points) to files under the given path prefix.

taint risk score

roam taint summary now includes a 0-100 risk_score weighting errors 5×, warnings 1×, and discounting sanitized findings.

roam context --inline

Concatenates the recommended files into one paste-ready block with line numbers — for chat agents that prefer one big string over multi-file output.

roam clones --by-file

Aggregates clone pairs into (file, file) coupling. Shows which file pairs are most clone-coupled.

graph-builder memoization

build_symbol_graph and build_file_graph cache by id(conn) so compound commands like pr-prep (which internally call multiple subcommands) don't rebuild the graph multiple times.

roam api

Lists the public API surface (exported public symbols + their signatures). Useful for changelog generation and breaking- change detection.

error envelope severity

MCP error envelopes now include a severity field (info | warning | error | fatal) per error code. Lets agents branch on severity without parsing the message.

roam search --recent

Boost results in files modified within N days. Useful when retracing very recent changes.

roam config --weights

Surfaces the active rerank weights (alpha/beta/gamma/delta/ epsilon/zeta) merged with defaults. Replaces grepping the source.

roam diagnose --batch

Run diagnose on N symbols from a newline-separated list (file or stdin). Mirrors the oracle batch pattern.

MCP roam_health payload trimming

When the issue count is ≥ 50, the MCP envelope drops the verbose issue list and keeps the score, category counts, and breakdown. Set ROAM_MCP_HEALTH_FULL=1 for the unfiltered shape.

roam reset --dry-run

Preview the destructive reset (DB path + size) without deleting. No --force required for the preview.

roam exit-codes

Lists every roam exit code with its meaning. Replaces grepping the docs or source.

roam workflow --next

Given a previously-run command name, suggest what to run next (e.g. after preflight: context, impact, diff).

deprecation registry

Adds the _DEPRECATED_COMMANDS map in cli.py. When a deprecated command is invoked, the LazyGroup resolver prints a "use X instead" note on stderr without breaking the call.

roam version --check

Prints the installed version and (with --check) queries PyPI for the latest version. Offline-friendly: falls back silently when PyPI is unreachable.

roam timeline <symbol>

Chronological commit history for the file owning a symbol: SHA, date, author, lines added/removed, subject. Joins symbols × git_file_changes × git_commits with a GROUP BY commit_id to dedupe duplicate change rows.

roam pr-prep

One-shot pre-PR fitness check that bundles diff + critique + pr-risk into a single envelope with a top-level ready_to_open boolean. Replaces calling four commands sequentially before opening a PR.

roam eval-retrieve --quick

Runs the first 5 tasks of the bench harness for fast local iteration. The full 30-task bench takes too long for tight weight-tuning loops.

roam config --check

Validates .roam/config.json against the known-keys schema. Flags unknown keys (typo guard) and type mismatches. Lists the canonical key set with one-line descriptions when no issues are found.

richer roam_catalog metadata

Tool catalog now includes when_to_use (extracted from each docstring's "WHEN TO USE:" line) and up to three doctest-style >>> roam ... examples per tool. Lets agents pick the right tool without fetching each individual description.

roam impact --hops N

Bound the BFS at N hops instead of full transitive descendants. --hops 1 mirrors roam uses; --hops 3 shows callers of callers of callers. Lets agents scope a refactor to a controlled radius.

ROAM_QUERY_TIMEOUT_S query timeout

Opt-in SQLite progress handler that interrupts long queries past N seconds. Prevents hangs on huge codebases. Default behaviour unchanged when env var is absent.

roam search --mode regex|exact|substring

Three matching modes. Default is substring (LIKE %p%, the existing behaviour). regex registers a Python re-backed SQLite REGEXP function. exact matches name = pattern only.

roam stats

Aggregate metrics over the index: file count, symbol count, total lines, recent commit activity (last N days), broken down by language / file role / symbol kind. Useful as the first thing an agent runs after roam init.

roam test-pyramid

Counts test files by sub-kind (unit / integration / e2e / smoke / unknown) using classify_test_kind from . Verdict flags inverted pyramids (e2e+integration > unit) and unstructured test layouts (unknown >= 4× classified).

working-tree drift in index_status

Adds a dirty_files field to the staleness envelope. Even when HEAD matches the indexed commit, an outstanding working-tree edit makes the symbol/edge data stale; we count modified files via git status --porcelain and surface a refresh hint.

roam_catalog MCP tool

Machine-readable list of every registered MCP tool with capability flags (core / read_only / destructive). Replaces having to enumerate list_tools and parse each one — the catalog is one round-trip and is part of the core preset.

roam health --explain

The 0-100 health score is a weighted geometric mean of five factors; --explain shows each factor's "loss" in points so the user can see which dimension is dragging the score down. Surfaced in both text mode (sorted breakdown table) and JSON envelope (score_breakdown array).

doctor adds plugin + table checks

roam doctor now runs 13 checks (was 11). New entries: plugin discovery error count via get_plugin_errors(), and required-table presence (files, symbols, edges, git_commits, file_stats) — surfaces a half-migrated DB before a downstream "no such table" error.

roam config --env

Walks src/roam/ for ROAM_* references and prints a sorted, deduped inventory of every env var the codebase reads, with the file/line of the first read and whether it's currently set. Replaces grepping the source manually.

roam hotspots --danger

Files in the top quartile of churn × file complexity × max fan-in. Score is the geometric mean of the metric ratios so a moderate-everywhere file ranks above one that's extreme in only one dimension.

roam index-stats

Surface the .roam/index.db size, table row counts, and SQLite fragmentation (freelist_count / page_count). Verdict suggests VACUUM above 25% fragmentation and roam reset when both fragmented and oversized (default 200 MB threshold, override via ROAM_INDEX_SIZE_WARN_MB).

roam critique --batch <dir>

Reviews every .diff and .patch in the directory in a single pass. Handy for reviewing a stack of PRs or a series of git format-patch output. Per-diff verdict + aggregate gate fail when any diff has a high-severity finding.

graceful Ctrl-C

python -m roam now catches KeyboardInterrupt at the top level and exits with the conventional 130 instead of dumping a traceback. The indexer also catches the interrupt to release its lock cleanly, so a rerun resumes from the last committed checkpoint instead of stumbling on a stale .roam/index.lock.

auto-route unknown commands

When roam <unknown> doesn't have a close edit-distance neighbour in _COMMANDS, the LazyGroup's resolver now consults the ask TF-IDF classifier. If a recipe matches with confidence ≥ 0.5, the UsageError suggests roam ask "<input>" so a natural-language attempt ("trace login flow through middleware") still leads somewhere useful in one turn.

opt-in local telemetry

ROAM_TELEMETRY_LOCAL=1 enables a tiny SQLite ring buffer (.roam/telemetry.db, 500-row cap, prune-on-write) that records (command, duration_ms, exit_code, ts) for every CLI invocation. Surface via roam telemetry (slowest + recent calls). Strictly local — no network. No-op when env var is absent so the hot path stays unaffected.

roam oracle batch

The five boolean oracles (symbol-exists, route-exists, is-test-only, is-reachable-from-entry, is-clone-of) now accept a JSONL stream via roam oracle batch [--input -]. Each line is one {oracle, args} object; output is a single JSON envelope with all results. Useful for fleet-style pre-flight checks (50 symbols at once instead of 50 round-trips).

roam orphan-imports

Quick Python-only lint that flags imports the indexer couldn't resolve. Distinguishes internal_typo (top-level package indexed but submodule missing — e.g. roam.cmds.foo instead of roam.commands.cmd_foo) from missing_package (genuinely absent). JS/TS/Go versions deferred — per-language scaffolding overhead is too much for one pass.

roam docs-coverage --quality

Buckets every public symbol's docstring into ABSENT / SHALLOW / RICH. Heuristic: a docstring is RICH when its length ≥ 80 chars AND it mentions params/returns or has an example block; SHALLOW otherwise. Surfaces in both text and JSON output, with sample symbols per bucket so the user can see the gap concretely.

roam search --explain shows PageRank

The --explain flag already showed BM25 + matched fields + highlights + term counts. adds the per-result PageRank to the explanation so users can see when ordering is structural-rerank- driven vs. lexical.

roam retrieve --scope <dir>

Restrict candidates to files under a given path prefix — useful for monorepos and large codebases where the user knows the relevant subtree. Post-filter on the ranked candidate list, so no rerun of the heavy retrieval pipeline.

roam changelog --suggest

Read commits since the last tag, classify them via Conventional Commits prefixes (feat / fix / perf / refactor / docs / test / chore / build / ci), emit a draft ## [Unreleased] markdown section grouped by bucket. --since <ref> overrides the tag autodetect.

roam graph-export

Write the symbol or file dependency graph as GraphML / DOT / JSONL for plugging into external graph tooling (Gephi, Cytoscape, igraph, or custom analyses). --scope file switches from the symbol-level graph to the file-level graph.

roam help-search <query>

Fuzzy match across every command's name + short docstring. Replaces grepping --help-all output of 158 commands. Score weights name matches above docstring matches and rewards shorter matching names.

MCP-level result caching

The MCP server already had per-cell caching for a handful of hot paths (understand, tour); promotes ~30 read-only commands into a shared, index-mtime-keyed result cache. Cache hit drops the round-trip from 153ms to 1ms (153× speedup) without changing tool semantics. Auto-invalidates on reindex (mtime bump on .roam/index.db).

roam ask recipe expansion (13 → 24)

Eleven new TF-IDF-classifiable recipes covering common agent workflows: trace-bug, who-owns, what-changed, audit-security, explore-impact, find-similar, why-this-exists, check-pr, explore-tests, dependency-update, visualize-architecture. Each maps to an existing roam command pipeline so the dispatcher stays a thin classifier-and-route — no new analysis logic.

test sub-classification

file_roles.py now exports classify_test_kind(path) returning unit | integration | e2e | smoke | unknown. Path-pattern first (e2e/, integration/, cypress/, playwright/), then filename-pattern fallback (_e2e.py, _smoke.py). Lays the groundwork for "test pyramid" reports without changing the existing is_test boolean contract.

error envelope doc_link field

The MCP error path already emitted error_code, hint, and retryable. fills the fourth field of the structured- error contract: every classified error_code now carries a stable doc_link pointing at an anchor in the public troubleshooting page. Agents get one URL to fetch when self- serving an error, instead of grep-the-docs-and-pray.

opt-in parallel source prefetch

ROAM_PARALLEL_INDEX=1 enables a thread-pool source prefetcher in the indexer. Disk reads run in parallel up to min(32, cpu_count*2) workers ahead of the (still-serial) parse + DB write loop. The serial section is unchanged, so this is safe under concurrency and a no-op without the env var.

I/O-dominated indexes (cold cache, OneDrive-mirrored repos, network drives) see the biggest wins; CPU-bound indexes see no regression because the cache is consumed in-order.

roam plugins

The plugin discovery system has shipped since v11 (entry points + ROAM_PLUGIN_MODULES) but had no introspection surface. roam plugins lists discovered commands, detectors, language extractors, extensions, grammar aliases, and any discovery errors. JSON envelope mirrors the same fields. With no plugins registered, prints the activation hint instead.

Decisions logged (no shipped change)

[12.14] - 2026-05-05

Ten more research passes building on v12.13's speed wins. Three land as concrete features; the rest were research-decided (existing surface adequate or out of scope).

Did-you-mean for command typos

LazyGroup.resolve_command now catches Click's "No such command" and surfaces the closest names by edit distance. Previous behaviour:

` $ roam contxt Usage: python -m roam [OPTIONS] COMMAND [ARGS]... `

— bare error, no recovery hint. Now:

` $ roam contxt Error: No such command: 'contxt'. Did you mean `roam context`, `roam agent-context`? `

Up to 3 suggestions at edit-distance ≤ 0.6, picked from the live _COMMANDS table so plugin commands also surface.

Auto-refine on low-confidence retrieve

When roam retrieve confidence drops below 0.40, the verdict now appends a REFINE: block with 2-3 alternative queries:

1. Drop NL filler"trace the login flow""login flow", removing the words that diluted the lexical signal. 2. Anchor on top result's file — adds --seed-files <path> pointing at the highest-scoring candidate. 3. Pivot to roam search — when the query contains an identifier-shaped token, exact-name lookup may beat structural retrieval.

Surfaced in both text mode (REFINE: block) and JSON (summary.refinements), so MCP clients can branch on it.

--help-all global option

roam --help shows priority categories + 66 names from "More Commands" without descriptions. Agents mapping the surface want every command's one-liner. roam --help-all renders all 162 invokable names with their AST-extracted short-help, sub-second. The flat list is alphabetical, deterministic, and pipeable.

Smaller fixes

Research findings (decided not to ship)

[12.13] - 2026-05-05

Ten dedicated research passes plus three check phases. Drops the third version segment going forward — there's no reason for a patch suffix on these incremental releases. Future versions: 12.14, 12.15, not 12.13.x.

Speed wins

| Operation | v12.12.9 | v12.13 | Speedup | |---|---|---|---| | roam --help | 3845 ms | 790 ms | 4.9× | | roam uses | 700 ms | 347 ms | 2.0× |

--help cold path. The previous format_help() called self.get_command() on every command in the priority categories, which triggered importlib.import_module() for each cmd_*.py. Around 20 module imports added 3.5 seconds to render the help banner. v12.13 extracts the short-help via Python ast from the source file's first docstring without importing — same output, no cmd module loads.

roam uses warm path. _test_text_consumers was reading ~590 test files (4.0 seconds of io.open calls) on every uses invocation against a Python repo. The fallback exists for JS/Vitest where the symbol resolver leaves gaps; on Python / Go / Rust the edges table already has every reference, so the scan was a 4-second-per-call no-op. Now gated on whether the target's language is in the JS family (javascript, typescript, tsx, jsx, vue, svelte).

Smarter retrieval

Newcomer-friendly tour

roam tour "Key Symbols" list now appends a one-line docstring summary for each top symbol. Pure-PageRank ranking surfaces plumbing functions (open_db, json_envelope, find_project_root) at the top because every command imports them — without context, a newcomer doesn't know what these are. The docstring excerpt orients them:

` fn open_db src/roam/db/connection.py:354 Context manager for database access. Creates schema if needed fn json_envelope src/roam/output/formatter.py:346 Wrap command output in a self-describing envelope. `

Bench-neutral, performance-positive

The 10-pass round preserves the bench position from v12.12.9: recall@5=0.708, recall@10=0.778, recall@20=0.878 across the 30-task self-bench. Speed gains are pure addition.

Research findings (not landed)

Some passes researched-and-decided rather than shipped:

[12.12.9] - 2026-05-05

Three smarter / more dynamic moves layered on the v12.12.8 polish: recency-aware retrieve, calibrated confidence numbers, and a broken empty-state guard.

Recency-aware retrieve (adapts daily without retuning)

Files modified within the last 14 days now get a small boost in the roam retrieve reranker. Hypothesis: when a developer asks "where is X?" they're usually asking about something they're actively working on. Magnitude up to +0.05 for files edited today, decaying linearly to zero at 14 days. Suppressed when the query is shaped like a historical question ("old auth handler", "deprecated routes", "legacy code") because recent edits are anti-signal there.

Implementation: _recency_boost in retrieve/rerank.py. One batched MAX(git_commits.timestamp) query per call — no per-candidate fan-out. Bench-tuned at 0.05 to be recall@5 +0.8 pp neutral-to-positive against the 30-task self-bench (the synthetic bench labels treat all expected files as equal regardless of mtime, so a stronger recency lift slightly rearranges co-equal answers and shows as bench-neutral; the magnitude is real-world- positive without disturbing bench-equal-treatment).

The boost adapts daily — yesterday's hot file becomes today's stale one without any retuning or feedback loop.

Calibrated confidence numbers in retrieve

The previous binary low/ok confidence label is now a continuous score in [0.0, 1.0] exposed in the verdict and JSON summary. Three signals combine: score gap (top vs runners-up, gap ≥ 0.30 → unique winner), score floor (top < 0.30 with bunched tail → noise), and squared token coverage. The squared coverage penalises partial-coverage queries harder than linear — "trace the login flow" (2/3 tokens covered, "login" missing) had been crossing "ok" because linear coverage gave 0.67; squared drops to 0.45 and the verdict carries the lower number.

Output sample:

` VERDICT: 5 spans (... 10 seeds) (confidence 0.82) ← real impl query VERDICT: 5 spans (... 10 seeds) (confidence 0.71) ← junk query `

JSON summary now exposes confidence: 0.82 alongside the existing low_confidence boolean.

roam coverage-gaps empty state

The "no flag passed" case used to print "Provide --gate <names> or --gate-pattern <regex>" and exit. Now leads with VERDICT: missing required filter — pass --gate or --gate-pattern, lists the two flags with their formats, and shows two example invocations. Same shape every other empty-state command in the surface uses.

[12.12.8] - 2026-05-04

Phases 2 + 3 + 4 in one release: rough-edge polish, smarter verdicts, and cross-command synergy.

Phase 2 — verdict-first compliance

Several commands skipped the surface-wide VERDICT: ... opening line that every other command leads with, leaving agents to count [FAIL] markers or scroll past raw section headers to find the bottom line. Now consistent across:

Phase 3 — smarter verdicts that name the driver

Plain count summaries don't tell a user what to fix first. Three high-traffic verdicts now name the dominant signal so the next action is one read away:

Phase 4 — every command points at its natural follow-up

The next_steps.suggest_next_steps registry covered health, context, hotspots, diagnose, and dead. Five more commands now generate follow-up commands at the bottom of every text run, so an agent finishing one roam call sees the next roam call to make:

Each suggestion is scoped to the bare symbol name (the (file:line) suffix the resolver appends to label is now stripped before the template fills) so the follow-up command is copy-pasteable.

[12.12.7] - 2026-05-04

Phase-1.5: speed up agent search vs grep.

Findings (measured on this 15K-symbol repo)

| Tool | Latency | Result quality | |---|---|---| | grep (POSIX) multi-shape | 200–2000 ms | raw text, false positives in comments / strings | | roam search subprocess | 350 ms | symbols by name + PageRank rank | | roam uses subprocess (warm) | 700 ms | direct dependents grouped by edge type, no false positives | | roam_uses MCP tool (warm) | <100 ms | same as CLI but in-process |

ripgrep (Claude Code's Grep tool) is ~50–200 ms — 2–5× faster than roam's CLI in raw wall-time. The win for roam refs / roam_uses isn't speed — it's that the result is already correct. Multi-shape grep needs follow-up filtering to drop comment / string-literal false positives; the agent then has to read each match to learn the structure. Going through the indexed call/import/inherit graph returns one structured envelope with kind / file / line per consumer.

Changes

The recommendation is documentation-led, not a speed optimisation — roam's CLI startup overhead (~250 ms python-process spawn) is the floor, and shrinking it past the MCP-warm path isn't justified relative to the 5–10 grep cycles a single roam refs call replaces. For agents in MCP-enabled clients (Claude Code, Cursor, Codex CLI) the latency gap closes entirely; the recommendation tells agents in any client to prefer roam refs for reference-finding because the fewer iterations dominate the latency comparison.

[12.12.6] - 2026-05-04

Phase-1 deep-dogfood release. Live-fired roam against this very repo to find edge cases that didn't show up in unit tests. Five real correctness wins, all bench-positive.

roam retrieve ranks implementations above tests

For implementation-style queries ("where is X", "find X", "how does X work") the reranker now applies a -0.18 penalty to test files. The test files weren't wrong — they had legitimate fan-in / PageRank from every test importing the conftest fixtures — but for "where is X" the user wants implementation, not the test. On the dogfood query "where is the patch verifier with clones-not-edited check":

``` Before: #1 test_verify_patch_match (test)

#2 critique_patch (MCP wrapper)

#3 TestCheckClonesNotEdited (test class)

#4 check_clones_not_edited ← actual answer at #4

#5 _patch_stub_backend (test)

After: #1 critique_patch

#2 check_clones_not_edited ← lifted to #2

#3-4 tests demoted ```

Bench (recall@K on roam_self.jsonl):

| Metric | v12.11 baseline | v12.12.6 | Δ | |---|---|---|---| | recall@5 | 0.664 | 0.700 | +3.6 pp | | recall@10 | 0.758 | 0.786 | +2.8 pp | | recall@20 | 0.900 | 0.878 | -2.2 pp |

The penalty was empirically tuned at -0.18 — stronger penalties (-0.25) gave bigger top-5 gains but regressed recall@20 more. The bench expects test files as co-answers for some "where is X" queries (e.g. test_personalized_pagerank.py is listed alongside pagerank.py); -0.18 keeps those in top-20 while still pushing high-PR test fixtures below same-token implementations at top-5/10.

Implementation queries down-weight structural, up-weight lexical

Same query family had a deeper issue. "where is the symbol resolver" ranked _resolve_file (PR=0.99, fts=0.65) at #1 above the actual find_symbol (PR=0.16, fts=0.88) — PR was dominating because the 6× PR ratio overwhelmed the 1.35× fts ratio. For "where is X" queries v12.12.6 now down-weights alpha (PR) by 30% and up-weights lexical_baseline by 20% within a single call. Navigation / planning queries still use the structural-strong default.

Tokenizer learns programming-domain shorthand

Two seed-token gaps caused several "where is X" queries to return generic noise:

roam ask extracts identifiers with leading underscore

The recipe-runner regex used \b[a-z][a-z0-9]+(?:_[a-z0-9]+)+ which fails the word-boundary check before a leading _. "is it safe to delete _resolve_file" extracted no symbol, then passed the full query string to roam uses as the symbol name — which produced "symbol not found: 'is it safe to delete _resolve_file'". Regex now allows the leading underscore. The full safe-delete recipe runs end-to-end on the dogfood query.

roam fitness prepends a verdict line

Every other command in the surface starts its text output with VERDICT: …; fitness skipped straight to the rule list and left the user counting [FAIL] markers to know if the gate passed. Now opens with e.g. VERDICT: 2 of 3 fitness rule(s) fail (51 violation(s)) so the bottom line is on the first line.

[12.12.5] - 2026-05-04

A correctness sweep on the agent-orientation commands and a clarity sweep on the doctor output. The big finding: the prior tour and understand commands were ranking pytest fixtures at the top of "key abstractions" / "key symbols", and a newcomer's reading order started inside tests/conftest.py. The graph said it was correct (those fixtures genuinely have huge fan-in) but it's exactly the wrong shape for orientation.

roam tour and roam understand orient in source code

Both commands now exclude symbols whose file is classified as test from the Key Symbols / Key Abstractions list, and the reading-order / entry-points lists drop tests, dev scripts, generated code, configs, examples, benchmarks, build output, CI, and docs — keeping only source files (and a small extension of "where else might a real entry point live"). On roam itself the change pulls cli_runner / indexed_project / project_factory / conftest.py out of the top-10 list and surfaces cli / open_db / json_envelope / ensure_index / LanguageExtractor instead.

Generic-named property false positives

Tour also drops kind=property/field/attribute symbols whose name is in a small list of generic names (path, name, value, key, id, …). These are name-collision artifacts in the symbol resolver: every obj.path reference in unrelated code resolves to the first class with a path property, inflating that one symbol's in-degree to hundreds. WebhookBridge.path was the live example — fan-in 490 against a 3-line property because the resolver couldn't tell which .path reference belonged to which class.

File-role classifier learns benchmark-shaped directories

benchmarks/, benchmark/, bench/, and bench-repos/ now classify as examples (was source). Without this fix the new tour / understand filters wouldn't help on roam itself — benchmarks/agent-eval/prompts.py would still surface as the "start file". Four new tests in test_file_roles.py cover the new patterns.

roam doctor clarity

Tests

[12.12.4] - 2026-05-04

Fix the MCP server card's tool count and add a guard test so it can't drift again.

The card reports the server's capabilities to MCP-discovery surfaces (PulseMCP, mcp.so, Smithery, …). Its capabilities.tools.total field had been at 120 (with presets.core: 33) for several releases while the live MCP server registered 122 tools and the core preset had grown to 35. The card description correctly quoted "122 MCP tools" in plain text but the structured number clients actually parse was stale.

Updated both copies (src/roam/mcp-server-card.json and the canonical templates/distribution/landing-page/.well-known/mcp-server-card.json). Added test_card_tool_count_matches_live_count which compares the card's tool count against surface_counts.collect_surface_counts so a future MCP-tool addition that forgets the card update fails CI rather than ships silently.

[12.12.3] - 2026-05-04

Documentation and MCP-surface polish caught while auditing the v12.12.2 release.

MCP wrappers in step with the CLI

Doc-count drift

Stale CLI / MCP counts swept across user-facing surfaces:

One stale xfail removed

test_json_contracts.py had dead in the FRAGILE_COMMANDS set (envelope was missing verdict on the minimal fixture). v12.x added the verdict field; --runxfail confirms all four parametrized tests pass on the minimal fixture. Removed from the fragile set; the suite now reports 24 xfailed (was 28).

[12.12.2] - 2026-05-04

Polish on top of v12.12.1's packaging hotfix. Two more files were broken post-install plus a quietly-skipping consistency test.

Bundled in the wheel

Consistency

cmd_ask uses the shared confidence helper

cmd_ask invented the low-confidence verdict pattern that v12.12 extracted into roam.output.confidence, but the original code declared its own _CONFIDENCE_THRESHOLD = 0.15 and inlined the score comparison. Refactored to import DEFAULT_CONFIDENCE_THRESHOLD and is_low_confidence from the shared module so threshold tweaks land in one place across cmd_ask, cmd_retrieve, and any future ranked-output command.

[12.12.1] - 2026-05-04

Hotfix: bundle YAML data files in the wheel.

PyPI installs of roam-code from at least v8.x through v12.12 silently shipped zero taint rules — the wheel didn't include roam/security/taint_rules/*.yaml because no package-data entry declared them. roam taint post-install on a clean venv reported "No rules in /.../security/taint_rules" with no actionable hint. Editable installs (pip install -e .) and source checkouts worked because the YAMLs were on disk; the bug only bit binary-wheel users.

pyproject.toml now declares [tool.setuptools.package-data] for:

Verified via clean-venv install of the rebuilt wheel: load_rules returns 14 rules including python-deserialization.

[12.12] - 2026-05-04

A focused close-out of the v12.3 dogfood report's five remaining open items. No new commands; this release tightens precision in the retrieve expansion path, restores signal in diagnose's churn column, makes critique's bench-hint discoverable from JSON / MCP clients, ships the missing taint pack, and centralises the low-confidence verdict pattern so it grows uniformly across commands.

Bug fixes — close v12.3 dogfood backlog

New surfaces

bench_hints:

Overrides are searched before the built-in rules so projects can shadow defaults. Closes the second half of dogfood #15 ("generalises to other projects via a .roam-critique.yml").

Tests

Pre-existing failures cleared during the close-out

The dogfood-cleanup test sweep surfaced four unrelated failures that had been silently red on main. Each turned out to be a small, localized issue rather than a deep regression, so they're fixed alongside the v12.12 close-out:

[12.11] - 2026-05-04

A precision and agent-UX release built on six rounds of dogfood feedback. Headline work: round-trip false-positive suppression across the entire analyzer surface, a cross-tool framework-alias filter that single-handedly fixes five inflated-PageRank reports, MCP capacity backpressure that replaces silent connection drops with structured RATE_LIMITED responses, and a tri-state oracle envelope so agents can distinguish "we proved no" from "we can't tell."

New modules

Precision (false-positive suppression)

Agent UX

Configuration

New flags / commands

Doctor

Security rules

Surface

[12.10.1] - 2026-05-04

A patch release for the 12.10.0 workflow-synergy release.

Fixed

[12.10.0] - 2026-05-03

A workflow-synergy and maintainability release. Headline work: semantic retrieval is now truthfully diagnosable, recipe workflows advertise gates and follow-up commands, existing fitness debt can be baselined without hiding new regressions, and several high-complexity indexing/analysis hotspots were split into focused helpers.

Added — workflow intelligence

Added — semantic activation diagnostics

Added — local operability and fitness baselines

Fixed

Refactored

Surface

[12.9.0] - 2026-05-02

A precision and graph-completeness release. Headline work: a registry-dispatch resolver that closes a long-standing gap in roam impact, three new Flask detectors, four new taint rule packs, intraprocedural taint propagation, an ask recipe for pytest fixtures, and a +11pp jump in roam's own type coverage.

Added — graph completeness

Added — Flask framework detectors

Total Python idiom detector count: 22.

Added — CodeQL-style taint rule packs

Mirror the dataflow models CodeQL 2.24 (Jan 2026) added.

Added — taint engine

Added — agent ergonomics

Fixed — detector precision

Fixed — type-coverage detector bug

Refactored

Improved

Surface

[12.8.0] - 2026-05-02

A documentation, positioning, and trust-scaffold release plus two new commands (pytest-fixtures and hover), tighter ORM detector precision, full SARIF coverage, and a documentation-drift CI check.

Added — roam hover

Added — SARIF for taint analysis

Added — pytest fixture dependency edges

Added — per-detector precision audit

Fixed — false-positive classes in ORM detectors

Fixed — minor

Cleaned — dead variables

Improved — preflight ergonomics

Added — documentation consistency CI check

Added — internal-link audit in doc-consistency check

Added — public docs pages

Added — SARIF output for Python detectors

Improved — README hero

Improved — agent ergonomics

Verification

[12.7.1] - 2026-05-02

Performance

[12.7.0] - 2026-05-02

A 10-round push past v12.6: 7 new idiom detectors (now 19 total), Pydantic/dataclass field display in roam context, framework-aware N+1 detection, async-call-graph propagation.

Added

Verification

[12.6.0] - 2026-05-02

A 10-round push past v12.5: roam py-modern for modern-Python adoption signal, roam py-types --ci CI gate mode, MCP wrappers, 12th idiom detector (lock-without-with), comprehensive end-to-end detector test fixture.

Added

Improved

Verification

[12.5.0] - 2026-05-02

A Python-pivot iteration release. v12.4 added the substrate (is_async + decorators on symbols, generated-dir exclusion, 4 idiom detectors). v12.5 doubles the idiom catalog, adds a new roam py-types command, and ships agent-facing badges for Pydantic / dataclass / pytest fixture / parametrize.

Added

Improved

Fixed (real bugs surfaced by the new detectors on roam-code itself)

Verification

[12.4.0] - 2026-05-02

A Python-pivot release. Three super-passes of dogfooding on real Python codebases (a Python research repo, an agent-eval workspace, a 17k-file external Python repo) surfaced gaps that the language-agnostic surface couldn't catch. v12.4 adds Python-specific structural signals and idiom detection without adding new commands — existing commands give better Python answers.

Added

Improved

Verification

[12.3.1] - 2026-05-02

A polish patch from additional dogfooding rounds. No surface changes, five papercut bugs fixed.

Fixed

Verification

[12.3.0] - 2026-05-01

A retrieve-quality + dogfood-correctness release. Recall@20 on the 30-task self-bench moved from 0.486 → 0.900 across the day's iterations (+41.4 pp). Sixteen agent-facing bugs surfaced by deep dogfooding were fixed across two sprints. No new commands; existing commands give better answers.

Fixed — agent-facing correctness

Improved — retrieve quality

Eight changes against the 30-task self-bench. recall@5 0.289 → 0.672 (+38.3 pp), recall@10 0.358 → 0.786 (+42.8 pp), recall@20 0.486 → 0.900 (+41.4 pp). Cross-repo regression test on a synthetic Python microservice returns 1.0 / 1.0 / 1.0 — the lift isn't an artifact of self-bench.

Added — small additions

Verification

[12.1.0] - 2026-05-01

Added

Fixed

Changed

[12.0.0] - 2026-05-01

Added

Changed

[11.1.3] - 2026-02-27

Fixed

[11.1.2] - 2026-02-27

Added

Fixed

[11.1.1] - 2026-02-27

Fixed

Removed

Added

[11.1.0] - 2026-02-25

Added

[11.0.0] - 2026-02-25

Added

Changed

Fixed

[10.0.1] - 2026-02-21

Added

Fixed

[10.0.0] - 2026-02-20

Added

Fixed

Changed

[9.1.0] - 2026-02-18

Added

Fixed

[9.0.0] - 2026-02-18

Added

[8.2.0] - 2026-02-14

Added

Fixed

Changed

Removed

[8.1.1] - 2026-02-14

Added

[8.1.0] - 2026-02-14

Added

Fixed

[8.0.1] - 2026-02-14

Changed

[8.0.0] - 2026-02-14

Added

Changed

Fixed

[7.5.0] - 2026-02-13

Changed

[7.4.0] - 2026-02-12

Added

[7.2.0] - 2026-02-12

Added

[7.1.0] - 2026-02-12

Added

Fixed

[7.0.0] - 2026-02-12

Added

Fixed

[6.0.0] - 2026-02-12

Added

[5.0.0] - 2026-02-10

Added

[4.0.0] - 2026-02-10

Added

Changed

[3.7.0] - 2026-02-10

Added

[3.0.0] - 2026-02-09

Added

Fixed

[1.0.0] - 2026-02-09

Added

[Unreleased]: https://github.com/Cranot/roam-code/compare/v11.1.3...HEAD [11.1.3]: https://github.com/Cranot/roam-code/compare/v11.1.2...v11.1.3 [11.1.2]: https://github.com/Cranot/roam-code/compare/v11.1.1...v11.1.2 [11.1.1]: https://github.com/Cranot/roam-code/compare/v11.1.0...v11.1.1 [11.1.0]: https://github.com/Cranot/roam-code/compare/v11.0.0...v11.1.0 [11.0.0]: https://github.com/Cranot/roam-code/compare/v10.0.1...v11.0.0 [10.0.1]: https://github.com/Cranot/roam-code/compare/v10.0.0...v10.0.1 [10.0.0]: https://github.com/Cranot/roam-code/compare/v9.1.0...v10.0.0 [9.1.0]: https://github.com/Cranot/roam-code/compare/v9.0.0...v9.1.0 [9.0.0]: https://github.com/Cranot/roam-code/compare/v8.2.0...v9.0.0 [8.2.0]: https://github.com/Cranot/roam-code/compare/v8.1.1...v8.2.0 [8.1.1]: https://github.com/Cranot/roam-code/compare/v8.1.0...v8.1.1 [8.1.0]: https://github.com/Cranot/roam-code/compare/v8.0.1...v8.1.0 [8.0.1]: https://github.com/Cranot/roam-code/compare/v8.0.0...v8.0.1 [8.0.0]: https://github.com/Cranot/roam-code/compare/v7.5.0...v8.0.0 [7.5.0]: https://github.com/Cranot/roam-code/compare/v7.4.0...v7.5.0 [7.4.0]: https://github.com/Cranot/roam-code/compare/v7.2.0...v7.4.0 [7.2.0]: https://github.com/Cranot/roam-code/compare/v7.1.0...v7.2.0 [7.1.0]: https://github.com/Cranot/roam-code/compare/v7.0.0...v7.1.0 [7.0.0]: https://github.com/Cranot/roam-code/compare/v6.0.0...v7.0.0 [6.0.0]: https://github.com/Cranot/roam-code/compare/v5.0.0...v6.0.0 [5.0.0]: https://github.com/Cranot/roam-code/compare/v4.0.0...v5.0.0 [4.0.0]: https://github.com/Cranot/roam-code/compare/v3.7.0...v4.0.0 [3.7.0]: https://github.com/Cranot/roam-code/compare/v3.0.0...v3.7.0 [3.0.0]: https://github.com/Cranot/roam-code/compare/v1.0.0...v3.0.0 [1.0.0]: https://github.com/Cranot/roam-code/releases/tag/v1.0.0

Ready to try Roam? Install the free CLI · or see paid plans