System Design

Roam Architecture

Roam parses source once, stores structural facts in SQLite, and exposes deterministic query primitives to CLI and MCP clients — then emits structured evidence (receipts, ledger, packets) that any gateway can act on.

Inside vs gateway. Roam is the server in the MCP layering: it emits McpDecisionReceipt per sensitive tool call, the HMAC-chained run ledger, and the ChangeEvidence packet. Hosts (Claude Code, Cursor) own per-call approval; gateways own policy + correlation + audit aggregation; Roam ships the structured evidence stream those layers consume. See the MCP layering discussion for the full split.

Pipeline at a Glance

Repository ──> Index Pipeline ──> SQLite Storage
                                           │
                          ┌────────────────┼────────────────┐
                          ▼                ▼                ▼
                 Graph Analytics    Retrieval +      CLI / MCP
                 Rules Engine       Patch Verifier   Interfaces
                 Security           Code Graph       JSON / SARIF
                                    Attestation

The index is built once per repo with roam init. Subsequent runs are incremental — only changed files re-parse. All downstream consumers (CLI, MCP, CI gates) read from the same SQLite artefact at .roam/index.db.
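Because every consumer reads the same SQLite file, a downstream script can inspect it without risk of modifying it. The sketch below is illustrative only: `open_index_readonly` and `list_tables` are hypothetical helpers, not part of Roam, and the code assumes nothing about the index schema beyond it being a regular SQLite database.

```python
import sqlite3
from pathlib import Path


def open_index_readonly(index_path: str) -> sqlite3.Connection:
    """Open an existing SQLite file read-only via a file: URI."""
    uri = Path(index_path).resolve().as_uri() + "?mode=ro"
    return sqlite3.connect(uri, uri=True)


def list_tables(conn: sqlite3.Connection) -> list[str]:
    """Return table names from SQLite's built-in catalogue."""
    rows = conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
    )
    return [name for (name,) in rows]
```

Calling `list_tables(open_index_readonly(".roam/index.db"))` after `roam init` would list whatever tables the index actually contains; any write attempted through this connection fails with `sqlite3.OperationalError`.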

Subsystem Responsibilities

| Subsystem | Main modules | Responsibility |
|---|---|---|
| Index Pipeline | index/indexer.py, index/parser.py, index/symbols.py | Build and refresh the structural index from source + git history. |
| Storage | db/schema.py, db/connection.py | SQLite schema, migrations, batched query helpers. |
| Graph Intelligence | graph/builder.py, graph/layers.py, graph/clusters.py, graph/pagerank.py | Centrality, layering, communities, cycle analysis, AST clone clustering. |
| Retrieval | retrieve/pipeline.py, retrieve/rerank.py | Graph-aware FTS5 + structural reranker (PageRank + co-change + clones + runtime hot). |
| Patch Verifier | critique/checks.py, critique/aggregator.py | Diff parsing + clones-not-edited + blast-radius + intent-alignment for roam critique. |
| Taint & Reachability | security/taint_engine.py | Graph-reach BFS over edges with sanitiser-stop nodes; OpenVEX-correct. |
| Code Graph Attestation | attest/cga.py | in-toto v1 statement builder. Merkle root over symbol fingerprints + edge bundle digest. Cosign-signable. |
| Fleet Planner | fleet/manifest.py | Multi-agent partitioner (Louvain + co-change + PageRank anchors); emits .roam-fleet.json. |
| Rule Engine | rules/builtin.py, rules/engine.py | Built-in rules + YAML rule packs (path / symbol / AST / dataflow patterns). |
| Interfaces | commands/cmd_*.py, mcp_server.py, mcp_extras/ | Deterministic queries for CLI and MCP clients. Sampling-driven compression, watcher-based invalidation, per-session memory. |
| Output Contracts | output/formatter.py, output/sarif.py, output/schema_registry.py | Stable text / JSON / SARIF envelopes for agents and CI; every --json error path returns a parseable envelope. |
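The Taint & Reachability entry describes a graph-reach BFS that stops at sanitiser nodes. A minimal sketch of that idea follows; the function name, the edge-list input shape, and the example node names are assumptions for illustration and do not reflect the actual `security/taint_engine.py` API.

```python
from collections import deque


def reachable_sinks(edges, sources, sinks, sanitisers):
    """Breadth-first reach from taint sources; traversal stops at sanitisers."""
    graph = {}
    for src, dst in edges:
        graph.setdefault(src, []).append(dst)

    seen = set(sources)
    queue = deque(sources)
    hits = set()
    while queue:
        node = queue.popleft()
        if node in sanitisers:
            continue  # sanitised data does not propagate past this node
        if node in sinks:
            hits.add(node)
        for nxt in graph.get(node, []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return sorted(hits)


# Hypothetical call graph: one path is sanitised, the other is not.
edges = [
    ("read_param", "build_query"),
    ("build_query", "db.execute"),
    ("read_param", "escape"),
    ("escape", "render"),
]
tainted = reachable_sinks(
    edges,
    sources={"read_param"},
    sinks={"db.execute", "render"},
    sanitisers={"escape"},
)
print(tainted)
```

Here `render` is only reachable through `escape`, so it is not reported, while `db.execute` is. Treating sanitisers as hard stops keeps the traversal linear in the number of edges, at the cost of trusting every sanitiser unconditionally.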

Index Pipeline Stages

  1. Discovery — collect tracked files (via git ls-files + .gitignore) and classify file roles.
  2. Parsing — tree-sitter parse per file with language routing across 28 supported languages.
  3. Extraction — symbols (classes, functions, methods, fields), signatures, docstrings, references.
  4. Resolution — convert references into graph edges (caller→callee, import chains, inheritance).
  5. Metrics — cognitive complexity, centrality (PageRank, betweenness), churn, co-change, cognitive load.
  6. Persistence — upsert into SQLite with incremental diffing; only changed files re-parse.
discover -> parse -> extract -> resolve -> metrics -> persist
                 (incremental path executes only changed files)
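
The incremental path can be pictured as a content-hash comparison between the previous index and the working tree. This is a sketch under stated assumptions: `plan_reparse` and the idea of storing one SHA-256 digest per file are hypothetical, not Roam's actual indexer logic.

```python
import hashlib


def file_digest(content: bytes) -> str:
    """Content hash used to decide whether a file must be re-parsed."""
    return hashlib.sha256(content).hexdigest()


def plan_reparse(previous: dict[str, str], current: dict[str, bytes]):
    """Compare stored digests with the working tree.

    previous: path -> digest recorded at the last index run (assumed shape)
    current:  path -> file content discovered in this run
    Returns (to_parse, to_delete), both sorted for deterministic output.
    """
    to_parse = sorted(
        path for path, content in current.items()
        if previous.get(path) != file_digest(content)
    )
    to_delete = sorted(path for path in previous if path not in current)
    return to_parse, to_delete


previous = {"a.py": file_digest(b"x = 1\n"), "b.py": file_digest(b"y = 2\n")}
current = {"a.py": b"x = 1\n", "b.py": b"y = 3\n", "c.py": b"z = 0\n"}
to_parse, to_delete = plan_reparse(previous, current)
print(to_parse, to_delete)
```

Only the modified file and the new file are scheduled for parsing; the unchanged file is skipped, which is what keeps repeat runs cheap.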

Command-to-Data Flow

Example: roam preflight AuthService

CLI cmd_preflight
  -> ensure_index()
  -> query symbols/edges/metrics
  -> run health/rule checks
  -> aggregate verdict + risk factors
  -> render text or JSON envelope

Example: roam cga emit --include-taint --sign

CLI cmd_cga
  -> ensure_index()
  -> attest.cga.build_statement()
       -> _symbol_fingerprints()       # Merkle root over (qname, kind, sig, path)
       -> _edge_bundle_digest()        # graph snapshot fingerprint
       -> security.taint_engine.run()  # graph-reach BFS, sanitizer stops
            -> _finding_to_vex_claim() # OpenVEX status + justification
  -> in-toto v1 Statement
       (predicateType: https://roam-code.com/spec/CodeGraph/v1)
  -> cosign sign-blob --bundle         # optional, graceful skip if absent
  -> .roam/attestations/<sha>.intoto.json + .sig

Property: the CGA chain is reproducible — same source tree + same git HEAD → same Merkle root → same predicate digest. Signing layers identity onto a deterministic fingerprint.
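
The reproducibility property follows from hashing a canonical, order-independent view of the symbols. The sketch below shows one way such a Merkle root and an in-toto v1 Statement wrapper could be built. The leaf encoding, the odd-level duplication rule, the subject name, and the predicate fields are all assumptions; only the `_type` and `predicateType` URIs come from the in-toto spec and the flow above.

```python
import hashlib
import json


def leaf_hash(symbol: tuple[str, str, str, str]) -> bytes:
    """Hash one (qname, kind, signature, path) tuple as a Merkle leaf."""
    encoded = json.dumps(list(symbol), separators=(",", ":")).encode("utf-8")
    return hashlib.sha256(b"\x00" + encoded).digest()


def merkle_root(symbols) -> str:
    """Root over sorted symbol tuples, so input order does not matter."""
    level = [leaf_hash(s) for s in sorted(symbols)]
    if not level:
        return hashlib.sha256(b"").hexdigest()
    while len(level) > 1:
        if len(level) % 2:
            level.append(level[-1])  # duplicate the last node on odd levels
        level = [
            hashlib.sha256(b"\x01" + level[i] + level[i + 1]).digest()
            for i in range(0, len(level), 2)
        ]
    return level[0].hex()


def sketch_statement(symbols, git_head: str) -> dict:
    """Wrap the root in an in-toto v1 Statement (predicate fields are illustrative)."""
    return {
        "_type": "https://in-toto.io/Statement/v1",
        "subject": [{"name": "code-graph", "digest": {"sha256": merkle_root(symbols)}}],
        "predicateType": "https://roam-code.com/spec/CodeGraph/v1",
        "predicate": {"git_head": git_head, "symbol_count": len(symbols)},
    }


symbols = [
    ("pkg.a.f", "function", "f(x)", "pkg/a.py"),
    ("pkg.a.C", "class", "C", "pkg/a.py"),
    ("pkg.b.g", "function", "g()", "pkg/b.py"),
]
statement = sketch_statement(symbols, git_head="<sha>")
```

Sorting before hashing is what makes the root independent of extraction order, and the distinct leaf and node prefixes prevent a leaf from being mistaken for an interior node.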

Tradeoff: static structure gives speed and determinism, but cannot model fully dynamic runtime behavior without trace ingestion (roam ingest-trace).

Agent OS Substrate

On top of the analysis core, Roam ships an 11-package control-plane substrate that lets agents earn the right to change code. Every package is repo-local (under .roam/), zero-network, and additive to the index.

| Package | What it does |
|---|---|
| atomic_io | POSIX + Windows-safe atomic writes (os.replace) for every ledger and bundle file. |
| agents_md/ | Compositional AGENTS.md generator; consumes the rest of the substrate. |
| constitution/ | Capstone .roam/constitution.yml unifying laws, rules, memory, gates. |
| db/findings.py | Cross-detector finding registry (roam findings list/show/count); USER_VERSION 17. |
| laws/ | Invariant mining (roam laws mine/check); self-installing. |
| leases/ | Multi-agent coordination (roam lease claim/release/list). |
| memory/ | Repo-local agent memory at .roam/memory.jsonl. |
| modes/ | Four cumulative action modes: read_only / safe_edit / migration / autonomous_pr. |
| policy/ | Graph-aware rule clauses (reachable_from, imports_from, ...). |
| runs/ | Per-run event ledger + HMAC tamper detection (roam runs verify). |
| world_model/ | Four detectors: side-effects, idempotency, causal-graph, tx-boundaries. |
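
The atomic_io entry relies on `os.replace`, which swaps a file into place in a single step on both POSIX and Windows. A minimal sketch of the write-to-temp-then-replace pattern follows; `atomic_write_text` is a hypothetical helper name, not the package's real interface.

```python
import os
import tempfile


def atomic_write_text(path: str, text: str) -> None:
    """Write text so readers see either the old file or the new one, never a partial."""
    directory = os.path.dirname(os.path.abspath(path))
    # The temp file must live in the target directory so the final
    # os.replace stays on one filesystem.
    fd, tmp_path = tempfile.mkstemp(dir=directory, prefix=".tmp-")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as handle:
            handle.write(text)
            handle.flush()
            os.fsync(handle.fileno())  # push bytes to disk before the swap
        os.replace(tmp_path, path)
    except BaseException:
        if os.path.exists(tmp_path):
            os.unlink(tmp_path)
        raise


with tempfile.TemporaryDirectory() as workdir:
    target = os.path.join(workdir, "ledger.jsonl")
    atomic_write_text(target, '{"event": "start"}\n')
    atomic_write_text(target, '{"event": "end"}\n')
    with open(target, encoding="utf-8") as handle:
        final_text = handle.read()
    leftovers = os.listdir(workdir)
```

After both writes the directory holds only the target file, with the second payload in it.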

The canonical agent loop:

1.  roam runs start             # open run, get ROAM_RUN_ID (HMAC-signed events)
2.  roam mode safe_edit         # declare action surface
3.  roam pr-bundle init         # start proof bundle
4.  roam preflight <sym>        # gate before edit
5.  roam impact <sym>           # blast radius
6.  <edit>
7.  roam diff | roam critique   # review
7a. roam findings list          # cross-detector findings on the workspace
8.  roam pr-bundle emit         # close bundle with proofs
9.  roam runs end --with-pr-bundle-emit
10. roam replay <id>            # narrate the run
11. roam agent-score            # composite 0..100 score
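
Step 1 mentions HMAC-signed events, and the runs package advertises tamper detection. One common way to get both is to chain each event's MAC to the previous one, so that editing, reordering, or dropping an entry breaks verification. The sketch below illustrates that technique only; the entry layout, the genesis value, and the function names are assumptions rather than Roam's ledger format.

```python
import hashlib
import hmac
import json

GENESIS = "0" * 64  # assumed starting value for the chain


def _mac(key: bytes, prev_mac: str, event: dict) -> str:
    """MAC over the previous MAC plus a canonical encoding of the event."""
    body = json.dumps(event, sort_keys=True, separators=(",", ":"))
    return hmac.new(key, (prev_mac + body).encode("utf-8"), hashlib.sha256).hexdigest()


def append_event(ledger: list, key: bytes, event: dict) -> None:
    prev = ledger[-1]["mac"] if ledger else GENESIS
    ledger.append({"event": event, "mac": _mac(key, prev, event)})


def verify(ledger: list, key: bytes) -> bool:
    """Recompute the chain; any edit, reorder, or removal yields False."""
    prev = GENESIS
    for entry in ledger:
        expected = _mac(key, prev, entry["event"])
        if not hmac.compare_digest(expected, entry["mac"]):
            return False
        prev = entry["mac"]
    return True


key = b"demo-key-not-for-production"
ledger: list = []
for command in ("runs start", "preflight AuthService", "runs end"):
    append_event(ledger, key, {"cmd": command})
print(verify(ledger, key))
```

A chained MAC detects tampering by anyone without the key; it does not by itself prove who wrote the ledger, which is where a signing layer comes in.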

Findings Registry

The findings registry is a normalised cross-detector table that answers "what's wrong with this workspace right now?" in one query, instead of running ten detectors and reconciling ten output shapes. Detectors keep their detector-specific tables and also upsert a row into the central findings table, which becomes the surface for CLI consumers, SARIF emit, and suppression management.

Schema

| Column | Purpose |
|---|---|
| finding_id_str | Stable string identifier (UNIQUE). Deterministic: rerunning a detector refreshes the same row in place. Convention: "<detector>:<subject>:<hash>". |
| subject_kind | What kind of thing the finding is about: symbol, file, edge, commit, package, etc. |
| subject_id | Foreign key into the table named by subject_kind. Nullable, since not every subject maps to a row id. |
| claim | Human-readable summary of the finding. |
| evidence_json | Detector-specific structured fields. Schema is owned by the detector, not by the registry. |
| confidence | One of heuristic, structural, static_analysis, runtime. See the tier table below. |
| source_detector | Which detector emitted the row: clones, dead, complexity, etc. |
| source_version | Detector version stamp. Consumers can spot rows produced under a stale detector shape. |
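
The "refreshes the same row in place" behaviour maps naturally onto a SQLite upsert keyed on finding_id_str. The sketch below uses the column names from the schema above, but the column types, what the `<hash>` part covers, and the helper names are assumptions.

```python
import hashlib
import sqlite3


def finding_id(detector: str, subject: str, fingerprint: str) -> str:
    """Build "<detector>:<subject>:<hash>" from a detector-chosen stable fingerprint."""
    digest = hashlib.sha256(fingerprint.encode("utf-8")).hexdigest()[:12]
    return f"{detector}:{subject}:{digest}"


SCHEMA = """
CREATE TABLE findings (
    finding_id_str  TEXT NOT NULL UNIQUE,
    subject_kind    TEXT NOT NULL,
    subject_id      INTEGER,
    claim           TEXT NOT NULL,
    evidence_json   TEXT NOT NULL,
    confidence      TEXT NOT NULL,
    source_detector TEXT NOT NULL,
    source_version  TEXT NOT NULL
)
"""

UPSERT = """
INSERT INTO findings VALUES (
    :finding_id_str, :subject_kind, :subject_id, :claim,
    :evidence_json, :confidence, :source_detector, :source_version
)
ON CONFLICT(finding_id_str) DO UPDATE SET
    claim = excluded.claim,
    evidence_json = excluded.evidence_json,
    confidence = excluded.confidence,
    source_version = excluded.source_version
"""


def upsert_finding(conn: sqlite3.Connection, row: dict) -> None:
    conn.execute(UPSERT, row)


conn = sqlite3.connect(":memory:")
conn.execute(SCHEMA)
row = {
    "finding_id_str": finding_id("clones", "pkg.a.f", "cluster-7"),
    "subject_kind": "symbol",
    "subject_id": None,
    "claim": "Duplicate of pkg.b.g",
    "evidence_json": "{}",
    "confidence": "structural",
    "source_detector": "clones",
    "source_version": "1",
}
upsert_finding(conn, row)
upsert_finding(conn, {**row, "claim": "Duplicate of pkg.b.g and pkg.c.h", "source_version": "2"})
```

The second call updates the existing row rather than adding a new one, so rerunning a detector never inflates the table.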

Confidence tiers

Every finding carries a confidence label drawn from a closed enumeration of four tiers. Detectors pick the tier that matches their evidence — never mint new strings.

| Tier | Definition | Example |
|---|---|---|
| heuristic | Name-pattern matching, length thresholds, fuzzy NLP signals. | vibe-check's comment_anomalies: comments don't match code semantics. |
| structural | Graph-pattern matching over the symbol / edge / call graph. | n1's loop-with-dependent-write: a loop body issues a DB call that depends on the loop variable. |
| static_analysis | Deterministic AST / CFG / dataflow analysis. | complexity scores; missing-index's unconditional-predicate finding. |
| runtime | Requires ingested runtime traces (OpenTelemetry / Jaeger / Zipkin / coverage). | hotspots's UPGRADE / CONFIRMED / DOWNGRADE classification. |
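
A closed enumeration is straightforward to enforce in code. The following sketch models the four tiers as a Python `Enum` and rejects any other label; the class and function names are illustrative, not taken from Roam's source.

```python
from enum import Enum


class Confidence(str, Enum):
    HEURISTIC = "heuristic"
    STRUCTURAL = "structural"
    STATIC_ANALYSIS = "static_analysis"
    RUNTIME = "runtime"


def parse_confidence(label: str) -> Confidence:
    """Accept only the four documented tiers; anything else is an error."""
    try:
        return Confidence(label)
    except ValueError:
        allowed = ", ".join(tier.value for tier in Confidence)
        raise ValueError(
            f"unknown confidence tier {label!r}; expected one of: {allowed}"
        ) from None


tier = parse_confidence("static_analysis")
```

Failing loudly on an unknown string is what keeps a detector from quietly minting a fifth tier.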

CLI surface

roam findings list                       # all findings on this workspace
roam findings list --detector clones     # filter by detector
roam findings list --subject-kind symbol # filter by subject kind
roam findings show <finding_id>          # one record, full evidence
roam findings count                      # per-detector totals

Full reference and flags: command reference for roam findings.

Detectors that persist findings

28 detectors persist findings to the registry. The registry stores last-run state per detector, not cumulative totals, so roam findings count returns 0 for any detector that has not been run on the current corpus. Run roam findings count for the live per-detector tally on your workspace. The table below covers the original 16-detector substrate plus boundary and test-hermeticity; consumer / aggregator detectors (critique, doctor, fan, fingerprint, health, llm-smells, dark-matter) re-emit derived findings from these upstream detectors.

| Detector | Wave | Tier | What it finds |
|---|---|---|---|
| clones | W95 | structural | copy-paste / structural duplicates |
| dead | W99 | structural | unreachable symbols |
| complexity | W102 | static_analysis | cognitive complexity hotspots |
| smells | W109 | heuristic | god class / long method / feature envy (24 kinds) |
| n1 | W110 | structural | loop-with-dependent-query patterns |
| missing-index | W111 | static_analysis | unindexed predicate columns |
| over-fetch | W114 | static_analysis | SELECT * / wildcard column reads |
| bus-factor | W115 | heuristic | single-owner critical components |
| auth-gaps | W116 | structural | endpoints missing auth checks |
| vulns | W117 | static_analysis | reachable vulnerable dependencies |
| invariants / laws | W119 | structural | mined invariant violations |
| hotspots | W120 | runtime | runtime-trace classified hotspots |
| taint | W122 | static_analysis | source → sink dataflow leaks |
| vibe-check | W125 | heuristic | AI-rot anomalies (8 pattern families) |
| orphan-imports | W132 | structural | imported-but-unused modules |
| conventions | W133 | heuristic | naming / layout convention drift |
| pr-risk | W134 | structural | per-PR risk factors |
| duplicates | W136 | heuristic | near-duplicate symbol families |
| audit-trail-conformance | W145 | static_analysis | audit-trail integrity checks |
| audit-trail-verify | W146 | static_analysis | signed-ledger verification |
| boundary | | static_analysis / structural | public-by-accident exports, wrong-direction layer imports |
| test-hermeticity | | structural / static_analysis | non-hermetic test calls (network, time, random, fs, env, subprocess) |

Agent loop integration

The registry slots into the canonical agent loop as step 7a — between roam critique (review the diff) and roam pr-bundle emit (close the proof bundle):

...
7.  roam diff | roam critique             # review the change
7a. roam findings list                    # cross-detector findings on the workspace
8.  roam pr-bundle emit                   # close bundle with proofs
...

This is the "what's wrong with this workspace right now?" gate. An agent runs it before closing a proof bundle so the bundle either references the surviving findings as accepted or carries evidence that the change made them go away.

Evidence Compiler

Roam is a local evidence compiler for AI-assisted software change. The findings registry above is one input layer; the evidence compiler aggregates findings, run events, policy decisions, tests, and approvals into typed ChangeEvidence packets. One shared record renders into PR Replay reports, SARIF, in-toto attestations, or OSCAL-shaped exports — no exporter owns its own data-gathering logic.

The eight evidence questions

Every sellable Roam report answers these eight questions. If a surface cannot answer one yet, the report says so explicitly via the producer_not_available redaction marker — never silent omission.

  1. Who acted? — human, agent id, MCP client id, tool id (runs, replay, MCP receipt).
  2. What authority existed? — mode, permits, leases, scopes, policy decision (mode, permit, lease, constitution).
  3. What context was read? — files, symbols, commands, handles, hashes (pr-bundle, context, retrieve).
  4. What changed? — diff hash, changed files, changed subjects (diff, graph-diff, pr-analyze).
  5. What could break? — blast radius, callers, tests, vulnerable paths (impact, preflight, test-impact, vuln-reach).
  6. What policy applied? — rules, laws, controls, exceptions (rules, laws, constitution).
  7. What verified it? — tests run / required, gates, attestations (tests, critique, pr-bundle, runs verify).
  8. Who accepted risk? — approval, accepted risk, reviewer, timestamp (permit, pr-bundle, run ledger).
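
The "never silent omission" rule can be sketched as a pass over the eight questions that records an explicit redaction whenever a producer has nothing to contribute. The question keys and the output shape below are assumptions; only the `producer_not_available` marker comes from the text above.

```python
QUESTIONS = (
    "who_acted",
    "authority",
    "context_read",
    "what_changed",
    "what_could_break",
    "policy_applied",
    "verified_by",
    "risk_accepted_by",
)
MISSING = "producer_not_available"


def compile_answers(collected: dict) -> dict:
    """Return an answer for every question, redacting explicitly when data is absent."""
    answers = {}
    redactions = []
    for question in QUESTIONS:
        value = collected.get(question)
        if value:
            answers[question] = value
        else:
            answers[question] = None
            redactions.append({"field": question, "reason": MISSING})
    return {"answers": answers, "redactions": redactions}


report = compile_answers({
    "who_acted": {"agent_id": "agent-1"},
    "what_changed": {"changed_files": ["src/auth.py"]},
})
```

Every question appears in the output whether or not evidence exists, so a reader can distinguish "nothing found" from "nobody looked".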

Data model

| Type | Purpose |
|---|---|
| ChangeEvidence | One evidence packet per code-change scope. Carries evidence_id, schema_version, repo_id, git_range, commit_sha, diff_hash, run_ids, mode, started_at, completed_at, verdict, risk_level, changed_subjects, findings, policy_decisions, tests, approvals, accepted_risks, artifacts, redactions, and a content_hash. |
| EvidenceSubject | Portable identifier wrapper around things Roam already sees: symbol, file, endpoint, package, module, directory, commit, rule, control, run, bundle, finding, test, artifact. Survives reindexing so reports, SARIF rows, and attestations stay stable across rebuilds. |
| EvidenceLink | Typed edges inside a packet (12-member closed enumeration): derived_from, touches, calls, tested_by, triggered, blocked_by, allowed_by, accepted_by, satisfies_control, maps_to_standard, supersedes, mitigates. |
| EvidenceArtifact | File or data reference with a content hash and an optional path. Large artifacts are referenced by hash rather than embedded so the packet stays small and redaction metadata stays meaningful. |
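
A reduced sketch of how such a packet might be modelled with plain dataclasses and a stable content hash is shown below. It carries only a handful of the fields listed above, and the `ref` field, the canonical JSON encoding, and the choice of SHA-256 are assumptions rather than the shipped schema.

```python
import hashlib
import json
from dataclasses import asdict, dataclass, field


@dataclass(frozen=True)
class EvidenceSubject:
    kind: str  # e.g. "symbol", "file", "commit"
    ref: str   # portable identifier (format is an assumption)


@dataclass
class ChangeEvidence:
    evidence_id: str
    schema_version: str
    commit_sha: str
    diff_hash: str
    changed_subjects: list[EvidenceSubject] = field(default_factory=list)
    redactions: list[str] = field(default_factory=list)

    def canonical_json(self) -> str:
        """Sorted keys and fixed separators give byte-identical output for equal packets."""
        return json.dumps(asdict(self), sort_keys=True, separators=(",", ":"))

    def content_hash(self) -> str:
        return hashlib.sha256(self.canonical_json().encode("utf-8")).hexdigest()


packet = ChangeEvidence(
    evidence_id="ev-1",
    schema_version="0",
    commit_sha="abc123",
    diff_hash="def456",
    changed_subjects=[EvidenceSubject("file", "src/auth.py")],
)
digest = packet.content_hash()
```

Because the hash is computed over the canonical serialisation, two packets with the same content hash identically regardless of how they were constructed.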

Projections

Every external format is a projection from the same evidence packet. No exporter owns its own data-gathering logic.

| Projection | Use |
|---|---|
| Markdown / PDF | PR Replay, Due Diligence, AI Adoption Readiness, Incident Replay. |
| SARIF | GitHub Code Scanning and CI annotations. |
| in-toto / CGA | Signed proofs for code graph and change evidence. |
| OSCAL-like JSON / YAML | Governance evidence and control mapping. |
| OpenTelemetry spans / events | Agent observability bridges (LangSmith / Langfuse / Helicone). |
| CycloneDX VEX / OpenVEX | Vulnerability reachability context. |
| CDEvents / CloudEvents | Later CI/CD event interoperability. |
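
To make "projection" concrete, here is a minimal sketch that turns finding rows into a SARIF 2.1.0 log. The top-level SARIF keys used are standard; the input row shape reuses the registry column names, and the fixed `warning` level and the `findingId` fingerprint key are assumptions, not Roam's actual mapping.

```python
import json


def to_sarif(findings: list[dict], tool_name: str = "roam") -> dict:
    """Project finding rows into a single-run SARIF 2.1.0 document."""
    rule_ids = sorted({f["source_detector"] for f in findings})
    return {
        "$schema": "https://json.schemastore.org/sarif-2.1.0.json",
        "version": "2.1.0",
        "runs": [
            {
                "tool": {
                    "driver": {
                        "name": tool_name,
                        "rules": [{"id": rule_id} for rule_id in rule_ids],
                    }
                },
                "results": [
                    {
                        "ruleId": f["source_detector"],
                        "level": "warning",  # assumed; no severity mapping is documented
                        "message": {"text": f["claim"]},
                        "partialFingerprints": {"findingId": f["finding_id_str"]},
                        "properties": {"confidence": f["confidence"]},
                    }
                    for f in findings
                ],
            }
        ],
    }


sarif = to_sarif([
    {
        "finding_id_str": "dead:pkg.a.unused:0123456789ab",
        "claim": "Symbol is unreachable",
        "confidence": "structural",
        "source_detector": "dead",
    }
])
sarif_text = json.dumps(sarif, indent=2)
```

The function reads only from the rows it is given, which mirrors the rule that an exporter formats evidence but never gathers it.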

Recipes

The compiler runs as a local recipe DAG, not a workflow engine. Each recipe declares its inputs, steps, required evidence, gates, report sections, and exports — then the runner executes existing Roam commands through Python APIs, collects their JSON envelopes, normalises them into a ChangeEvidence packet, and renders the configured reports and exports.

recipe:
  inputs:            # what the recipe needs (git range, scope, mode)
  steps:             # ordered list of Roam command invocations
  required_evidence: # which fields the packet must contain to close
  gates:             # pass/fail conditions
  report_sections:   # Markdown / structured-output sections
  exports:           # projection list (sarif, in-toto, oscal-like, ...)

Example recipes: pr-replay, governance-evidence-pack, codebase-due-diligence, ai-adoption-readiness, security-reachability-triage, post-incident-replay, migration-assurance.
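
The runner described above can be pictured as a small loop: execute each step, merge the returned envelopes, then check required evidence and gates. The sketch below is purely illustrative; `run_recipe`, the envelope shapes, and the step callables are hypothetical stand-ins for Roam's Python APIs.

```python
def run_recipe(recipe: dict, commands: dict) -> dict:
    """Run steps in order, merge their envelopes, then evaluate evidence and gates."""
    packet: dict = {}
    for step in recipe["steps"]:
        envelope = commands[step]()  # each command returns a JSON-like dict
        packet.update(envelope)

    missing = [name for name in recipe["required_evidence"] if name not in packet]
    failed = [name for name, check in recipe["gates"].items() if not check(packet)]
    verdict = "pass" if not missing and not failed else "fail"
    return {"packet": packet, "missing": missing, "failed_gates": failed, "verdict": verdict}


# Hypothetical stand-ins for real command invocations.
commands = {
    "diff": lambda: {"diff_hash": "def456", "changed_files": ["src/auth.py"]},
    "impact": lambda: {"blast_radius": 3},
}
recipe = {
    "steps": ["diff", "impact"],
    "required_evidence": ["diff_hash", "blast_radius", "tests"],
    "gates": {"small_blast_radius": lambda p: p.get("blast_radius", 0) <= 10},
}
result = run_recipe(recipe, commands)
```

In this run the gate passes but the verdict is still `fail`, because no step produced the required `tests` field; a recipe cannot close on a packet with missing evidence.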

Execution phases

The compiler ships in small phases that share the same evidence model.

  1. Phase 0 — Vocabulary freeze (in flight): enum-like constants for evidence subject kinds, link kinds, artifact kinds, claim severities, and redaction reasons; one test proving existing command outputs map onto these kinds.
  2. Phase 1 — Schema v0 (in flight): pure dataclasses for ChangeEvidence / EvidenceSubject / EvidenceLink / EvidenceArtifact; deterministic JSON serialisation; stable content hash; schema_version and redactions[]; no DB migration.
  3. Phase 2 — Envelope collector: helper that turns existing JSON envelopes into ChangeEvidence for the PR Replay path, with a warning list for fields that do not yet map cleanly.
  4. Phase 3 — PR Replay compiled report: one evidence JSON, one Markdown report, optional PDF projection, and a "Suggested Review configuration" generated from the same packet.
  5. Phase 4 — Governance control mapping: YAML control map from Roam evidence types to governance-control language, OSCAL-like JSON / YAML projection, and a sample report under templates/audit-report/.
  6. Phase 5 — Projection consolidation: central SARIF / in-toto / OTel / VEX projections driven by the shared evidence layer; per-command emitters become thin compatibility wrappers. Customer-pulled.

Confidence checks

SLSA Source Track L3

SLSA Source Track L3 is the strongest signed projection from the evidence compiler. Roam maps to SLSA SRC-L3 requirements by emitting two in-toto v1 statements alongside every code-change scope: a Code Graph Attestation (CGA) that pins the structural fingerprint of the analysed tree, and a SLSA Verification Summary Attestation (VSA) that projects the same ChangeEvidence packet into the SLSA-shaped predicate consumers expect.

Two predicates

Roam emits two attestation predicates from one shared evidence record: the Code Graph Attestation (CGA) and the SLSA Verification Summary Attestation (VSA).

Standalone CGA limitation

A standalone CGA does not reach SRC L3. A CGA is not a Verification Summary Attestation, so its supply-chain claim falls short of the SRC-L3 requirement set. For the canonical L3 path, pair the CGA with a sibling VSA via roam cga emit --also-vsa, or run the same wiring through roam pr-bundle emit --slsa-l3. The VSA is byte-identical across both entry points because it projects the same shared evidence packet.

CI auto-trigger

roam ci-setup --with-slsa-l3 scaffolds a GitHub Actions workflow at .github/workflows/roam-slsa-src-l3.yml that auto-triggers roam pr-bundle emit --slsa-l3 --sign --keyless on every PR. The keyless cosign path uses Fulcio short-lived certificates plus Rekor transparency-log entries; the workflow emits the VSA, the run-ledger-root statement, the cosign signature triplet, and the Rekor record alongside the proof bundle.

Honest banner

Roam maps to SLSA SRC L3 requirements and supports evidence for the L3 claim. Roam itself does not certify that L3 is reached — the user is responsible for the offline verifier step plus Rekor publishing. The wording-lint that ships with the evidence compiler enforces this rule on every generated report: Roam emits the evidence; the verifier asserts the claim.

Why SQLite

See it run

The 5-minute canonical demo — install → health → preflight → critique → signed ChangeEvidence packet, end to end. The architecture above in one sitting.
