Decision memory is not another name for documentation

Decision memory is the layer where architectural authority is stored when a decision is recorded, not inferred later when context is retrieved. It is not documentation, it is not a generic memory layer, and it is not retrieval. It is the substrate that holds which decisions are binding, what they apply to, and which ones have been superseded — as structured properties of the decision itself, set the moment the decision is made.

The clearest way to separate it from its neighbors is by what each layer is optimized to do:

  • RAG retrieves relevant information from a corpus.
  • Documentation records knowledge for humans to read.
  • Generic memory preserves context across turns and sessions.
  • Decision memory stores authority, precedence, scope, and supersession as properties of a decision object.

Those are four different jobs. The first three are all read-time operations: they surface or rank existing material when a query arrives. Decision memory is categorically different, because the thing it stores is fixed at write time and does not depend on any query.

Relevance is calculated at read time. Authority has to be recorded at write time. A retrieval system can rank documents by similarity. It cannot infer which architectural decision is binding, which rule applies to a particular surface, or which older decision has been superseded — unless that authority was explicitly stored when the decision was created. That is the role of decision memory.
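As a minimal sketch of what "recorded at write time" could look like, the hypothetical Python record below carries authority as fields that are set when the decision is stored. The names `Decision` and `record_decision` are illustrative assumptions, not Mneme's API, and the constraint text for ADR-004-storage is a placeholder.

```python
from dataclasses import dataclass, replace
from typing import Dict, Optional


@dataclass(frozen=True)
class Decision:
    """Hypothetical decision object; authority is fixed when it is written."""
    id: str
    constraint: str
    scope: str                        # glob over the files the decision binds
    priority: str                     # "org", "project", or "feature"
    status: str = "active"            # assumed vocabulary: "active" / "superseded"
    supersedes: Optional[str] = None  # id of the decision this one replaces


def record_decision(store: Dict[str, Decision], decision: Decision) -> None:
    """Store a decision and settle supersession now, not at query time."""
    old = store.get(decision.supersedes) if decision.supersedes else None
    if old is not None:
        # Frozen dataclass: write back a modified copy instead of mutating.
        store[old.id] = replace(old, status="superseded")
    store[decision.id] = decision


store: Dict[str, Decision] = {}
# Placeholder text: the older record's real content is not given here.
record_decision(store, Decision("ADR-004-storage", "(older storage decision)",
                                "services/payments/**", "project"))
record_decision(store, Decision("ADR-012-storage", "Use PostgreSQL with SQLAlchemy ORM.",
                                "services/payments/**", "project",
                                supersedes="ADR-004-storage"))

print(store["ADR-004-storage"].status)  # superseded
print(store["ADR-012-storage"].status)  # active
```

No query is involved anywhere in this sketch: by the time anything is retrieved, the older record already carries its superseded status.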

This page is the canonical definition of that layer. It uses documentation as the first point of contrast, because documentation is the form most teams already have — then extends the same argument to generic memory and to RAG, which fail to govern AI coding agents for the same structural reason.

Why documentation alone is insufficient

When a team ships their first ADR process, the instinct is to put the ADRs in a wiki or a docs/adr/ folder and call it governance. In the human-only workflow, this works well enough — engineers read the ADRs, internalize the decisions, and apply them in code review.

In the AI-assisted workflow, this breaks down immediately. AI coding agents don't read ADRs. They generate code from prompts. If you want an agent to respect a decision, the decision needs to be in the prompt — injected as structured context before generation, not sitting in a wiki hoping the agent will find it.

But even prompt injection isn't governance. It's suggestion. The distinction between documentation, suggestion, and enforcement is the architecture of the governance problem.

The three-tier model

There are three tiers of architectural knowledge in an AI coding workflow, each with a different relationship to enforcement:

  1. Documentation: prose that explains context, rationale, and history. Wikis, ADR bodies, design docs. Read by humans. Retrieved (imprecisely) by RAG. No enforcement capability.
  2. Prompt memory: constraints injected into the model context. CLAUDE.md files, rules files, retrieved RAG passages. The model may or may not follow them. No structural enforcement.
  3. Decision memory: structured, schema-validated constraint records evaluated deterministically against model output. Authoritative, precedence-aware, scoped to specific files. Enforcement-capable.

Most teams operating at Tier 1 believe they're at Tier 3. The most common mistake in AI coding governance is mistaking documentation retrieval for enforcement.

[Figure: two parallel pipelines.
Documentation path: ADR wiki / docs/adr/ folder (free-form prose) → RAG / embedding retrieval (semantic similarity → k passages) → Prompt injection (suggestion; model sees context, may follow) → No enforcement (model ignores or may follow).
Decision memory path: project_memory.json (typed records, scope, precedence) → Deterministic keyword scorer (field weights + tag boost → top-K=3) → Authoritative context injection (model sees structured constraints) → Evaluator: PASS / WARN / FAIL (enforced, auditable, blockable).]
Fig. 1 — Documentation path ends in suggestion. Decision memory path ends in enforcement. Same ADR source, different architectural handling.
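The "deterministic keyword scorer" step in the decision memory path can be sketched as follows. This is an illustration under stated assumptions, not Mneme's scorer: the weight values, the tokenizer, and the second example record are hypothetical, and only the general shape (field weights, a tag boost, top-K of 3) comes from the figure.

```python
import re

# Assumed weights for illustration; the actual values are not given here.
FIELD_WEIGHTS = {"title": 3.0, "constraint": 2.0, "content": 1.0}
TAG_BOOST = 2.0
TOP_K = 3


def tokenize(text: str) -> set:
    """Lowercase alphanumeric tokens; no embeddings, no randomness."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def score(record: dict, query: str) -> float:
    terms = tokenize(query)
    total = 0.0
    for field, weight in FIELD_WEIGHTS.items():
        total += weight * len(terms & tokenize(record.get(field, "")))
    total += TAG_BOOST * len(terms & {t.lower() for t in record.get("tags", [])})
    return total


def top_k(records: list, query: str, k: int = TOP_K) -> list:
    scored = [(score(r, query), r["id"]) for r in records]
    # Score descending, then id ascending, so ties resolve identically every run.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [rid for s, rid in scored[:k] if s > 0]


records = [
    {"id": "ADR-012-storage",
     "title": "Payment service must use PostgreSQL via SQLAlchemy ORM",
     "constraint": "Use PostgreSQL with SQLAlchemy ORM. No direct SQL. No SQLite.",
     "tags": ["payments", "storage", "orm", "postgresql", "database"]},
    {"id": "EX-logging",  # hypothetical unrelated record
     "title": "Services must use structured logging",
     "constraint": "No print statements.",
     "tags": ["logging"]},
]

print(top_k(records, "add postgresql storage to payments"))  # ['ADR-012-storage']
```

The same query over the same records always yields the same ordered list, which is what makes the retrieval step auditable.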

What documentation is and what it isn't

Documentation is the written record of why decisions were made. ADR bodies contain context (why did we face this decision?), rationale (why did we choose this option?), and consequences (what are we accepting by making this choice?). This is valuable. It's the institutional knowledge that prevents future engineers from repeating past mistakes.

What documentation is not: a machine-evaluable constraint. A prose paragraph explaining why PostgreSQL was chosen for the payments service does not tell a governance system exactly what the AI agent is forbidden from writing, which files the decision applies to, or what takes precedence when it conflicts with another decision.

Those questions require structured fields, not prose. And structured fields are the definition of decision memory.

The schema difference

The structural gap between documentation and decision memory is visible at the schema level. A typical ADR file has:

    # ADR-012: Payment Service Storage Backend (documentation)
    Status: Accepted
    Date: 2024-03-15

    ## Context
    We evaluated several storage backends for the payments service...
    (several paragraphs of prose)

    ## Decision
    We will use PostgreSQL with the SQLAlchemy ORM.

    ## Consequences
    All payment queries will go through the ORM layer...

A decision record for the same content has:

    # Decision record extracted from ADR-012 (decision memory)
    id: ADR-012-storage
    type: decision
    title: Payment service must use PostgreSQL via SQLAlchemy ORM
    status: active
    scope: services/payments/**
    supersedes: ADR-004-storage
    priority: project
    tags: [payments, storage, orm, postgresql, database]
    constraint: Use PostgreSQL with SQLAlchemy ORM. No direct SQL. No SQLite. No SQLite in-memory for production paths.
    content: PostgreSQL chosen for SOC 2 audit logging compatibility (TR-7). SQLAlchemy required for team consistency. Exception requires architecture review.

The documentation version says what was decided. The decision record says what the AI agent is forbidden from doing, in which files, with what precedence, superseding which earlier record. These are the fields that governance evaluation requires.
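To make the schema gap concrete, here is a small validation sketch. The list of required fields and the allowed values for `status` and `priority` are assumptions inferred from the example record above. They are not a published Mneme schema, and `validate` is a hypothetical helper.

```python
# Assumed schema, inferred from the example record; not an official definition.
REQUIRED_FIELDS = ("id", "type", "title", "status", "scope", "priority", "constraint")
ALLOWED_STATUS = {"active", "superseded"}
ALLOWED_PRIORITY = {"org", "project", "feature"}


def validate(record: dict) -> list:
    """Return a list of problems; an empty list means the record is evaluable."""
    errors = [f"missing field: {name}" for name in REQUIRED_FIELDS if not record.get(name)]
    status = record.get("status")
    if status and status not in ALLOWED_STATUS:
        errors.append(f"unknown status: {status}")
    priority = record.get("priority")
    if priority and priority not in ALLOWED_PRIORITY:
        errors.append(f"unknown priority: {priority}")
    return errors


decision_record = {
    "id": "ADR-012-storage",
    "type": "decision",
    "title": "Payment service must use PostgreSQL via SQLAlchemy ORM",
    "status": "active",
    "scope": "services/payments/**",
    "priority": "project",
    "constraint": "Use PostgreSQL with SQLAlchemy ORM. No direct SQL. No SQLite.",
}
# What a prose ADR header offers on its own: a title and a human-readable status.
prose_adr = {"title": "Payment Service Storage Backend", "status": "Accepted"}

print(validate(decision_record))  # []
print(validate(prose_adr))
```

The prose ADR fails because it has no scope, no priority, no constraint, and a status written for human readers. Nothing about the decision itself is wrong.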

Documentation has
  • Context (why we chose)
  • Rationale (why this option)
  • Consequences (what we accept)
  • Status (human-readable)
  • Date (when written)
Decision memory adds
  • Scope pattern (which files)
  • Machine-readable status
  • Supersedes (conflict lineage)
  • Priority tier (org/project/feature)
  • Constraint text (what to enforce)
  • Tags (retrieval signals)

Why precedence resolution requires decision memory

The hardest governance problem isn't enforcing a single decision — it's resolving conflicts between multiple applicable decisions. Conflicts happen constantly in real codebases:

  • An org-level rule says "no direct database calls from HTTP handlers."
  • A project-level decision says "this service's admin handler is an approved exception — approved by platform team on 2024-11-10."
  • A feature-level decision says "the new bulk-import handler needs direct connection for performance — feature flag: bulk-import-v2."

When an AI agent edits services/payments/bulk_import.py, all three decisions are potentially applicable. Which wins?

The answer requires a precedence engine that evaluates: status (is each decision active?), scope specificity (the feature-level decision is most specific), supersedes relationships (does any decision explicitly supersede another?), and priority tier (feature > project > org for exceptions). A governance system that lacks structured fields for any of these dimensions cannot compute the answer deterministically.

Documentation retrieval leaves conflict resolution to the model. When RAG surfaces three conflicting passages, the model picks whichever it finds most persuasive in context — which is probabilistic and ungovernable. Decision memory carries the fields that make conflict resolution computable.
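The precedence evaluation described above can be sketched in a few lines. This is an illustrative model, not Mneme's engine: the decision ids, the scope globs assigned to the three example rules, and the prefix-length specificity heuristic are all assumptions. Only the tier order (feature > project > org) and the four dimensions come from the text.

```python
from dataclasses import dataclass
from fnmatch import fnmatchcase
from typing import List, Optional

TIER_RANK = {"org": 0, "project": 1, "feature": 2}  # feature > project > org


@dataclass(frozen=True)
class Decision:
    id: str
    scope: str
    priority: str
    status: str = "active"
    supersedes: Optional[str] = None


def specificity(scope: str) -> int:
    """Assumed heuristic: a longer literal prefix before any wildcard is more specific."""
    return len(scope.split("*", 1)[0])


def resolve(decisions: List[Decision], path: str) -> List[Decision]:
    """Return the decisions that bind `path`, strongest first."""
    superseded = {d.supersedes for d in decisions if d.status == "active" and d.supersedes}
    applicable = [
        d for d in decisions
        if d.status == "active" and d.id not in superseded and fnmatchcase(path, d.scope)
    ]
    # Highest tier first, then most specific scope, then id for a stable order.
    return sorted(applicable,
                  key=lambda d: (-TIER_RANK[d.priority], -specificity(d.scope), d.id))


decisions = [
    Decision("ORG-no-db-in-handlers", "**", "org"),
    Decision("PRJ-admin-exception", "services/payments/**", "project"),
    Decision("FEAT-bulk-import-direct", "services/payments/bulk_import.py", "feature"),
]

ordered = resolve(decisions, "services/payments/bulk_import.py")
print([d.id for d in ordered])
# ['FEAT-bulk-import-direct', 'PRJ-admin-exception', 'ORG-no-db-in-handlers']
```

Every input to the ordering is a stored field, so the same path always resolves to the same winner. Remove any one field (status, scope, supersedes, or priority) and the sort key can no longer be computed.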

The enforcement gap

Even if you resolve the retrieval and conflict problems, documentation-based approaches face a fundamental enforcement gap: they cannot validate generated output.

Enforcement requires two things the documentation path cannot provide:

  1. A structured constraint to evaluate against. "Use PostgreSQL with SQLAlchemy ORM" is a constraint. Three paragraphs about why PostgreSQL was chosen are not. Evaluation requires a precise, machine-testable assertion — what import is forbidden, what pattern must appear, what must not appear.
  2. A layer that inspects output after generation. The governance system must see what the AI actually wrote, compare it against the applicable constraints, and emit a verdict before the code reaches review. This is the Evaluator layer in Mneme — separate from retrieval, separate from the LLM, architecturally downstream of both.

Documentation retrieval injects context before generation. Decision memory enforcement checks output after generation. Both are necessary for a governance system that catches violations reliably.
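A post-generation check of the kind described here can be sketched with Python's standard `ast` module. This is a toy evaluator under stated assumptions, not Mneme's Evaluator: the rule format (a set of forbidden modules and a set of expected modules) and the mapping of outcomes to PASS / WARN / FAIL are hypothetical.

```python
import ast


def imported_modules(source: str) -> set:
    """Collect top-level module names imported by the generated code."""
    modules = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            modules.update(alias.name.split(".")[0] for alias in node.names)
        elif isinstance(node, ast.ImportFrom) and node.module:
            modules.add(node.module.split(".")[0])
    return modules


def evaluate(source: str, forbidden: set, expected: set) -> str:
    """Assumed verdict mapping: forbidden import -> FAIL, expected import absent -> WARN."""
    found = imported_modules(source)
    if found & forbidden:
        return "FAIL"
    if expected and not (found & expected):
        return "WARN"
    return "PASS"


# Hypothetical machine-testable form of "Use PostgreSQL with SQLAlchemy ORM. No SQLite."
FORBIDDEN = {"sqlite3"}
EXPECTED = {"sqlalchemy"}

print(evaluate("import sqlite3\nconn = sqlite3.connect(':memory:')\n", FORBIDDEN, EXPECTED))  # FAIL
print(evaluate("from sqlalchemy import select\n", FORBIDDEN, EXPECTED))                        # PASS
print(evaluate("import json\n", FORBIDDEN, EXPECTED))                                          # WARN
```

The check parses the generated code rather than asking a model to judge it, so the verdict is reproducible and can block a change before review.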

[Figure: three-stage pipeline. Pre-generation: context injection (decisions → prompt). Then LLM generation (code output). Post-generation: Evaluator (output vs. constraints → verdict). Docs reach the context-injection stage only; decision memory covers the full pipeline.]
Fig. 2 — Documentation retrieval reaches the pre-generation injection point only. Decision memory spans the full pipeline, including the post-generation enforcement check.

Converting documentation to decision memory

The path from documentation to decision memory doesn't require rewriting your ADRs. Mneme's ADR import integration reads existing ADR files and extracts a ## Constraints section that engineers add alongside (not replacing) the existing ADR structure.

Each constraint line in the Constraints section is compiled into a structured decision record in .mneme/project_memory.json. The ADR body — context, rationale, consequences — remains documentation and is not imported into the governance corpus. The two layers coexist: documentation for human understanding, decision memory for machine evaluation.
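The shape of that extraction step can be sketched as follows. This is not Mneme's importer. The bullet-per-constraint format, the generated id scheme, and the `scope` argument are assumptions made for illustration, and the real record layout in .mneme/project_memory.json may differ.

```python
import json


def extract_constraints(adr_text: str) -> list:
    """Return the bullet lines found under a '## Constraints' heading."""
    constraints, inside = [], False
    for line in adr_text.splitlines():
        if line.startswith("## "):
            # Entering or leaving a second-level section.
            inside = line.strip().lower() == "## constraints"
            continue
        if inside and line.lstrip().startswith("- "):
            constraints.append(line.lstrip()[2:].strip())
    return constraints


def to_records(adr_id: str, scope: str, constraints: list) -> list:
    """Build one structured record per constraint line (hypothetical id scheme)."""
    return [
        {"id": f"{adr_id}-c{i}", "type": "decision", "status": "active",
         "scope": scope, "constraint": text}
        for i, text in enumerate(constraints, start=1)
    ]


adr_text = """# ADR-012: Payment Service Storage Backend

## Decision
We will use PostgreSQL with the SQLAlchemy ORM.

## Constraints
- Use PostgreSQL with SQLAlchemy ORM.
- No direct SQL.

## Consequences
All payment queries will go through the ORM layer...
"""

records = to_records("ADR-012", "services/payments/**", extract_constraints(adr_text))
print(json.dumps(records, indent=2))
```

Only the two constraint lines become records. The Decision and Consequences prose stays in the ADR, untouched.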

The ADR import workflow: Add a Constraints section to any existing ADR. Run mneme adr import to preview the records that would be created — it is dry-run by default — then re-run with --apply to write them. Dependency-style constraints become enforceable; the rest are stored as structured decisions, and your ADRs keep their documentation function.

The full comparison

Dimension | Documentation | Decision memory
Format | Free-form prose | Typed schema (id, scope, status, constraint…)
Primary consumer | Human engineers | Governance system (retriever + evaluator)
Retrieval method | Keyword search or semantic similarity | Deterministic field-weighted scorer
Scope handling | None (applies conceptually) | Explicit glob patterns per record
Conflict resolution | Left to reader / model | Precedence engine (status, tier, supersedes)
Enforcement point | Suggestion via context injection | Deterministic evaluation with a PASS / WARN / FAIL verdict
Audit trail | None (no per-decision trace) | Decision ID, field match, verdict per check
Version control | File-level (who changed the doc) | Record-level (which record, which field, which commit)
Supersession | Noted in prose only | Machine-readable supersedes field; old record deactivated

Why generic memory is insufficient

A generic memory layer, the kind that persists facts and context across an agent's turns and sessions, gets closer than documentation, because it is at least machine-readable. But remembering a decision as a fact is not the same as storing it as an authority. A memory layer can recall that the team chose PostgreSQL. It has no native notion that this choice is binding on services/payments/**, that it outranks an older SQLite note, or that it must win every retrieval where both appear. Memory preserves what was said. Decision memory preserves what was decided, and with what force. This is the line drawn in "memory is not governance" and "RAG is not memory".

Why RAG retrieval is insufficient

RAG is the most common thing teams reach for when they want an agent to “know” their architecture: index the ADRs, retrieve the relevant ones, inject them into the prompt. It is genuinely useful for surfacing documentation, and it is the wrong tool for authority — for one structural reason.

Similarity is not authority. The most semantically similar ADR is not necessarily the governing one. The decision that applies may be the one with higher precedence, narrower scope, or newer supersession status — none of which a similarity score can see.

The reason is architectural, not a tuning problem. A vector index assigns relevance at read time, when the query arrives. Precedence and supersession are properties that only exist if they were recorded at write time, on the decision object. No amount of reranking recovers a field that was never stored. Hand a retriever two equally relevant passages (one a binding decision, one prior-art commentary that was later superseded) and no similarity score separates them, because by the measure RAG uses they are the same. This is the failure analyzed in "why RAG fails for architectural governance".
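A toy example makes the point tangible. Bag-of-words cosine similarity stands in for an embedding model here, which is an assumption made for brevity. Real embedding scores would rarely tie exactly, but they would still carry no authority signal. Both passage texts are invented for illustration.

```python
import math
from collections import Counter


def cosine(a: str, b: str) -> float:
    """Bag-of-words cosine similarity; a stand-in for embedding similarity."""
    va, vb = Counter(a.lower().split()), Counter(b.lower().split())
    dot = sum(va[t] * vb[t] for t in va)
    norm = (math.sqrt(sum(v * v for v in va.values()))
            * math.sqrt(sum(v * v for v in vb.values())))
    return dot / norm if norm else 0.0


# Hypothetical passages: one superseded, one binding.
passages = [
    {"id": "ADR-004-storage", "text": "use sqlite for payments storage", "status": "superseded"},
    {"id": "ADR-012-storage", "text": "use postgresql for payments storage", "status": "active"},
]

query = "payments storage"
scores = {p["id"]: cosine(query, p["text"]) for p in passages}
print(scores["ADR-004-storage"] == scores["ADR-012-storage"])  # True: similarity cannot tell them apart

# Only a field stored at write time separates the two.
binding = [p["id"] for p in passages if p["status"] == "active"]
print(binding)  # ['ADR-012-storage']
```

The similarity function never sees the `status` field, so no change to its scoring could recover the distinction.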

Decision memory and RAG solve complementary problems

None of this makes RAG or documentation wrong. They solve a different problem: surfacing what exists. Decision memory solves the adjacent one: resolving what is binding. The right architecture uses both — retrieve the relevant material, then resolve authority deterministically over the structured decisions that apply.

What the agent receives at the end of that resolution is the point. It does not need a wider window of semantically similar text. It needs the resolved set of decisions that bind the code it is about to write: the applicable, precedence-ordered constraints for the exact surface being edited. In Mneme that resolved set is delivered as a context packet: not top-k passages injected as suggestion, but the decisions that govern this change. The broader layer comparison is laid out in "RAG vs governance" and "RAG vs coding memory".

When documentation and retrieval are the right tools

This isn't an argument against documentation or RAG. Documentation serves an irreplaceable function: it preserves the reasoning behind decisions in a form engineers can read, discuss, and learn from, and RAG makes that corpus searchable. A team with rich ADRs and good retrieval has a significant advantage over one without.

The argument is narrower: documentation is not a governance system, and treating it as one creates a false sense of safety. A team that believes its ADR wiki is enforcing architectural decisions in the AI coding workflow is not protected against the violations it assumes are being caught.

The right stack is both: documentation for human understanding and institutional memory, decision memory for machine-readable constraint enforcement. The two are complementary, not competing. Mneme is designed to sit alongside your documentation — it enforces the enforceable subset of your architectural decisions without asking you to stop writing ADRs.