# Mneme HQ: full reference

> Mneme HQ is the architectural governance layer for AI-assisted development. It compiles your team's architectural intent into enforceable constraints that govern AI coding agents at the pre-generation stage, before architectural drift reaches review.

## What Mneme is

Mneme HQ is the architectural governance layer for AI-assisted development. As agent platforms proliferate, governance becomes infrastructure, and Mneme is positioned as the pre-generation governance layer of that stack.

The enterprise framing: a governance and control plane for AI coding agents operating within Azure and GitHub-based engineering workflows. The long-term vision: Mneme integrates with your engineering stack and compiles architectural intent into enforceable AI coding constraints across generation, pull request, and CI workflows.

## What Mneme is not

Mneme is not a memory tool. Mneme is not a rules file like Cursor Rules or CLAUDE.md. Mneme is not a RAG system or a vector database for code. Each of those exists for a reason. None of them govern implementation.

The locked category positioning:

1. Rules files document standards. Mneme enforces them.
2. Memory tools recall context. Mneme governs implementation.
3. RAG retrieves knowledge. Mneme operationalizes decisions.

## The AI coding governance stack

Pre-generation governance is Mneme. Mneme compiles architectural intent into enforceable constraints before the agent generates code.

Generation and runtime is the agent framework and runtime harness layer. This includes Cursor, Claude Code, agent frameworks, and managed agent platforms.

Post-generation observability is the layer above. Tools like SentRux operate here. SentRux tells you when the agent violated architecture. Mneme helps prevent the violation from being proposed in the first place. Pre-generation governance and post-generation observability are complementary layers of the AI coding stack, not competitors.

## The problem Mneme addresses

AI increased code output. Review capacity did not. Coding assistants now generate code faster than teams can review it. That means more surface area to validate, more architectural drift to catch, and more governance pushed downstream into pull request review.

The issue is not model quality. The issue is that coding agents do not retain architectural decisions by default. Three observable symptoms: more pull request surface area for reviewers to validate, reactive governance where architectural violations are caught after generation, and session amnesia where coding agents forget prior decisions unless re-prompted every time.

## How Mneme works

Five stages. No vector store. No ML.

The pipeline: `project_memory.json` → `MemoryStore` → `Retriever` → `ContextBuilder` → `LLMAdapter` → `Evaluator`.

1. Load. Load structured project memory from a human-editable JSON file: rules, constraints, facts, and prior decision records.
2. Retrieve. Retrieve only the rules relevant to the current task using deterministic keyword scoring. Same query, same result.
3. Build. Build a context packet that injects the relevant constraints into the prompt before generation.
4. Adapt. Pass the context to the LLM adapter, which works with direct API integrations, IDE coding assistants, agent frameworks, managed agent platforms, and internal prompt pipelines.
5. Evaluate. Validate the generated output against the rule set before the code reaches the repository.

Retrieval is deterministic. There is no embedding model, no vector store, and no ML dependency in the governance path. This is a deliberate architectural commitment.
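The retrieve stage can be illustrated with a minimal sketch of deterministic keyword scoring. The record shape (`id`, `text`, `keywords`), the `retrieve` function, and the overlap-count score are assumptions made for illustration; Mneme's actual schema and scoring may differ.

```python
import re

# Hypothetical records; the field names are assumptions, not Mneme's schema.
MEMORY = [
    {"id": "ADR-001", "text": "JSON storage only, no external database",
     "keywords": ["storage", "database", "json"]},
    {"id": "ADR-003", "text": "No ORM in v1", "keywords": ["orm", "storage"]},
    {"id": "ADR-004", "text": "Repository pattern is the only data-access boundary",
     "keywords": ["repository", "service"]},
]

def tokenize(text):
    """Lowercase the text and split it into a set of word tokens."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))

def retrieve(query, records, top_k=2):
    """Score records by keyword overlap with the query and return the top_k."""
    tokens = tokenize(query)
    scored = []
    for record in records:
        score = len(tokens & set(record["keywords"]))
        if score > 0:
            scored.append((score, record["id"], record))
    # Sort by score descending, then by id, so ties always break the same way.
    scored.sort(key=lambda item: (-item[0], item[1]))
    return [record for _, _, record in scored[:top_k]]

hits = retrieve("Refactor the storage backend for scalability", MEMORY)
print([record["id"] for record in hits])
```

Because the score is a plain set intersection and ties break on the record ID, repeated calls with the same query and the same memory return the same list in the same order.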

## Governance Benchmark v1.1

Governance Benchmark v1.1 methodology is published. The benchmark is deterministic and reproducible. It measures violation prevention rate, baseline drift rate, false positive rate, retrieval recall and precision at K, irrelevant injection rate, and the gap between end-to-end enforcement and oracle enforcement.
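For readers unfamiliar with retrieval precision and recall at K, the sketch below uses the standard information-retrieval definitions. The benchmark's exact formulas are not reproduced here, so treat the function names and the handling of empty inputs as assumptions.

```python
def precision_at_k(retrieved, relevant, k):
    """Fraction of the top-k retrieved rule IDs that are relevant."""
    top = retrieved[:k]
    if not top:
        return 0.0
    return sum(1 for rule_id in top if rule_id in relevant) / len(top)

def recall_at_k(retrieved, relevant, k):
    """Fraction of the relevant rule IDs that appear in the top-k retrieved."""
    if not relevant:
        return 0.0
    top = set(retrieved[:k])
    return len(top & set(relevant)) / len(relevant)

# Illustrative inputs only, not benchmark data.
retrieved = ["ADR-001", "ADR-004", "ADR-003"]
relevant = {"ADR-001", "ADR-003", "ADR-005"}
print(precision_at_k(retrieved, relevant, 2))
print(recall_at_k(retrieved, relevant, 3))
```

Here one of the top two retrieved rules is relevant, and two of the three relevant rules appear in the top three.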

Suite composition: 36 scenarios across six categories. 8 architectural violation scenarios, 6 scope and boundary scenarios, 6 anti-pattern scenarios, 4 dependency and tooling scenarios, 4 ambiguous and borderline scenarios, and 8 control scenarios where the rule does not apply.

Source mix: 33 percent synthetic canonical, 28 percent real drift incidents from production codebases, 17 percent adversarial, 22 percent controls.
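A quick arithmetic check of the composition figures. The per-source scenario counts are inferred by rounding each stated percentage of the 36-scenario total; they are not stated in the methodology summary above.

```python
# Scenario counts per category, as stated in the suite composition.
categories = {
    "architectural violation": 8,
    "scope and boundary": 6,
    "anti-pattern": 6,
    "dependency and tooling": 4,
    "ambiguous and borderline": 4,
    "control": 8,
}
total = sum(categories.values())

# Stated source mix, in percent.
source_mix = {
    "synthetic canonical": 33,
    "real drift incidents": 28,
    "adversarial": 17,
    "controls": 22,
}

# Inferred counts: nearest whole number of scenarios for each share.
inferred = {name: round(total * pct / 100) for name, pct in source_mix.items()}
print(total, inferred)
```

Under that rounding assumption the shares correspond to 12, 10, 6, and 8 scenarios, which sum to 36 and match the 8 control scenarios in the category breakdown.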

Verdict thresholds, all four required: violation prevention at or above 75 percent, baseline drift at or above 50 percent, false positive rate at or below 10 percent, oracle gap within 10 percentage points.
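The four thresholds combine with a logical AND. The sketch below shows one way to express that; the function name, argument names, and the sample numbers are hypothetical and are not benchmark results.

```python
def benchmark_verdict(prevention, baseline_drift, false_positive, oracle_gap):
    """Check the four thresholds; rates are percentages, the gap is in points."""
    checks = {
        "violation_prevention": prevention >= 75,
        "baseline_drift": baseline_drift >= 50,
        "false_positive_rate": false_positive <= 10,
        "oracle_gap": abs(oracle_gap) <= 10,
    }
    # All four checks must hold for the overall verdict to pass.
    return all(checks.values()), checks

# Illustrative numbers only.
passed, checks = benchmark_verdict(
    prevention=80, baseline_drift=55, false_positive=12, oracle_gap=6
)
print(passed, checks)
```

In this made-up run three checks hold, but the false positive rate exceeds 10 percent, so the overall verdict fails.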

Verifiers operate on structured JSON output, not on freeform prose. The model is required to emit a response conforming to a fixed schema, and verifiers inspect the artifact directly. This closes the prose-gaming attack on benchmark verification.
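To show why structured output resists prose gaming, here is a minimal sketch of a verifier that refuses anything that is not schema-conforming JSON. The schema (`decision`, `files_touched`, `dependencies_added`) is hypothetical; the benchmark's real schema is defined in its public harness.

```python
import json

# Hypothetical schema: the field names and types are assumptions.
REQUIRED_FIELDS = {"decision": str, "files_touched": list, "dependencies_added": list}

def parse_response(raw):
    """Parse a model response and check it against the fixed schema."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError:
        return None  # Freeform prose is rejected outright.
    if not isinstance(data, dict) or set(data) != set(REQUIRED_FIELDS):
        return None
    for field, expected_type in REQUIRED_FIELDS.items():
        if not isinstance(data[field], expected_type):
            return None
    return data

def verify_no_new_dependencies(data):
    """Inspect the artifact itself, not the model's description of it."""
    return data["dependencies_added"] == []

prose = "I followed every rule and added no dependencies."
structured = json.dumps({
    "decision": "extend existing JSON storage module",
    "files_touched": ["storage.py"],
    "dependencies_added": ["prisma"],
})
print(parse_response(prose))
print(verify_no_new_dependencies(parse_response(structured)))
```

A reassuring sentence never reaches the verifier, and a conforming response is judged on the fields it declares.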

Full benchmark results will be published after the scenario suite is validated. The methodology, scenarios, verifiers, and harness are public.

## Roadmap

Phase 1, current. OSS developer wedge. Architectural governance for individual developers and early engineering adopters.

Phase 2. Team governance layer. Shared policy and decision stores for teams adopting AI-assisted development.

Phase 3. Agent platform integrations. Governance for enterprise agent workflows and managed coding platforms.

Phase 4. Governance infrastructure. Policy as code enforcement and drift analytics across engineering organizations, spanning generation, pull request, and CI workflows.

## Compatibility

Mneme works with direct LLM API integrations, IDE coding assistants such as Cursor and Claude Code, agent frameworks, managed agent platforms, and internal prompt pipelines.

## Demo

The interactive demo at https://mnemehq.com/demo/ runs the same prompt through the same model twice. The prompt: "Refactor the storage backend for scalability."

Without Mneme. The model recommends migrating JSON storage to PostgreSQL or Redis, introducing an ORM, and adding a migration layer. The advice is reasonable in the abstract and ignores three architectural decisions this codebase has already made.

With Mneme. The relevant decision records are injected into the model's context before generation: ADR-001 (JSON storage only — no external DB), ADR-003 (no ORM in v1), ADR-005 (extend before rebuild). The model proposes extending the existing JSON storage module instead of replacing it. Same prompt, same model, different answer.
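The injection step in the demo can be sketched as a simple prompt builder. The `build_context` function and the layout of the packet are assumptions for illustration; the real `ContextBuilder` output format may look different.

```python
# Decision records named in the demo, written as (id, rule) pairs.
ADRS = [
    ("ADR-001", "JSON storage only, no external DB"),
    ("ADR-003", "No ORM in v1"),
    ("ADR-005", "Extend before rebuild"),
]

def build_context(task, decisions):
    """Place the relevant decision records ahead of the task prompt."""
    lines = ["Architectural decisions that apply to this task:"]
    lines += [f"- {adr_id}: {rule}" for adr_id, rule in decisions]
    lines += ["", f"Task: {task}"]
    return "\n".join(lines)

prompt = build_context("Refactor the storage backend for scalability.", ADRS)
print(prompt)
```

The task text is unchanged; only the constraints placed before it differ between the two demo runs.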

The third panel runs `mneme check --mode strict` against the generated diff and produces a structured verdict: PASS on the storage decision (JSON only, no new databases), PASS on the auth pattern (JWT middleware unchanged), WARN on a new dependency (prisma not in approved list), and FAIL on a Repository pattern violation (ADR-004 bypassed in user.service.ts).

The demo illustrates the difference between context injection and architectural governance. Injection makes the rules visible to the model. Enforcement makes generated output answer to them. Mneme does both, in one tool-agnostic layer that works across Claude Code, Cursor, GitHub Copilot, Windsurf, and custom SDK agents.

## Demo scenarios

Each scenario below walks through one verdict from the mneme check sample output. URLs are at https://mnemehq.com/demo/<slug>/.

Storage decision (PASS) — slug: storage-decision. Three decision records apply: ADR-001 ("JSON storage only — no external database"), ADR-003 (no ORM in v1), and ADR-005 (extend before rebuild). Without Mneme, the model recommends migrating to PostgreSQL or Redis with an ORM and a migration layer, silently violating all three ADRs. With Mneme, the relevant decision records are injected before generation and the model proposes extending the existing JSON storage module. mneme check returns PASS on the storage decision, with the originating decision ID surfaced in structured output.

Dependency policy (WARN) — slug: dependency-policy. The team's approved dependency list lives in project_memory.json with type "dependency_policy" and an explicit allowlist (FastAPI, Pydantic, httpx, pytest, ruff, internal packages). During an AI-assisted refactor, the agent reaches for prisma — a real, popular library, but not on the approved list. Without Mneme the import lands silently in the next PR. With Mneme, mneme check emits a structured WARN with the originating decision ID and a tracked override path. WARN rather than FAIL is intentional: dependency introductions often have legitimate reasons (security re-approval, deliberate spike), so the default is "flag for review" not "hard-block." Strict enforcement is available via mneme check --mode strict.
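An allowlist check of this kind can be sketched with Python's `ast` module. This is not how `mneme check` is implemented; the `myapp` package standing in for "internal packages", the function names, and the verdict shape are all hypothetical.

```python
import ast
import sys

# Import names for the allowlist above; "myapp" is a hypothetical stand-in
# for the team's internal packages.
APPROVED = {"fastapi", "pydantic", "httpx", "pytest", "ruff", "myapp"}

# Standard-library modules are not third-party dependencies. The attribute
# exists on Python 3.10+; on older versions this falls back to an empty set.
STDLIB = getattr(sys, "stdlib_module_names", frozenset())

def unapproved_imports(source, approved=APPROVED):
    """Return sorted top-level modules that are imported but not approved."""
    found = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            found.update(alias.name.split(".")[0] for alias in node.names)
        elif isinstance(node, ast.ImportFrom) and node.module and node.level == 0:
            found.add(node.module.split(".")[0])
    return sorted(found - approved - STDLIB)

def check_dependencies(source):
    """Emit WARN when unapproved imports are present, PASS otherwise."""
    extra = unapproved_imports(source)
    return {"status": "WARN" if extra else "PASS", "unapproved": extra}

generated = "from fastapi import FastAPI\nfrom prisma import Prisma\n"
print(check_dependencies(generated))
```

Returning WARN rather than raising keeps the default behaviour at "flag for review", in line with the policy described above.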

Repository pattern (FAIL) — slug: repository-pattern. ADR-004 records the rule "Repository pattern is the only data-access boundary." Service classes depend on repository interfaces, never on the underlying storage. During a "speed up user.service.ts" refactor the agent inlines a direct database query, bypassing the abstraction. The diff looks clean in the editor; the linter is happy; the tests pass. mneme check --mode strict catches the architectural-pattern violation and emits a hard FAIL with file path and decision ID. In CI via the GitHub Actions integration the FAIL exit code blocks the PR until the violation is resolved or an explicit override decision record is added.
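The blocking behaviour can be sketched as a function that maps verdicts to a process exit code. The verdict record shape, keying overrides by decision ID, and the specific exit code values are assumptions; consult the CLI reference for the actual behaviour.

```python
def ci_exit_code(verdicts, overrides=frozenset()):
    """Return 1 if any FAIL lacks an override decision record, else 0."""
    blocking = [
        verdict for verdict in verdicts
        if verdict["status"] == "FAIL" and verdict["decision_id"] not in overrides
    ]
    return 1 if blocking else 0

# Hypothetical verdict records modelled on the scenario above.
verdicts = [
    {"decision_id": "ADR-001", "status": "PASS", "file": None},
    {"decision_id": "ADR-004", "status": "FAIL", "file": "user.service.ts"},
]
print(ci_exit_code(verdicts))
print(ci_exit_code(verdicts, overrides={"ADR-004"}))
```

A CI step would pass the result to `sys.exit`, so a non-zero code fails the job until the violation is fixed or an override is recorded.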

The three scenarios cover the full PASS / WARN / FAIL verdict spectrum and demonstrate the governance loop end to end: decisions stored as structured records, injected pre-generation, enforced post-generation, with override events themselves tracked as first-class decisions.

## Standards landscape

Three efforts are forming the credible foundation for a future cross-tool agent governance standard. The standards landscape page at https://mnemehq.com/standards/ tracks all three with links to organizational primary sources only.

NIST CAISI AI Agent Standards Initiative. Launched February 2026 by the Center for AI Standards and Innovation at NIST. Stated aim: AI agents that "interoperate smoothly across the digital ecosystem." Concrete artifacts so far: a January 2026 RFI on securing AI agent systems (closed March 9, 2026), the NCCoE concept paper on AI agent identity and authorization, and listening sessions on barriers to AI adoption in healthcare, finance, and education. Current scope is identity and authorization; output-policy enforcement is adjacent and likely to follow.

Model Context Protocol. Open, JSON-RPC-based protocol for exposing context, tools, and resources to AI clients. Latest specification dated 2025-11-25. Does not specify a governance content format, but is the substrate over which a structured decision store can be made queryable to any compliant agent.
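As a rough illustration of what querying a decision store over MCP could look like, the sketch below builds a JSON-RPC 2.0 `tools/call` request. The `tools/call` method and its `name` and `arguments` parameters come from the MCP specification; the `query_decisions` tool and its `topic` argument are hypothetical and not an existing Mneme interface.

```python
import json

def tools_call_request(request_id, tool_name, arguments):
    """Build a JSON-RPC 2.0 request for MCP's tools/call method."""
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": tool_name, "arguments": arguments},
    }

# "query_decisions" and its arguments are hypothetical.
request = tools_call_request(1, "query_decisions", {"topic": "storage"})
print(json.dumps(request, indent=2))
```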

AGENTS.md. Markdown convention for per-repo instructions, stewarded by the Agentic AI Foundation under the Linux Foundation. Adopted across OpenAI Codex, Cursor, Aider, Factory, Google Gemini CLI and Jules, and Zed. As a static-context format, AGENTS.md does not resolve precedence between conflicting decisions or enforce anything at the hook layer, but it is the most credible cross-vendor baseline that exists today for the static portion of the problem.

Mneme's alignment, stated plainly: it tracks these efforts, is design-aligned with them, and plans to engage with future RFIs. Mneme is not currently NIST-endorsed, is not a Linux Foundation member, and has not filed contributions. The standards page draws this line explicitly.

## Project links

- Homepage: https://mnemehq.com/
- Governance Benchmark v1.1 methodology: https://mnemehq.com/benchmark/
- Demo: https://mnemehq.com/demo/
- Standards landscape: https://mnemehq.com/standards/
- Works with (compatibility surface): https://mnemehq.com/works-with/
- Platforms (Azure & GitHub Enterprise, AWS, Google Cloud, self-hosted): https://mnemehq.com/platforms/
- CLI reference: https://mnemehq.com/docs/cli/
- Governance violations (what Mneme prevents): https://mnemehq.com/docs/governance-violations/
- Use cases: https://mnemehq.com/use-cases/
- Insights: https://mnemehq.com/insights/
- Generative AI software engineering stack (seven-layer frame): https://mnemehq.com/insights/generative-ai-software-engineering-stack/
- Supported languages hub: https://mnemehq.com/supported-languages/
- Python governance (Tier 1, production-grade): https://mnemehq.com/supported-languages/python-governance/
- TypeScript governance (Tier 1, operationally supported): https://mnemehq.com/supported-languages/typescript-governance/
- JavaScript governance (Tier 1, operationally supported): https://mnemehq.com/supported-languages/javascript-governance/
- Supported languages canonical docs (coverage matrix, limitations, roadmap): https://mnemehq.com/docs/supported-languages/
- Roadmap: https://mnemehq.com/roadmap/
- Source: https://github.com/TheoV823/mneme
- Benchmarks: https://github.com/TheoV823/mneme/tree/main/examples/benchmarks
