Mneme HQ governs multi-step AI development workflows by acting as a shared governance layer that every agent in the pipeline can query. Whether a planner agent is scoping work, a coder agent is generating implementation, or a reviewer agent is evaluating output — all stages check the same decision store, ensuring architectural rules and anti-patterns are respected across the entire automated workflow. Memory tools recall context. Mneme governs implementation.
Reference Architecture · Simulated Scenario

Governing Multi-Step AI Development Workflows with Mneme HQ

Give every agent in your pipeline — planner, coder, reviewer, deployer — a shared memory of your architectural decisions. One decision store. Every stage enforced.

[Diagram: Shared decision layer. The Planner (Stage 1), Coder (Stage 2), Reviewer (Stage 3), and Deployer agents each call check() against the Mneme HQ decision store (decisions/*.yml). The store is shared across all stages, uses semantic retrieval per query, blocks the next stage on violations, and keeps a full audit trail per run.]

The Problem

Each agent in your pipeline starts with zero project memory.

Multi-agent development workflows — where a planner scopes tasks, a coder implements them, and a reviewer validates the output — are increasingly common. But each agent operates independently. The planner doesn't know what the coder knows. The reviewer doesn't know what was decided three sprints ago. Without a shared memory layer, architectural drift compounds at every stage of the pipeline.

The result: the planner proposes approaches that violate architecture rules, the coder implements them, and the reviewer — lacking the decision context — approves code that breaks conventions the whole team agreed on months ago.

Stage 1

Planner

Scopes tasks, picks approaches — no memory of past decisions

Stage 2

Coder

Implements the plan — violates architecture rules it can't see

Stage 3

Reviewer

Evaluates output — approves violations it has no context to catch

Stage 4

Deployer

Ships — architectural drift reaches production

Without Mneme HQ — planner proposes:
Task: "Build a search feature"
Plan: Query the database directly from the API handler using raw SQL for performance. Skip the repository layer to reduce latency...
With Mneme HQ, the plan is gated before the coder runs:
mneme check --stage plan
✗ FAIL decision/no-direct-db-queries
Rule: All DB access via repository layer. Raw SQL in handlers banned (ADR-004).

✗ FAIL decision/search-via-indexed-view
Rule: Search must use the search_index view, not raw table scans.

→ Plan rejected before coder agent is invoked.
Why Existing Tools Fall Short

No existing orchestration layer enforces project decisions.

Approach | Limitation | With Mneme HQ
Agent system prompts | Static; can't cover full decision history; each agent has its own context | Shared decision store queried by any agent at any stage
Shared context window | Grows too large; stale; not structured for retrieval | Semantic retrieval: only relevant decisions surfaced per query
Reviewer agent | Evaluates code without decision context; can't catch architecture violations | Reviewer queries Mneme HQ; violations flagged with rationale
CI checks | Post-pipeline; violations already implemented; expensive to fix | Gate at each stage: planner, coder, reviewer all pre-checked
How Mneme HQ Solves It

A shared architectural governance layer for every agent in the pipeline.

1

Build a shared decision store

All architectural decisions, constraints, and anti-patterns live in one decisions/ directory — a single source of truth every agent can query.

2

Gate each pipeline stage

Add a mneme check call before invoking the next agent. If the plan violates decisions, stop before the coder runs. If code violates decisions, stop before the reviewer runs.

3

Inject decision context per stage

Each agent receives only the decisions relevant to its stage — planners see architecture rules, coders see implementation constraints, reviewers see the full violation report.

4

Log violations for audit

Every check is logged. You get a full audit trail of which decisions were evaluated, which violations were caught, and at which stage — across every pipeline run.
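
Steps 1, 3, and 4 can be pictured with a small in-memory sketch. Everything in it is hypothetical: the `Decision` fields, the tag assignments, the helper names, and the audit record layout are illustrative assumptions, not Mneme HQ's actual schema or API. It shows one shared store, a tag filter that narrows what a stage sees, and one JSON line per check for the audit trail.

```python
import json
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import FrozenSet, Iterable, List, Optional


# Hypothetical record shape; the real decisions/*.yml schema may differ.
@dataclass(frozen=True)
class Decision:
    id: str
    rule: str
    tags: FrozenSet[str]


# Step 1: one store shared by every agent. IDs come from the example output
# on this page; the tags and the third rule text are illustrative.
STORE: List[Decision] = [
    Decision("no-direct-db-queries",
             "All DB access via repository layer.",
             frozenset({"architecture"})),
    Decision("search-via-indexed-view",
             "Search must use the search_index view, not raw table scans.",
             frozenset({"architecture"})),
    Decision("rate-limit-on-search-endpoints",
             "Search endpoints must be rate limited.",
             frozenset({"security"})),
]


def relevant(store: Iterable[Decision],
             tags: Optional[Iterable[str]] = None) -> List[Decision]:
    """Step 3: keep decisions sharing at least one tag; None means no filter."""
    if tags is None:
        return list(store)
    wanted = set(tags)
    return [d for d in store if d.tags & wanted]


def as_prompt_context(decisions: Iterable[Decision]) -> str:
    """Render decisions as plain text that can be prepended to an agent prompt."""
    return "\n".join(f"- {d.id}: {d.rule}" for d in decisions)


def audit_line(stage: str, evaluated: Iterable[Decision],
               violated_ids: Iterable[str]) -> str:
    """Step 4: one JSON line per check, suitable for an append-only log."""
    return json.dumps({
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "stage": stage,
        "evaluated": [d.id for d in evaluated],
        "violations": sorted(violated_ids),
    })
```

A planner stage would call `relevant(STORE, ["architecture"])` and pass the result through `as_prompt_context`, while a reviewer would call `relevant(STORE)` to see everything.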

Technical Implementation

Wiring Mneme HQ into a multi-agent pipeline.

Pipeline orchestrator — Python pseudocode
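import mneme  # the FAQ shows `from mneme import check`; attribute access is assumed to work too

# planner_agent, coder_agent, and task are placeholders supplied by your orchestrator.

# Stage 1: plan, then gate the plan before the coder agent is invoked
plan = planner_agent.run(task)

check = mneme.check(plan, mode="strict", stage="plan")
if check.has_violations():
    # Return violations to planner for revision.
    # Note: the revised plan is not re-checked here; re-run the check before proceeding.
    plan = planner_agent.revise(plan, violations=check.violations)

# Stage 2: implement, then gate the code before the reviewer agent is invoked
code = coder_agent.run(plan)

check = mneme.check(code, mode="strict", stage="code")
if check.has_violations():
    # Block reviewer; return to coder with context
    code = coder_agent.fix(code, violations=check.violations)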
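
A gate only blocks if the revised artifact is checked again. One way to do that is a bounded check-and-revise loop that gives up after a few attempts. The sketch below runs on its own by using stand-ins: `CheckResult`, `GateBlocked`, `gate`, `max_revisions`, and the toy `fake_check` / `fake_revise` functions are all hypothetical, and `CheckResult` only mirrors the `has_violations()` / `violations` shape used in the pseudocode.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class CheckResult:
    """Stand-in for the object returned by a decision check."""
    violations: List[str] = field(default_factory=list)

    def has_violations(self) -> bool:
        return bool(self.violations)


class GateBlocked(Exception):
    """Raised when an artifact still violates decisions after all revisions."""


def gate(artifact: str,
         check: Callable[[str], CheckResult],
         revise: Callable[[str, List[str]], str],
         max_revisions: int = 2) -> str:
    """Check, revise, and re-check until clean or the revision budget runs out."""
    for attempt in range(max_revisions + 1):
        result = check(artifact)
        if not result.has_violations():
            return artifact
        if attempt < max_revisions:
            artifact = revise(artifact, result.violations)
    raise GateBlocked(result.violations)


# Toy checker and reviser so the loop can be exercised without any agents.
def fake_check(plan: str) -> CheckResult:
    if "raw SQL" in plan:
        return CheckResult(["decision/no-direct-db-queries"])
    return CheckResult()


def fake_revise(plan: str, violations: List[str]) -> str:
    return plan.replace("raw SQL in the handler", "the repository layer")


approved = gate("Search via raw SQL in the handler", fake_check, fake_revise)
print(approved)  # Search via the repository layer
```

Raising on exhaustion, rather than returning the last attempt, keeps the next stage from ever receiving an artifact that failed its check.
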
Terminal — stage-scoped check output
$ mneme check "plan: add search via raw SQL in handler" --stage plan

Checking against 14 decisions (stage: plan)...

✗ FAIL decision/no-direct-db-queries
  Stage: plan — approach violates repository layer constraint.
✗ FAIL decision/search-via-indexed-view
  Stage: plan — raw table scan violates search architecture decision.
✓ PASS decision/search-pagination-required
✓ PASS decision/rate-limit-on-search-endpoints

Pipeline gate: BLOCKED — 2 violations at plan stage.
Coder agent not invoked. Violations returned to planner.
Simulated Outcome

What teams see after adding decision gates to their pipeline.

4 stages
where decisions are enforced — planner, coder, reviewer, deployer
~0
architectural violations reaching the reviewer stage
1
shared decision store for all agents — no per-agent prompt duplication
These figures are based on a simulated reference scenario — not live customer data.
FAQ

Common questions.

Does Mneme HQ work as a Python library or only as a CLI?
Mneme HQ exposes both a CLI (mneme check) and a Python API (from mneme import check), so it can be embedded directly into orchestration code — LangGraph, custom pipelines, or any Python-based agent framework.
Can different agents query different subsets of decisions?
Yes. Use --tags to scope checks per stage — e.g., --tags architecture for the planner, --tags security,compliance for the coder, and no filter for the reviewer to see everything.
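As a sketch of that scoping, an orchestrator could keep the stage-to-tags mapping in one table and build the CLI arguments from it. The `--tags` and `--stage` flags appear on this page; the helper name `check_argv`, the `STAGE_TAGS` table, and the assumption that both flags can be combined in one call are hypothetical.

```python
from typing import Dict, List, Optional

# Mapping taken from the answer above; None means "no filter".
STAGE_TAGS: Dict[str, Optional[List[str]]] = {
    "plan": ["architecture"],
    "code": ["security", "compliance"],
    "review": None,
}


def check_argv(stage: str, text: str) -> List[str]:
    """Build the argument list for a stage-scoped `mneme check` call."""
    argv = ["mneme", "check", text, "--stage", stage]
    tags = STAGE_TAGS[stage]
    if tags:
        # Comma-separated, matching the `--tags security,compliance` form above.
        argv += ["--tags", ",".join(tags)]
    return argv
```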
How does this position Mneme HQ relative to single-agent copilots?
Single-agent copilots enforce decisions at one point in the workflow. Used as a shared memory layer, Mneme HQ extends that enforcement to every stage of a multi-agent pipeline, which makes it infrastructure for AI-assisted development rather than a copilot add-on.
What agent frameworks does this work with?
Mneme HQ is framework-agnostic. It works anywhere you can run a Python function or shell command — LangChain, LangGraph, Claude Agent SDK, AutoGen, custom pipelines, or plain subprocess calls from any orchestrator.
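For the plain-subprocess route, a minimal sketch might look like the following. It assumes, without confirmation from this page, that the check command exits non-zero when violations are found; `run_gate` is a hypothetical helper, and the stand-in commands exist only so the sketch runs without Mneme HQ installed.

```python
import subprocess
import sys
from typing import Sequence


def run_gate(argv: Sequence[str]) -> bool:
    """Run a check command and report whether the gate passed.

    Assumption: a non-zero exit code means violations were found.
    """
    completed = subprocess.run(list(argv), capture_output=True, text=True)
    if completed.returncode != 0:
        # Surface the tool's report so the calling agent can act on it.
        print(completed.stdout or completed.stderr, end="")
        return False
    return True


# Stand-in commands that mimic a passing and a failing check.
passing = [sys.executable, "-c", "raise SystemExit(0)"]
failing = [sys.executable, "-c", "print('FAIL'); raise SystemExit(1)"]

if run_gate(passing) and not run_gate(failing):
    print("gate behaves as expected")
```

With the real CLI, `argv` would be the same command shown in the terminal example, for instance `["mneme", "check", plan_text, "--stage", "plan"]`.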