For most of software engineering history, writing code was the bottleneck. Senior engineers reviewed what juniors wrote, and the review burden was proportional to human writing speed. The ratio was manageable.

AI coding assistants break this assumption. When a single engineer can generate a thousand lines of plausible, compilable code in an hour, the bottleneck shifts. Generation is no longer scarce. Review is.

The volume problem is already here

Teams that have adopted AI coding assistants at scale — Claude Code, Cursor, Copilot, Devin — consistently report the same pattern: PR volume increases faster than review capacity. This isn't surprising. A single AI agent operating on a well-scoped task can produce a multi-file changeset in minutes that would take a human engineer half a day to write.

  • 10–100×: typical AI output volume increase vs. unassisted development on contained tasks
  • ~60 min: average time for a thorough senior-engineer code review of a meaningful PR
  • Linear: review capacity growth, bounded by human headcount, not AI throughput

The arithmetic is straightforward and uncomfortable. If your team generates 10× more code, someone still has to review 10× more code. You cannot hire 10× more reviewers to compensate: even if you could find them, each new reviewer is also an AI-assisted author who adds output to the review queue. The bottleneck tightens.
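To make the arithmetic concrete, here is a minimal sketch. It uses the ~60 minute review figure quoted above; the baseline of 20 PRs per week and the 10 review hours each reviewer can spare are hypothetical numbers chosen only for illustration, and the model assumes review effort scales linearly with PR volume.

```python
import math

MINUTES_PER_REVIEW = 60  # the ~60 min figure for a thorough review, from the text


def weekly_review_hours(prs_per_week: int) -> float:
    """Total reviewer hours needed per week at ~60 minutes per PR."""
    return prs_per_week * MINUTES_PER_REVIEW / 60


def reviewers_needed(prs_per_week: int, review_hours_per_reviewer: float) -> int:
    """Reviewers required if each one can spend a fixed number of hours on review."""
    return math.ceil(weekly_review_hours(prs_per_week) / review_hours_per_reviewer)


BASELINE_PRS = 20          # hypothetical pre-AI PR volume per week
HOURS_PER_REVIEWER = 10.0  # hypothetical weekly review budget per reviewer

for multiplier in (1, 10):
    prs = BASELINE_PRS * multiplier
    hours = weekly_review_hours(prs)
    people = reviewers_needed(prs, HOURS_PER_REVIEWER)
    print(f"{multiplier}x volume: {hours:.0f} review hours, {people} reviewers")
```

Under these assumptions, a 10× increase in volume moves the requirement from 2 reviewers to 20, which is the hiring problem described above.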

Reviewing AI output is harder than reviewing human output

The volume problem would be difficult enough. But AI-generated code is also harder to review than human-written code in several specific ways.

No institutional memory between sessions

A human engineer writing a service that interacts with the payments pipeline carries context from prior code reviews, architecture discussions, and incident postmortems. An AI agent starting a new session has none of this unless it's explicitly provided. The result is code that is syntactically correct and passes tests but violates architectural invariants that aren't written down anywhere the model can see.

A reviewer must now catch these violations — and must themselves carry the institutional memory that the AI lacked. For senior reviewers, this is already their job. But at AI output volumes, there are not enough senior engineers with the right context to cover every PR.

Plausible violations are harder to catch than obvious ones

Human engineers tend to violate architectural rules either obviously (introducing a new dependency without discussion) or not at all (they know the rules). AI agents produce a third category: code that looks architecturally correct but violates a constraint in a subtle way — a service reaching across a boundary via a shared utility function, or a new table added to a database that was supposed to be read-only from this service.

These violations pass automated tests. They pass linters. They require a reviewer who understands the architectural intent, not just the syntactic contract. That reviewer's time is exactly the scarce resource we started with.
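The shared-utility case can be sketched in a few lines. The example below assumes a hypothetical rule ("the orders service must not import from payments") and a deliberately simple import checker built on Python's standard `ast` module; the module names `payments.ledger` and `shared.utils` are invented for illustration. The checker flags the obvious violation and misses the subtle one, because the file under review only imports the shared helper.

```python
import ast

# Hypothetical rule: files in the orders service must not import "payments".
FORBIDDEN_PACKAGE = "payments"


def direct_import_violations(source: str) -> list[str]:
    """Return forbidden modules that `source` imports directly."""
    found = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            names = [alias.name for alias in node.names]
        elif isinstance(node, ast.ImportFrom):
            names = [node.module or ""]
        else:
            continue
        for name in names:
            if name == FORBIDDEN_PACKAGE or name.startswith(FORBIDDEN_PACKAGE + "."):
                found.append(name)
    return found


# Obvious violation: an orders file imports payments directly.
OBVIOUS_ORDERS_FILE = "from payments.ledger import post_entry\n"

# Subtle violation: the orders file imports only a shared helper...
SUBTLE_ORDERS_FILE = "from shared.utils import record_refund\n"
# ...but the helper, which sits outside the rule's scope, reaches into payments.
SHARED_UTILS_FILE = "from payments.ledger import post_entry\n"

print(direct_import_violations(OBVIOUS_ORDERS_FILE))  # flagged
print(direct_import_violations(SUBTLE_ORDERS_FILE))   # nothing flagged
```

The second file is clean as far as the checker is concerned, yet the orders service now depends on payments at runtime. Catching that requires knowing what the boundary was for, not only what the import statements say.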

Style and convention drift

AI coding assistants have their own implicit style conventions, drawn from training data rather than your codebase's evolution. Without explicit constraint injection, they drift toward generic patterns, which means reviewers must also police style consistency that was previously self-enforcing through team osmosis. This is low-stakes individually and expensive in aggregate.

The "just review more carefully" response doesn't work

The intuitive organizational response to this problem is to tighten review requirements: require two reviewers, require an architect sign-off on structural changes, institute more thorough checklists. This approach fails for a predictable reason: it reduces velocity precisely when AI-assisted development is supposed to be increasing it.

The fundamental tension: tighter review processes reduce the speed advantage that AI coding provides. Looser review processes allow architectural violations to compound. There is no review-process solution to a generation-speed problem.

Teams that try to solve this with review tooling — AI-assisted code review, static analysis, architecture rule checkers — observe partial improvement. These tools can catch mechanical violations: undefined variables, type errors, obvious anti-patterns. They cannot catch violations that require understanding your team's specific architectural decisions. That context isn't in the training data and can't be injected efficiently at review time.

[Figure: Where violations get caught. Cost rises sharply the later a violation is caught. Stages, from cheapest to fix (before code exists) to most expensive (after merge, after deploy): pre-generation (governance hook), pre-commit (local hook), CI / tests (pipeline check), code review (reviewer attention), post-merge (incident response).]

The shift-left argument

Security engineering confronted a structurally identical problem a decade ago. When application development accelerated and security testing was relegated to the end of the pipeline, the volume of vulnerabilities reaching production exceeded the security team's capacity to address them. The response was the shift-left movement: move security checks earlier in the development process, so violations are caught before they accumulate.

The same logic applies to architectural governance. If you move constraint enforcement to before the AI agent writes the file — rather than after the PR is opened — you eliminate the violation before it needs to be reviewed. No review time consumed. No back-and-forth on the PR. No accumulated drift.

Review burden comparison — 10-engineer team, AI-assisted development
| Metric                             | Post-generation review  | Pre-generation enforcement |
|------------------------------------|-------------------------|----------------------------|
| PRs per week                       | ~80                     | ~80                        |
| Architectural violations per week  | 12–18 caught in review  | 1–3 caught at generation   |
| Review cycles per violation        | 1–3 round trips         | Blocked before PR opens    |
| Senior reviewer time on governance | ~6 hrs/week             | <30 min/week               |
| Drift accumulation                 | Compounds with velocity | Blocked at source          |

What pre-generation enforcement actually requires

Shifting enforcement left is the right direction. Implementing it correctly requires more than telling the AI agent "follow our rules" in the system prompt.

Effective pre-generation enforcement needs:

  • A structured decision corpus — architectural decisions captured in a machine-readable schema, not free-form documentation. Decisions must have explicit scope, status, and constraint fields.
  • Scope-aware retrieval — the ability to retrieve only the decisions relevant to the specific file or module being modified, not a semantic-similarity approximation of what might be relevant.
  • Hook-level integration — enforcement must happen at the tool-use layer, before the write completes, not in the prompt or post-hoc in review. This means integrating with the agent's execution hook, not the model's input.
  • A precedence engine — when multiple decisions apply, the system must resolve conflicts deterministically rather than leaving the model to interpret contradictions.
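A minimal sketch of the first, second, and fourth requirements follows. It is an illustration under stated assumptions, not a description of any particular product: the `Decision` fields beyond scope, status, and constraint (namely `effect` and `priority`), the ADR identifiers, the glob-based scoping, and the tie-breaking rule are all hypothetical choices.

```python
from dataclasses import dataclass
from fnmatch import fnmatchcase
from typing import Optional


@dataclass(frozen=True)
class Decision:
    id: str          # hypothetical identifier, e.g. "ADR-007"
    scope: str       # glob over repository paths
    status: str      # "accepted", "superseded", ...
    constraint: str  # hypothetical constraint key, e.g. "db.write"
    effect: str      # "allow" or "deny" (assumed field)
    priority: int    # higher wins (assumed field)


CORPUS = [
    Decision("ADR-001", "services/*", "accepted", "db.write", "allow", 10),
    Decision("ADR-007", "services/orders/*", "accepted", "db.write", "deny", 20),
    Decision("ADR-003", "services/orders/*", "superseded", "db.write", "allow", 30),
    Decision("ADR-009", "services/payments/*", "accepted", "db.write", "deny", 20),
]


def applicable(path: str, corpus: list[Decision]) -> list[Decision]:
    """Scope-aware retrieval: accepted decisions whose glob matches the path."""
    return [d for d in corpus if d.status == "accepted" and fnmatchcase(path, d.scope)]


def resolve(path: str, constraint: str, corpus: list[Decision]) -> Optional[Decision]:
    """Precedence: highest priority wins; on ties, deny beats allow, then lowest id."""
    candidates = [d for d in applicable(path, corpus) if d.constraint == constraint]
    if not candidates:
        return None
    return min(candidates, key=lambda d: (-d.priority, d.effect != "deny", d.id))


winner = resolve("services/orders/repo.py", "db.write", CORPUS)
print(winner.id, winner.effect)
```

For `services/orders/repo.py`, both ADR-001 and ADR-007 are in scope, the superseded ADR-003 is ignored, and the higher-priority ADR-007 wins. The same inputs always produce the same answer, which is the property the precedence requirement asks for: the model is never handed two contradictory rules to interpret.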

None of these requirements are met by a system prompt containing your ADR documents, or by a RAG pipeline that retrieves them. They require a governance architecture that treats decisions as structured, executable constraints rather than advisory text.
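The difference between advisory text and an executable constraint can be sketched as a guard around the write itself. In the example below, `guarded_write` is a hypothetical stand-in for an agent's file-write tool; real agents expose their own hook mechanisms, which are not modeled here. The `Constraint` shape, the regex-based check, and the ADR identifier are also assumptions made for illustration.

```python
import re
import tempfile
from dataclasses import dataclass
from pathlib import Path


@dataclass(frozen=True)
class Constraint:
    decision_id: str  # hypothetical identifier of the governing decision
    path_prefix: str  # files the constraint applies to
    forbidden: str    # regex that must not appear in written content


class ConstraintViolation(Exception):
    """Raised when a write is blocked before it reaches disk."""


def pre_write_check(path: str, content: str, constraints: list[Constraint]) -> list[str]:
    """Return the ids of decisions the proposed write would violate."""
    return [
        c.decision_id
        for c in constraints
        if path.startswith(c.path_prefix) and re.search(c.forbidden, content)
    ]


def guarded_write(root: Path, path: str, content: str, constraints: list[Constraint]) -> None:
    """Stand-in for an agent's write tool: check first, write only if clean."""
    violated = pre_write_check(path, content, constraints)
    if violated:
        raise ConstraintViolation(f"{path} blocked by {', '.join(violated)}")
    target = root / path
    target.parent.mkdir(parents=True, exist_ok=True)
    target.write_text(content)


# Hypothetical rule: the orders service treats the database as read-only.
RULES = [Constraint("ADR-007", "services/orders/", r"\bINSERT\s+INTO\b")]

with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    guarded_write(root, "services/orders/read.py", 'QUERY = "SELECT * FROM audit_log"\n', RULES)
    try:
        guarded_write(root, "services/orders/write.py", 'QUERY = "INSERT INTO audit_log VALUES (1)"\n', RULES)
    except ConstraintViolation as err:
        print(err)
```

The blocked file never exists on disk, so there is nothing for a reviewer to find later. A regex is a crude detector and would not catch every way of writing to a table; the point of the sketch is where the check runs, not how it detects the violation.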

The cost of not solving this

Teams that adopt AI coding assistants without addressing the review bottleneck converge on one of two failure modes:

Velocity collapse — review requirements tighten to the point that AI-generated PRs queue for days, negating the generation speed advantage. Engineers stop using the AI assistant for anything structurally significant and revert to manual development for complex tasks.

Architectural debt accumulation — review is loosened or overwhelmed, violations merge, and the codebase drifts away from its intended architecture over months. The debt is invisible until it becomes expensive: a compliance audit, a production incident, or a major refactor that takes a quarter to complete.

Both outcomes are predictable. Both are avoidable if the governance problem is addressed at the generation layer rather than the review layer.

The structural conclusion

Code review is a human-time-bounded process. AI code generation is not. You cannot solve a generation-speed problem with a review-speed solution. The governance layer must operate at generation time, enforcing architectural constraints before the code is written — not after it's merged.

This is the architectural shift that the current generation of AI coding tools hasn't yet made. It's also the gap that Mneme HQ is built to close.