AI Has Made Code Cheap

An agent can produce 2,000 lines of working code in minutes. The tests pass. CI is green. The pull request reads plausibly. The questions nobody in that loop has answered: should the change have been 200 lines? Did it reuse the abstractions the codebase already has? Did it preserve the decisions the team already made about boundaries, dependencies, and data access?

Generated volume is not engineering progress. The economics of producing code have collapsed, and every other cost in the delivery pipeline — understanding the change, reviewing it, maintaining it, unwinding it when it turns out to be wrong — stayed where it was. Some of those costs grew, because there is now more code to apply them to.

The New Constraint Is Verification

Glean’s Work AI Index, a survey of 6,000 full-time digital workers across the US, UK, and Australia, gives the pattern a name: botsitting, the unbudgeted labor of making AI usable. Workers in the survey report saving around 11 hours a week with AI while spending 6.4 hours a week botsitting: 2.3 hours feeding the AI context, 2.2 hours supervising its output, and 1.7 hours debugging its mistakes. That puts 37% of all AI-related time into supervision rather than production, a share consistent with 6.4 of a combined 17.4 hours, and only 13% of workers say AI has made their organization perform significantly better. The numbers are self-reported and directional rather than precise, but the shape of the curve matches what engineering teams see daily.

For software teams the bill itemizes differently, and we’d call it a verification tax: reviewing more generated code, checking whether an implementation follows existing architectural decisions, spotting unnecessary complexity, finding subtle boundary violations, and debugging changes no one fully understands. That last item is not hypothetical. In Glean’s data, 41% of workers admit they sometimes deliver AI-produced work they could not fully explain if asked. Run that behavior through a merge queue for a few quarters and the codebase fills with code whose authorship is nominal and whose rationale is nobody’s.

The report also flags where this goes next: agents can run workflows end to end without a human checking each step. Every step a human used to touch was an accidental checkpoint. Autonomy removes them faster than teams replace them with anything deliberate.

Passing Tests Does Not Mean Preserving Architecture

The expensive part of the verification tax is the gap between local correctness and system-level correctness. A concrete version: the billing service is supposed to reach the ledger only through its API boundary. An agent asked to improve latency adds a direct database read. The change works. The tests pass — they test behavior, and the behavior is fine. The latency goal is met. And the architecture just drifted, because the constraint that mattered was never expressed anywhere a test could see it.

Architectural drift compounds exactly this way: a series of individually reasonable changes, each locally valid, each approved by a reviewer who had no machine-checkable statement of the system’s intent. Glean’s report makes the adjacent point about polish — fluent output removes the visual cues that used to trigger scrutiny. Clean diffs with passing tests are the engineering version of fluent prose. The polish is precisely what makes the drift invisible.

Tests answer “does it work?” The verification tax is mostly spent on a different question: “is this how our system is supposed to be built?” — and that question currently has no automated answerer in most pipelines.
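For the billing example above, a minimal sketch of what an automated answer could look like is a static check on imports. It assumes a Python codebase and uses hypothetical module names (`ledger.api` for the sanctioned boundary, `ledger.db` for the database layer); neither comes from a real project, and a real check would need to cover more than import statements.

```python
import ast

# Hypothetical rule: billing code may import the ledger's API client but
# never its database layer. All module names here are illustrative.
FORBIDDEN_PREFIXES = ("ledger.db",)


def imported_modules(source: str) -> list[str]:
    """Return every module name imported by the given Python source."""
    modules = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            modules.extend(alias.name for alias in node.names)
        elif isinstance(node, ast.ImportFrom) and node.module:
            modules.append(node.module)
    return modules


def boundary_violations(source: str) -> list[str]:
    """List imports that cross the billing-to-ledger database boundary."""
    return [
        module
        for module in imported_modules(source)
        if any(module == p or module.startswith(p + ".") for p in FORBIDDEN_PREFIXES)
    ]


# A change that stays behind the API boundary, and one that drifts past it.
compliant = "from ledger.api import get_balance\n"
drifted = "from ledger.db.session import connect\nimport ledger.api\n"

print(boundary_violations(compliant))
print(boundary_violations(drifted))
```

Both snippets would pass a behavioral test suite. Only the second trips this check, because the constraint is now written somewhere a machine can read it.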

Why Existing Controls Are Insufficient

Teams already operate several control layers, and each one covers a real slice of the problem. None of them was designed to check a proposed change against the team’s recorded architectural decisions. Even policy itself bends under adoption pressure: Glean found 54% of the highest-achieving AI users work around approved tools or use them in ways that violate company policy. Documents do not hold against incentives.

| Control | Useful for | What it misses |
| --- | --- | --- |
| Tests | Functional correctness | Architectural intent |
| Linters | Code quality and style | System-level decisions |
| Code review | Human judgment | Does not scale with generated volume |
| Rule files | Instructing agents | Ignored, forgotten, or bypassed |
| Retrieval | Surfacing relevant context | Context alone does not guarantee compliance |
| Verification contracts | Enforcing architectural decisions | The layer most pipelines are missing |

We have traced the individual failure modes before: rule files stop scaling once they run into token budgets and precedence conflicts, and code review cannot keep pace with agent output, which is how the PR queue turns into an incident-response layer. The verification tax is what those failures cost, denominated in senior engineers’ hours.

Governance Must Become Executable

The way out is not more manual review or a longer instruction file. It is moving the most important architectural decisions out of documents and into verification contracts — checks that run when an agent proposes a change. The loop looks like this:

  1. Record the architectural decision as structured data, with its constraints — not prose in a wiki.
  2. Retrieve the relevant decisions when the agent works, at the point of generation.
  3. Check the proposed change against them deterministically — same change, same verdict.
  4. Reject violations before the pull request exists.
  5. Return the reason, so the agent can retry compliantly instead of guessing.
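
The five steps above can be sketched in a few dozen lines. This is an illustration under stated assumptions, not a reference implementation: the `Decision`, `ProposedChange`, and `Verdict` types, the decision id, and the file paths are all hypothetical, and the only constraint modeled is a forbidden import within a path prefix.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Decision:
    """Step 1: an architectural decision recorded as structured data."""
    id: str
    applies_to: str        # path prefix the decision governs
    forbidden_import: str  # module prefix that must not be imported there
    reason: str


@dataclass(frozen=True)
class ProposedChange:
    path: str
    imports: tuple[str, ...]  # modules the changed file would import


@dataclass(frozen=True)
class Verdict:
    accepted: bool
    reasons: tuple[str, ...]


# Hypothetical decision record, modeled on the billing and ledger example.
DECISIONS = (
    Decision(
        id="ADR-hypothetical-1",
        applies_to="services/billing/",
        forbidden_import="ledger.db",
        reason="Billing reaches the ledger only through its API boundary.",
    ),
)


def retrieve(decisions, change):
    """Step 2: select only the decisions relevant to the file being changed."""
    return [d for d in decisions if change.path.startswith(d.applies_to)]


def check(decisions, change):
    """Steps 3 to 5: a deterministic verdict, with reasons on rejection."""
    reasons = []
    for d in retrieve(decisions, change):
        for module in change.imports:
            if module == d.forbidden_import or module.startswith(d.forbidden_import + "."):
                reasons.append(
                    f"{d.id}: {module} is not allowed in {d.applies_to}. {d.reason}"
                )
    return Verdict(accepted=not reasons, reasons=tuple(reasons))


change = ProposedChange(
    path="services/billing/latency.py",
    imports=("ledger.db.session",),
)
verdict = check(DECISIONS, change)
print(verdict.accepted)
for reason in verdict.reasons:
    print(reason)
```

Because `check` is a pure function of the decisions and the change, the same proposal always gets the same verdict, and the returned reasons give the agent something concrete to retry against.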

Each step refunds part of the tax. Retrieval cuts the feeding-context hours — the largest botsitting line item in Glean’s data at 2.3 hours a week. Deterministic checks cut the supervision hours, because reviewers stop re-deriving the architecture from scratch on every diff. Rejection before the PR cuts the cleanup hours, because drift stops reaching the merge queue in the first place.

The Goal Is Not to Block AI Agents

The goal is to make their speed sustainable. Generation is becoming abundant; verification is becoming the scarce resource, and scarce resources need infrastructure, not heroics. Teams that turn their architectural decisions into executable, retrievable, enforceable constraints get to keep the 11 hours without paying the 6.4 back — and get to scale agent autonomy without scaling the anxiety that currently accompanies it. The verification tax is not an argument against coding agents. It is the line item that tells you which layer to build next.