Zaxy 2.1–2.3 Implementation Plan

Companion to 2026-06-10-zaxy-2-1-2-3-agent-experience-cognitive-memory-roadmap-design.md. That document says what and why; this one says where, in what order, and how each slice is verified. File references are to the current tree (2.0.1, post-decomposition: src/zaxy/cli/ package, src/zaxy/retrieval_plan/ and src/zaxy/synthesis/ packages, 33-line __main__.py shim).

Ground Rules

Existing Seams This Plan Builds On

| Seam | Location | Used by |
| --- | --- | --- |
| MCP tool table + handler dispatch | src/zaxy/mcp_server.py (Tool list ~L125–1056, dispatch ~L3190+) | profiles, FoK tool, budget params |
| Checkout policy, diagnostics, prompt formatting | src/zaxy/checkout.py | budget packing, ordering tiers, salience diagnostics |
| Context assembly policy | src/zaxy/context.py | budget knapsack, stability tiers |
| Named retrieval profiles | src/zaxy/retrieval_profile.py | salience blend, PPR stage, plain fallback |
| Preflight checks | src/zaxy/doctor.py | doctor deepening |
| Hook clients + status | src/zaxy/hooks.py, zaxy hook-event in src/zaxy/cli/workspace.py:525 | session-resumed event |
| Compaction safety audits | src/zaxy/compaction.py | recovery packet citation rails |
| Review-gated consolidation | src/zaxy/consolidation_pipeline.py | procedure mining, interference proposals |
| Metacognition event contracts | src/zaxy/metacognition.py | FoK signals, known-unknown links |
| Tool-call capture | MCPServer.capture_tool_call_completed (src/zaxy/mcp_server.py:1198) | procedure mining input |
| Embedding providers | src/zaxy/embedding.py | version stamping, quantization |
| Vector index + LRU/byte budget | src/zaxy/embedded_graph_store.py (_VectorIndex, _evict_vector_indexes_over_budget) | HNSW path, quantized matrices |
| Query-page log-signature cache | src/zaxy/core.py (_query_page_log_signature) | stability-tier invalidation reuses the same pattern |

Phase 1 — Zaxy 2.1

1.1 Tool profiles and front door (2.1-alpha.1)

New module src/zaxy/tool_profiles.py:

Touches:

Tests: tests/test_mcp_server.py — core profile lists ≤ 8 tools; full listing byte-identical to today; hidden tool still dispatches; capabilities reports the delta. Existing tool-name regression tests unchanged.
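
One way to get the "hidden tool still dispatches" property is to filter only the listing and leave the dispatch table alone. The sketch below illustrates that idea under stated assumptions: `HANDLERS`, `PROFILES`, `list_tools`, `dispatch`, and the three tool names are hypothetical and are not taken from tool_profiles.py or mcp_server.py.

```python
from typing import Callable, Dict, FrozenSet, List, Optional

# Hypothetical registry: tool name -> handler (stand-ins for real handlers).
HANDLERS: Dict[str, Callable[[dict], dict]] = {
    "checkout": lambda args: {"tool": "checkout", "args": args},
    "append": lambda args: {"tool": "append", "args": args},
    "audit": lambda args: {"tool": "audit", "args": args},
}

# Hypothetical profiles: None means "advertise everything" (the full profile).
PROFILES: Dict[str, Optional[FrozenSet[str]]] = {
    "core": frozenset({"checkout", "append"}),
    "full": None,
}


def list_tools(profile: str) -> List[str]:
    """Return the tool names advertised under a profile, in registry order."""
    allowed = PROFILES[profile]
    return [name for name in HANDLERS if allowed is None or name in allowed]


def dispatch(name: str, args: dict) -> dict:
    """Dispatch ignores the profile, so a hidden tool remains callable."""
    return HANDLERS[name](args)
```

Because the full profile applies no filter, its listing is whatever the registry already produces, which is the property the byte-identical test is meant to pin down.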

1.2 Verb consolidation (2.1-alpha.1, additive)

Two new umbrella tools, full-profile only at first:

Implementation is a thin dispatch shim per umbrella tool; legacy tools keep their handlers and tests. No handler logic moves.

Tests: umbrella dispatch parity — each operation produces the same payload as its legacy tool against the same fixture log.
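
A thin dispatch shim of the kind described above might look like the following. This is a minimal sketch, not the planned implementation: the legacy handler names, the `operation`/`arguments` request shape, and `umbrella_memory` are all assumptions made for illustration.

```python
from typing import Callable, Dict

Handler = Callable[[dict], dict]


# Hypothetical legacy handlers; stand-ins for the existing per-tool handlers.
def legacy_memory_search(args: dict) -> dict:
    return {"hits": [args["query"].upper()]}


def legacy_memory_stats(args: dict) -> dict:
    return {"count": 0}


# The umbrella tool only maps an operation name to an untouched legacy handler.
UMBRELLA_OPERATIONS: Dict[str, Handler] = {
    "search": legacy_memory_search,
    "stats": legacy_memory_stats,
}


def umbrella_memory(request: dict) -> dict:
    """Route {"operation": ..., "arguments": {...}} to the legacy handler."""
    operation = request.get("operation")
    if operation not in UMBRELLA_OPERATIONS:
        raise ValueError(f"unknown operation: {operation!r}")
    return UMBRELLA_OPERATIONS[operation](request.get("arguments", {}))
```

A parity test then reduces to calling both entry points with the same arguments and comparing payloads for equality.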

1.3 Budgeted checkout (2.1-alpha.2)

New module src/zaxy/token_budget.py:

Touches:

Tests: monotonicity (raising budget never removes a previously included section), determinism, elision reporting, zero-budget edge (header-only packet with explicit elision), citation coverage invariance per budget level.
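
The monotonicity test constrains the packing strategy more than it may first appear. The sketch below uses a priority-ordered prefix packer, which is a deliberately simpler stand-in for the budget knapsack named in the seam table, to show one strategy that satisfies the property. `Section`, `pack`, and the section names are hypothetical, and the packet header is assumed to be rendered outside the packer.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Section:
    name: str
    priority: int  # lower value packs first
    tokens: int    # precomputed token cost


def pack(sections: List[Section], budget: int) -> Tuple[List[str], List[str]]:
    """Return (included, elided) section names for a token budget.

    Sections are taken in (priority, name) order and packing stops at the
    first section that does not fit. Stopping, rather than skipping ahead to
    smaller sections, is what makes the result monotone in the budget.
    """
    ordered = sorted(sections, key=lambda s: (s.priority, s.name))
    included: List[str] = []
    remaining = budget
    for index, section in enumerate(ordered):
        if section.tokens > remaining:
            return included, [s.name for s in ordered[index:]]
        included.append(section.name)
        remaining -= section.tokens
    return included, []
```

A skip-and-continue greedy packer would not be monotone: with costs 5 and 3, a budget of 4 includes only the second section, while a budget of 5 includes only the first and drops the second.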

1.4 Cache-stable ordering (2.1-alpha.2)

In src/zaxy/checkout.py + src/zaxy/context.py:

Tests: two checkouts, no intervening append → byte-identical prefix; append → prefix changes; tier ordering asserted on a mixed fixture.
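
The byte-identical-prefix test relies on rendering that depends only on section content and tier, never on insertion order. A minimal sketch under that assumption follows; the `Tier` names, the `[name]` section framing, and `render` are hypothetical and do not describe the actual format produced by checkout.py.

```python
from enum import IntEnum
from typing import Dict, Tuple


class Tier(IntEnum):
    STABLE = 0    # assumed: rarely changes between checkouts
    SESSION = 1   # assumed: changes between sessions
    VOLATILE = 2  # assumed: changes on most appends


def render(sections: Dict[str, Tuple[Tier, str]]) -> bytes:
    """Render sections ordered by (tier, name) so stable text comes first."""
    ordered = sorted(sections.items(), key=lambda item: (item[1][0], item[0]))
    text = "".join(f"[{name}]\n{body}\n" for name, (_, body) in ordered)
    return text.encode("utf-8")
```

With this ordering, a change confined to a volatile section alters only bytes after the stable prefix, which is what lets a prompt cache keep hitting on the leading portion.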

1.5 Compaction recovery + doctor (2.1-beta.1)

Recovery:

Doctor (src/zaxy/doctor.py): add checks — hash-chain verify over active log tail, projection freshness vs. log signature, embedding provider availability and dimension agreement, vector-cache budget headroom (VECTOR_INDEX_CACHE_MAX_BYTES), profile sanity, hook installation (inspect_hook_status already exists). Each check returns (status, remediation); CLI renders one line per failure.

Tests: scripted compact-then-resume over a fixture log recovers a seeded open task and accepted finding with citations; doctor check matrix with each failure injected.
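
The (status, remediation) contract for doctor checks can be sketched as below. The `CheckResult` type, the "ok"/"fail" status strings, the output line format, and the check names are assumptions for illustration; doctor.py may model them differently.

```python
from typing import Callable, Dict, List, NamedTuple


class CheckResult(NamedTuple):
    status: str       # assumed values: "ok" or "fail"
    remediation: str  # empty when status is "ok"


Check = Callable[[], CheckResult]


def render_failures(checks: Dict[str, Check]) -> List[str]:
    """Run every check and return one line per failure, in name order."""
    lines: List[str] = []
    for name in sorted(checks):
        result = checks[name]()
        if result.status != "ok":
            lines.append(f"{name}: {result.status} ({result.remediation})")
    return lines
```

Keeping each check a zero-argument callable makes the failure-injection matrix straightforward: a test swaps one entry for a stub that returns a failing result and asserts on the rendered line.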

Phase 1 exit: tool-adoption, budget, and cache lanes recorded in zaxy_benchmarks (new lane modules follow the existing benchmark layout); release notes; site rebuilt.

Phase 2 — Zaxy 2.2

2.1 Salience ledger (2.2-alpha.1)

New module src/zaxy/salience.py:

Emitters:

Projection: salience computed during projection rebuild and incrementally on append; exposed in checkout diagnostics only. No ranking change in this increment.

Tests: replay determinism (same log → same scores), rebuild-equals-incremental, diagnostics composition shows each factor.
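
Replay determinism and rebuild-equals-incremental both follow if salience state is a pure fold over the log. The sketch below shows that shape only; the `(memory_id, factor)` event form and the factor names used in the usage are hypothetical, since the plan does not fix the event taxonomy here, and integer counts stand in for whatever scoring the ledger actually uses.

```python
from collections import Counter
from functools import reduce
from typing import Dict, List, Tuple

Event = Tuple[str, str]     # hypothetical: (memory_id, factor)
State = Dict[str, Counter]  # per-memory factor counts, kept for diagnostics


def apply(state: State, event: Event) -> State:
    """Pure step: return a new state with one factor count incremented."""
    memory_id, factor = event
    updated = {key: Counter(value) for key, value in state.items()}
    updated.setdefault(memory_id, Counter())[factor] += 1
    return updated


def rebuild(log: List[Event]) -> State:
    """Full replay from an empty state."""
    return reduce(apply, log, {})
```

Because `apply` never mutates its input and the per-factor counts are retained, the same structure also supports a diagnostics view that shows each factor's contribution.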

2.2 Forgetting, cues, gate, interference (2.2-alpha.2)

Tests: plain-profile byte-parity with 2.1 ranking; floor exemptions; explicit query reaches attenuated memories; gate tags reversible by replay (re-project with gate off → identical ranking state); proposal emission on a seeded contradiction fixture.

2.3 Metamemory + graph walk + mining (2.2-beta.1)

Feeling of knowing:

Personalized PageRank:

Procedure mining:

Tests: FoK calibration harness with seeded corpora (known-hit, known-miss, ambiguous); PPR correctness on a hand-built graph (analytic stationary distribution within tolerance) and determinism; mining support threshold, citation completeness, and review-gating (no skill without acceptance).
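
The PPR correctness test can be grounded in a graph small enough to solve by hand. The sketch below is a plain power iteration, offered as one possible reference implementation rather than the planned graph_walk module; `personalized_pagerank` and its parameters are hypothetical names, and dangling nodes are out of scope.

```python
from typing import Dict, List


def personalized_pagerank(
    edges: Dict[str, List[str]],
    seed: str,
    damping: float = 0.85,
    iterations: int = 200,
) -> Dict[str, float]:
    """Power iteration for PPR with all restart mass on one seed node.

    Assumes every node appears as a key in `edges` and has at least one
    out-edge.
    """
    nodes = sorted(edges)  # fixed order keeps floating-point sums deterministic
    rank = {node: 1.0 if node == seed else 0.0 for node in nodes}
    for _ in range(iterations):
        nxt = {node: (1.0 - damping) if node == seed else 0.0 for node in nodes}
        for node in nodes:
            share = damping * rank[node] / len(edges[node])
            for target in edges[node]:
                nxt[target] += share
        rank = nxt
    return rank
```

For the two-node cycle a ↔ b seeded at a, the fixed point satisfies p_a = (1 − d) + d·p_b and p_b = d·p_a, giving p_a = 1/(1 + d) and p_b = d/(1 + d). That closed form is the kind of analytic target a tolerance-based test can assert against.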

Phase 2 exit: forgetting, FoK, and PPR lanes green; cognitive profile documented in docs/retrieval.md; defaults unchanged.

Phase 3 — Zaxy 2.3

3.1 Embedding scale (2.3-alpha.1)

Versioning first (it gates everything else):

HNSW:

Quantization:

Tests: version isolation (cross-tag search returns nothing rather than garbage), lazy migration round-trip, doctor mixed-version report; HNSW recall@10 ≥ 0.95 vs. exact on a seeded 10^5 corpus (marked slow/integration), threshold hysteresis; int8 rerank recall and byte accounting.
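
Two of the measurements named above are easy to pin down in a few lines. The sketch below shows recall@k against exact results and a symmetric per-vector int8 quantizer; both are generic formulations with hypothetical function names, and the plan does not state which quantization scheme embedded_graph_store.py will adopt.

```python
from typing import List, Tuple


def recall_at_k(exact: List[List[int]], approx: List[List[int]], k: int = 10) -> float:
    """Mean fraction of the exact top-k ids found in the approximate top-k.

    Assumes one ranked id list per query and at least k ids in each list.
    """
    total = 0.0
    for truth, candidate in zip(exact, approx):
        total += len(set(truth[:k]) & set(candidate[:k])) / k
    return total / len(exact)


def quantize_int8(vector: List[float]) -> Tuple[List[int], float]:
    """Symmetric per-vector int8 quantization: value is roughly code * scale."""
    peak = max(abs(x) for x in vector)
    scale = peak / 127.0 if peak else 1.0
    return [round(x / scale) for x in vector], scale
```

A recall gate such as the 0.95 threshold is then a single comparison on `recall_at_k(exact, approx, k=10)`, and storing one byte per dimension instead of four for float32 gives the byte-accounting test its expected ratio.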

3.2 Defaults and freeze (2.3-rc.1)

Sequencing and Dependencies

1.1 profiles ──► 1.2 umbrellas
1.3 budget ──► 1.4 cache ordering          (packer feeds tier renderer)
1.5 recovery+doctor                        (independent of 1.1–1.4)
2.1 salience ledger ──► 2.2 forgetting/gate ──► 2.3 FoK (uses salience
                                                histograms)
2.3 PPR, mining                            (independent of salience)
3.1 versioning ──► HNSW, quantization      (within one increment)
3.2 freeze                                 (last)

Parallelizable pairs for solo work: (1.3+1.4) with (1.5); (2.3 PPR) with (2.3 mining). Everything in Phase 1 is independent of Phase 2's event taxonomy, so 2.1-alpha can ship while salience design is still settling.

Estimated Shape

| Increment | New modules | Touched modules | Test surface |
| --- | --- | --- | --- |
| 2.1-alpha.1 | tool_profiles | mcp_server, config, cli/serving, docs | test_mcp_server |
| 2.1-alpha.2 | token_budget | context, checkout, mcp_server, cli | new test_token_budget + test_checkout |
| 2.1-beta.1 | recovery | cli/workspace, doctor, hooks docs | new test_recovery, test_doctor |
| 2.2-alpha.1 | salience | mcp_server (emitters), projection | new test_salience |
| 2.2-alpha.2 | (none) | retrieval_profile, retrieval_plan, salience, core | test_retrieval_plan, test_salience |
| 2.2-beta.1 | graph_walk, procedure_mining | metacognition, mcp_server, projection, cli | new test_graph_walk, test_procedure_mining, test_metacognition |
| 2.3-alpha.1 | (none) | embedding, embedded_graph_store, doctor, cli | test_embedded_graph_store, new scale lane |
| 2.3-rc.1 | (none) | config defaults, docs | full suite |

Open Questions (decide at increment start, not now)