Zaxy 2.1–2.3 Agent Experience and Cognitive Memory Roadmap Design

Purpose

Zaxy 2.1 through 2.3 make memory effortless for agents to adopt, economical to consult, cognitively informed in how it ranks and forgets, and scalable in its embedding layer. These releases land before the 2.5/3.0 latent memory roadmap (see 2026-06-09-zaxy-2-5-3-0-latent-memory-roadmap-design.md) and several items feed directly into its lanes.

The motivating observations:

The guiding rule is:

Memory should be cheap to consult, honest about what it knows, and quiet about what no longer matters — without ever deleting truth.

Research Anchors

This roadmap stays close to the memory literature — human and agent — without copying any one paper's assumptions into production.

Release Positioning

Zaxy 2.1: Agent Experience

The agent-facing surface gets a front door, a budget contract, and a closed compaction loop.

Primary thesis:

An agent should be able to adopt Zaxy with one tool, state a token budget, and survive a context compaction without losing what mattered.

Zaxy 2.2: Cognitive Memory

Retrieval ranking adopts salience, forgetting, cue-matched recall, metamemory, and graph-walk retrieval — all as projection-level policy over the unchanged immutable log.

Primary thesis:

Zaxy can forget the way organisms forget — by attenuation, not erasure — and every attenuation is replayable, inspectable, and reversible.

Zaxy 2.3: Embedding Scale

The embedding layer gets an approximate-nearest-neighbor path, quantization, and a versioned migration story.

Primary thesis:

Exact search stays the default where it is exact and fast; scale is opt-in, measured, and never silently lossy.

Contributions forward to 2.5/3.0

Current Architecture Fit

Nothing in this roadmap weakens the existing invariants:

Designs

Tool Surface Profiles and Verb Consolidation

The MCP server gains a profile parameter (zaxy serve --profile core|full, MCP_TOOL_PROFILE setting). The core profile lists a small verb set: memory_checkout, memory_append, memory_query, context_assemble, memory_feedback, memory_invalidate, plus memory_capabilities as the discovery escape hatch. The full profile lists everything, unchanged.

In parallel, the long tail of single-purpose tools is consolidated behind operation-enum tools where the grouping is natural (consolidation lifecycle, confidence/metacognition reads). Consolidated tools are additive; the existing names remain available in the full profile through at least 2.x per the stability commitment.

memory_capabilities becomes the canonical "what else can I do" tool so a core-profile agent can discover and request the full surface.
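As a rough illustration of the profile split, the sketch below filters one shared handler table by profile, so the core profile is a strict subset of the full one. The core verb names come from the list above; the stub handlers, the list_tools and capabilities_delta helpers, and the memory_skill_list tool name are hypothetical and are not Zaxy's actual API.

```python
from __future__ import annotations

# Core verb names are taken from the design text; everything else is illustrative.
CORE_TOOLS = frozenset({
    "memory_checkout", "memory_append", "memory_query", "context_assemble",
    "memory_feedback", "memory_invalidate", "memory_capabilities",
})


def _stub(name: str):
    """Build a placeholder handler; a real server would dispatch to real logic."""
    def handler(**kwargs):
        return {"tool": name, "args": kwargs}
    return handler


# One shared handler table for both profiles.
# "memory_skill_list" is a hypothetical stand-in for the long tail of tools.
HANDLERS = {name: _stub(name) for name in CORE_TOOLS | {"memory_skill_list"}}


def list_tools(profile: str) -> list[str]:
    """Return the tool names advertised under a profile."""
    if profile == "full":
        return sorted(HANDLERS)
    if profile == "core":
        return sorted(name for name in HANDLERS if name in CORE_TOOLS)
    raise ValueError(f"unknown profile: {profile!r}")


def capabilities_delta() -> list[str]:
    """Tools a core-profile agent could discover and request."""
    return sorted(set(list_tools("full")) - set(list_tools("core")))
```

Because both profiles read the same table, a tool cannot behave differently depending on the profile that listed it; only its visibility changes.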

Single Front Door

Tool descriptions and docs converge on one message: call memory_checkout first; everything else is plumbing or power use. The checkout description gets rewritten as the entry point; quickstarts, MCP docs, and the workspace instruction block emitted by ensure_session_initialized all point at it.

Doctor Deepening

zaxy doctor (existing preflight in src/zaxy/doctor.py) gains checks for: hash-chain integrity over the active log, projection freshness vs. log signature, embedding provider availability and dimension agreement, vector index cache budget headroom, MCP profile sanity, and hook installation status — with one-line remediation per failure.
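One of these checks, hash-chain integrity, might look roughly like the following. This is a minimal sketch under stated assumptions: the event layout (prev_hash, payload, hash), the SHA-256 chaining rule, the GENESIS sentinel, and the CheckResult shape are all invented for illustration and do not describe the real log format or the code in src/zaxy/doctor.py.

```python
from __future__ import annotations

import hashlib
import json
from dataclasses import dataclass

GENESIS = "0" * 64  # assumed sentinel used as prev_hash of the first event


@dataclass(frozen=True)
class CheckResult:
    name: str
    ok: bool
    detail: str
    remediation: str = ""


def event_hash(prev_hash: str, payload: dict) -> str:
    """Hash the previous hash plus a canonical JSON rendering of the payload."""
    body = json.dumps(payload, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256((prev_hash + body).encode("utf-8")).hexdigest()


def append_event(events: list[dict], payload: dict) -> None:
    """Append a payload, linking it to the hash of the previous event."""
    prev = events[-1]["hash"] if events else GENESIS
    events.append({"prev_hash": prev, "payload": payload,
                   "hash": event_hash(prev, payload)})


def check_hash_chain(events: list[dict]) -> CheckResult:
    """Walk the log once and report the first event whose link does not verify."""
    prev = GENESIS
    for index, event in enumerate(events):
        expected = event_hash(prev, event["payload"])
        if event["prev_hash"] != prev or event["hash"] != expected:
            return CheckResult(
                "hash_chain", False, f"chain breaks at event {index}",
                "placeholder remediation: inspect the log from this event onward",
            )
        prev = event["hash"]
    return CheckResult("hash_chain", True, f"{len(events)} events verified")
```

Each failing check carries its own one-line remediation string, which matches the reporting shape described above.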

Token-Budget Checkout

context_assemble and memory_checkout accept max_tokens. Packing is a greedy salience-per-token knapsack over candidate packet sections with a deterministic tokenizer estimate (chars/4 fallback; provider tokenizer when configured). The response reports budget_requested, budget_used, and what was elided, so the agent knows recall was truncated rather than empty.
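The packing step can be sketched as below: sections are ranked by salience per estimated token and admitted greedily until the budget is spent. The Section type, the pack function, and the included and elided response keys are hypothetical names; only max_tokens, budget_requested, budget_used, and the chars/4 fallback come from the text. The estimator rounds up, which is one reasonable reading of chars/4.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Section:
    name: str
    text: str
    salience: float


def estimate_tokens(text: str) -> int:
    """Deterministic chars/4 fallback, rounded up, never less than one token."""
    return max(1, -(-len(text) // 4))


def pack(sections: list[Section], max_tokens: int) -> dict:
    """Greedy knapsack: best salience-per-token first, skip what does not fit."""
    ranked = sorted(
        sections,
        # Name is the tie-breaker so the result is deterministic.
        key=lambda s: (-s.salience / estimate_tokens(s.text), s.name),
    )
    included, elided, used = [], [], 0
    for section in ranked:
        cost = estimate_tokens(section.text)
        if used + cost <= max_tokens:
            included.append(section.name)
            used += cost
        else:
            elided.append(section.name)
    return {
        "budget_requested": max_tokens,
        "budget_used": used,
        "included": included,
        "elided": elided,
    }
```

A greedy pass is not an optimal knapsack solution, but it is deterministic and cheap, and a non-empty elided list tells the agent that recall was truncated rather than empty.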

Cache-Stable Packet Ordering

Checkout output is reordered into stability tiers: (1) consolidated/accepted facts and procedures that change rarely, (2) session-scoped state, (3) query-specific evidence. Tier-1 content is serialized canonically (stable sort keys, no timestamps in the rendered prefix) so repeated checkouts in one session produce byte-identical prefixes and provider prompt caches hit. A diagnostics field reports the stable-prefix length so cache efficiency is measurable.
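A minimal sketch of the tiered rendering follows. It assumes each packet item is a dict with tier, id, and text fields plus an optional updated_at timestamp; those field names, the tier labels, and the stable_prefix_bytes diagnostic key are illustrative choices, not the actual packet schema.

```python
from __future__ import annotations

import json

# Assumed tier labels, in the stability order described above.
TIER_ORDER = {"consolidated": 0, "session": 1, "evidence": 2}


def render_item(item: dict) -> str:
    """Render one item canonically; tier-1 items lose their timestamps."""
    fields = dict(item)
    if item["tier"] == "consolidated":
        fields.pop("updated_at", None)
    return json.dumps(fields, sort_keys=True, separators=(",", ":")) + "\n"


def render_packet(items: list[dict]) -> tuple[str, dict]:
    """Sort by (tier, id), render, and report the byte length of the tier-1 prefix."""
    ordered = sorted(items, key=lambda i: (TIER_ORDER[i["tier"]], i["id"]))
    prefix = "".join(render_item(i) for i in ordered if i["tier"] == "consolidated")
    rest = "".join(render_item(i) for i in ordered if i["tier"] != "consolidated")
    diagnostics = {"stable_prefix_bytes": len(prefix.encode("utf-8"))}
    return prefix + rest, diagnostics
```

Two checkouts that share the same consolidated items then agree byte for byte on the first stable_prefix_bytes bytes, regardless of input order or of what session state and evidence follow.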

Compaction Recovery Loop

The precompact hook already records compaction. The loop closes with a session-resumed hook event: after the harness compacts, Zaxy assembles a recovery packet — checkout against pre-compaction state, diffed against what a summary plausibly preserves (recent verbatim, open tasks, accepted findings, known unknowns) — and emits it for re-injection. src/zaxy/compaction.py (identity-preserving compaction audits) provides the safety rails: the recovery packet must cite only Eventloom-backed state.

Salience and Projection-Level Forgetting

Each projected memory carries a salience score:

salience = recency_decay(last_use) × reinforcement(use_history) × base_importance
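The formula fixes only the product structure, so the sketch below has to pick concrete factor shapes. Both are assumptions made for illustration: an exponential half-life for recency_decay and a logarithmic use count for reinforcement. The one-week half-life and the function signatures are likewise hypothetical.

```python
from __future__ import annotations

import math

WEEK_SECONDS = 7 * 24 * 3600.0  # assumed default half-life


def recency_decay(age_seconds: float, half_life_seconds: float = WEEK_SECONDS) -> float:
    """Assumed shape: the factor halves every half-life and never reaches zero."""
    return 0.5 ** (max(0.0, age_seconds) / half_life_seconds)


def reinforcement(use_history: list[float]) -> float:
    """Assumed shape: diminishing returns in the number of recorded uses."""
    return 1.0 + math.log1p(len(use_history))


def salience(now: float, last_use: float, use_history: list[float],
             base_importance: float) -> float:
    """salience = recency_decay(last_use) * reinforcement(use_history) * base_importance"""
    return recency_decay(now - last_use) * reinforcement(use_history) * base_importance
```

With these shapes the score only attenuates; it stays strictly positive for any positive base_importance, which fits the rule that forgetting is attenuation rather than erasure.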

Write-Time Encoding Gate

At append time, an optional gate classifies the event against current checkout state: novel (contradicts or extends), reinforcing (confirms), redundant (duplicates). The tag rides in event metadata. Projection treats redundant events as reinforcement signals rather than new ranked entries. The gate is off by default in 2.2-alpha and opt-in until measurement shows no recall regression.
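To make the three tags concrete, here is a deliberately simple sketch. It assumes events are flat dicts with subject, attribute, value, and source fields, that checkout state can be summarized as a fact map plus a set of event fingerprints, and that an exact repeat counts as redundant while the same fact from a new source counts as reinforcing. All of these are illustrative assumptions; a real gate would need a richer notion of contradiction and duplication.

```python
from __future__ import annotations

import hashlib
import json


def fingerprint(event: dict) -> str:
    """Content hash over the canonical JSON form of the event."""
    body = json.dumps(event, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(body.encode("utf-8")).hexdigest()


def classify(event: dict, facts: dict, seen: set[str]) -> str:
    """Tag an event as redundant, reinforcing, or novel against current state."""
    if fingerprint(event) in seen:
        return "redundant"  # exact duplicate of an event already in the log
    key = (event["subject"], event["attribute"])
    if key in facts and facts[key] == event["value"]:
        return "reinforcing"  # confirms a known fact from different evidence
    return "novel"  # extends the state or contradicts the current value


def append_with_tag(event: dict, facts: dict, seen: set[str]) -> dict:
    """Attach the tag as metadata, then update the in-memory state."""
    tag = classify(event, facts, seen)
    seen.add(fingerprint(event))
    facts[(event["subject"], event["attribute"])] = event["value"]
    return {"event": event, "metadata": {"encoding_tag": tag}}
```

The tag lives only in metadata, so a misclassification changes ranking but never removes the event from the log.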

Encoding-Specificity Cues

Appends capture a cue record when available: active task or mission, repo or workspace identity, originating tool, session phase. Checkout computes a cue overlap term and blends it with semantic similarity. Cues are plain event metadata — no schema migration of the log, only projection changes.
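One way to picture the cue overlap term is a Jaccard overlap on cue key/value pairs, blended linearly with the semantic score. Both the Jaccard choice and the 0.25 blend weight are assumptions made for this sketch, as are the function names and the cue keys in the usage example; the text only says that an overlap term is computed and blended.

```python
from __future__ import annotations


def cue_overlap(stored: dict, current: dict) -> float:
    """Jaccard overlap on (cue, value) pairs; missing cues are ignored."""
    a = {(k, v) for k, v in stored.items() if v is not None}
    b = {(k, v) for k, v in current.items() if v is not None}
    if not a or not b:
        return 0.0
    return len(a & b) / len(a | b)


def blended_score(semantic: float, stored: dict, current: dict,
                  cue_weight: float = 0.25) -> float:
    """Linear blend of semantic similarity and cue overlap (assumed weighting)."""
    return (1.0 - cue_weight) * semantic + cue_weight * cue_overlap(stored, current)
```

For example, a memory stored under the same task and repo but a different originating tool shares two of four distinct cue pairs, so its overlap is 0.5:

```python
stored = {"task": "fix-auth", "repo": "api", "tool": "grep"}
current = {"task": "fix-auth", "repo": "api", "tool": "edit"}
score = blended_score(0.8, stored, current)  # 0.75 * 0.8 + 0.25 * 0.5
```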

Feeling-of-Knowing Pre-Check

A new lightweight tool memory_feeling_of_knowing (core profile) answers "would checkout likely return something for this query?" in roughly constant time using only in-memory state: cue-index hit counts, entity-name bloom filters, and salience histograms, with no embedding call and no graph query. It returns likely | possible | unlikely plus the signal breakdown. Honest calibration is the acceptance bar: verdicts are measured against actual checkout outcomes in the benchmark lane.
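The pre-check can be sketched with two of the three signals: an entity-name bloom filter and cue-index hit counts. The salience histogram is omitted for brevity. The BloomFilter class, the feeling_of_knowing signature, and the verdict thresholds are all hypothetical; real thresholds would have to come out of the calibration measurement.

```python
from __future__ import annotations

import hashlib


class BloomFilter:
    """Small bloom filter: no false negatives, occasional false positives."""

    def __init__(self, size_bits: int = 1024, hashes: int = 3) -> None:
        self.size = size_bits
        self.hashes = hashes
        self.bits = 0

    def _positions(self, item: str):
        for seed in range(self.hashes):
            digest = hashlib.sha256(f"{seed}:{item}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.size

    def add(self, item: str) -> None:
        for position in self._positions(item):
            self.bits |= 1 << position

    def __contains__(self, item: str) -> bool:
        return all((self.bits >> position) & 1 for position in self._positions(item))


def feeling_of_knowing(query_terms: list[str], query_cues: list[str],
                       entities: BloomFilter, cue_hits: dict) -> dict:
    """Cheap verdict from in-memory signals only; thresholds are placeholders."""
    entity_matches = sum(1 for term in query_terms if term.lower() in entities)
    cue_matches = sum(cue_hits.get(cue, 0) for cue in query_cues)
    if entity_matches and cue_matches:
        verdict = "likely"
    elif entity_matches or cue_matches:
        verdict = "possible"
    else:
        verdict = "unlikely"
    return {"verdict": verdict,
            "signals": {"entity_matches": entity_matches, "cue_hits": cue_matches}}
```

Because a bloom filter can return false positives, "likely" and "possible" are hints to be calibrated, while "unlikely" means that none of the signals used here fired.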

Graph-Walk Retrieval (Personalized PageRank)

Retrieval gains a multi-hop stage: seed nodes from query-matched entities (vector + name match), run bounded personalized PageRank (PPR) over the projected graph (Kuzu first; same algorithm against Neo4j/Postgres backends), and blend PPR mass into candidate ranking. This is the HippoRAG result applied to a graph Zaxy already maintains. The stage stays behind a retrieval-profile flag until the internal benchmark lane shows lift.
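For readers unfamiliar with personalized PageRank, the sketch below runs a bounded power iteration over a plain adjacency dict. It does not touch Kuzu, Neo4j, or Postgres; the graph representation, the 0.85 damping factor, the iteration cap, and the linear blend weight are assumptions for illustration.

```python
from __future__ import annotations


def personalized_pagerank(graph: dict, seeds: set, damping: float = 0.85,
                          iterations: int = 50) -> dict:
    """Power iteration where the random walk restarts only at the seed nodes."""
    if not seeds:
        raise ValueError("at least one seed node is required")
    nodes = set(graph) | {t for targets in graph.values() for t in targets} | set(seeds)
    restart = {n: (1.0 / len(seeds) if n in seeds else 0.0) for n in nodes}
    rank = dict(restart)
    for _ in range(iterations):
        nxt = {n: (1.0 - damping) * restart[n] for n in nodes}
        for node in nodes:
            targets = graph.get(node, [])
            if targets:
                share = damping * rank[node] / len(targets)
                for target in targets:
                    nxt[target] += share
            else:
                # Dangling node: hand its mass back to the seeds so total mass stays 1.
                for seed in seeds:
                    nxt[seed] += damping * rank[node] / len(seeds)
        rank = nxt
    return rank


def blend(base_scores: dict, ppr: dict, weight: float = 0.3) -> dict:
    """Mix PPR mass into existing candidate scores (assumed linear blend)."""
    return {node: (1.0 - weight) * score + weight * ppr.get(node, 0.0)
            for node, score in base_scores.items()}
```

Nodes that cannot be reached from any seed receive exactly zero mass, so the blend only promotes candidates that are graph-connected to what the query actually matched.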

Procedure Mining

capture_tool_call_completed traces feed a miner that detects successful multi-step tool sequences recurring across sessions, and proposes them as skill candidates through the existing consolidation-review pipeline (proposal events, review, acceptance). Accepted candidates become memory_skill entries with citations to the source traces. No skill becomes authoritative without review — identical to every other abstraction path.
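The mining step might be approximated by counting contiguous runs of successful tool calls that recur in more than one session. In this sketch a trace is a list of (tool_name, succeeded) pairs per session, and the mine_candidates function, its thresholds, and the output fields are hypothetical. A real miner would likely need argument-aware matching and gap tolerance.

```python
from __future__ import annotations

from collections import defaultdict


def successful_runs(calls: list[tuple[str, bool]]) -> list[list[str]]:
    """Split a session trace into runs of consecutive successful tool calls."""
    runs, current = [], []
    for tool, succeeded in calls:
        if succeeded:
            current.append(tool)
        else:
            if current:
                runs.append(current)
            current = []
    if current:
        runs.append(current)
    return runs


def mine_candidates(sessions: dict, min_len: int = 2, max_len: int = 4,
                    min_sessions: int = 2) -> list[dict]:
    """Propose tool sequences that succeed in at least min_sessions sessions."""
    support = defaultdict(set)
    for session_id, calls in sessions.items():
        for run in successful_runs(calls):
            for length in range(min_len, max_len + 1):
                for start in range(len(run) - length + 1):
                    support[tuple(run[start:start + length])].add(session_id)
    candidates = [
        {"steps": list(sequence), "sessions": sorted(ids), "status": "proposed"}
        for sequence, ids in support.items() if len(ids) >= min_sessions
    ]
    # Longest sequences first, then alphabetical, for a deterministic order.
    return sorted(candidates, key=lambda c: (-len(c["steps"]), c["steps"]))
```

Every candidate is emitted with status "proposed" and the session ids that support it, so the review step has the citations it needs and nothing becomes authoritative at mining time.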

Embedding Scale

Evaluation Plan

Each behavior change lands with a measurement lane before it changes defaults:

Public claims follow the external-validation policy: internal lanes are labeled internal; no headline numbers from unverified lanes.

Non-Goals

Risks

Ranking Opacity

Salience-modulated ranking can surprise users who expect deterministic recency ordering. Mitigation: salience contributions appear in checkout diagnostics; --retrieval-profile plain restores pre-2.2 ranking.

Attenuation of Load-Bearing Memories

A rarely-retrieved memory may still be critical (credentials policy, standing constraint). Mitigation: authority-bearing and pinned memories are exempt from the floor; memory_record_known_unknown and pinning paths are documented.

Gate False Negatives

The encoding gate may tag genuinely novel events as redundant. Mitigation: tags are metadata only and reversible by replay; the gate ships off by default and is promoted only after the forgetting lane shows no recall regression.

Profile Fragmentation

Two tool profiles risk divergent agent behavior and support burden. Mitigation: profiles share one handler table; the core profile is a strict subset; memory_capabilities documents the delta at runtime.

Cache-Ordering Staleness

Optimizing for byte-stable prefixes risks serving stale consolidated state. Mitigation: stability tiers are invalidated by the same log-signature checks as the query-page cache, so a tier cannot be served past the log signature it was built from; only the reordering is new.

Benchmark Drift

New lanes must not silently replace existing public numbers. Mitigation: lanes are additive; the release gate diffs public claims against lane provenance.

Increment Plan

2.1-alpha.1: Front Door and Profiles

Scope:

Exit criteria:

2.1-alpha.2: Budgeted, Cache-Stable Checkout

Scope:

Exit criteria:

2.1-beta.1: Compaction Recovery and Doctor

Scope:

Exit criteria:

2.2-alpha.1: Salience Ledger

Scope:

Exit criteria:

2.2-alpha.2: Forgetting, Cues, and the Encoding Gate

Scope:

Exit criteria:

2.2-beta.1: Metamemory and Graph-Walk Retrieval

Scope:

Exit criteria:

2.3-alpha.1: Embedding Scale

Scope:

Exit criteria:

2.3-rc.1: Defaults and Freeze

Scope:

Exit criteria:

Acceptance Criteria

This roadmap is accepted when: