Event-sourced memory for agent work

Zaxy

Zaxy turns agent work into durable, auditable memory: a hash-chained append-only log as the source of truth, cited Memory Checkout as the trust contract, salience-based forgetting that attenuates instead of deleting, compaction recovery for long sessions, and coordinated worker missions that merge back into one replayable project history.

runtime: embedded LadybugDB default
source: Eventloom append-only JSONL
checkout: cited, token-budgeted context
forgetting: salience attenuation, replayable
PyPI 2.3.1
47 MCP tools, 8-tool core profile
cognitive ranking default
embedded local runtime
Harvey LAB 10/10 tasks
Harvey LAB mean 0.788
Headline 500 R@5 1.000
Harvey LAB external signal
headline 500 checkout evidence
external verification requested

Cognitive memory

Memory that forgets like a human, with receipts.

Zaxy 2.x implements the memory literature as boring, testable scoring over an immutable log: retrieval strengthens, disuse decays, surprise gates encoding. Forgetting is pure projection policy — attenuated memories leave default ranking but stay one explicit query away, with a replayable record of why they faded. Every default in this list was flipped or held by an evaluation lane that ships in the repo.

salience ranking

Confirmed memories outrank stale ones; invalidated memories attenuate below a floor. Pinned and authority-reviewed memories are exempt. Cold-start ranking is byte-identical to plain retrieval — measured, not asserted.
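
To make the ranking behaviour above concrete, here is a minimal sketch. It is not Zaxy's scoring code: the `Memory` fields, the half-life, the floor value, and the log-based boost are all assumptions chosen for illustration. It shows the three properties the text names: confirmed memories rise, invalidated memories drop below a floor and leave default ranking, and a cold-start store ranks exactly like plain retrieval.

```python
import math
from dataclasses import dataclass

HALF_LIFE_DAYS = 30.0  # hypothetical decay half-life
FLOOR = 0.2            # hypothetical attenuation floor


@dataclass
class Memory:
    id: str
    relevance: float           # score from plain retrieval
    confirmations: int = 0     # retrievals or confirmations so far
    days_idle: float = 0.0     # days since last use
    invalidated: bool = False
    pinned: bool = False       # pinned or authority-reviewed


def salience(m: Memory) -> float:
    if m.pinned:
        return 1.0                                  # exempt from decay and attenuation
    if m.invalidated:
        return FLOOR / 2                            # pushed below the floor
    decay = 0.5 ** (m.days_idle / HALF_LIFE_DAYS)   # disuse decays
    boost = 1.0 + math.log1p(m.confirmations)       # retrieval strengthens
    return max(FLOOR, decay * boost)                # stale memories sink but stay rankable


def rank(memories, include_attenuated=False):
    # Attenuated memories are filtered, never deleted: one explicit flag brings them back.
    visible = [m for m in memories
               if include_attenuated or salience(m) >= FLOOR]
    # Tie-break on id so the ordering is deterministic.
    return sorted(visible, key=lambda m: (-m.relevance * salience(m), m.id))
```

With no confirmations, no idle time, and no invalidations, `salience` returns exactly 1.0 for every memory, so the sort key collapses to plain relevance. That is one simple way to get the cold-start equivalence described above.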

compaction recovery

After a context compaction, a session-resumed hook hands the agent back its open tasks, accepted findings, and known unknowns — every line citing the exact log events it came from.
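
As a rough illustration of the recovery step, the sketch below folds a list of log events into a resume packet in which every line cites the event it came from. The event type names (`task.opened`, `task.closed`, `finding.accepted`, `unknown.noted`) and the dict layout are hypothetical, not Eventloom's actual schema.

```python
def resume_packet(events):
    """Fold log events into a cited resume packet (one string per line)."""
    open_tasks = {}   # task name -> seq of the event that opened it
    findings = []     # (text, seq)
    unknowns = []     # (text, seq)
    for ev in events:
        kind, seq, body = ev["type"], ev["seq"], ev["body"]
        if kind == "task.opened":
            open_tasks[body] = seq
        elif kind == "task.closed":
            open_tasks.pop(body, None)      # closed tasks are no longer handed back
        elif kind == "finding.accepted":
            findings.append((body, seq))
        elif kind == "unknown.noted":
            unknowns.append((body, seq))
    # Every output line carries the sequence number of its source event.
    lines = [f"open task: {t} [event {s}]" for t, s in open_tasks.items()]
    lines += [f"accepted finding: {t} [event {s}]" for t, s in findings]
    lines += [f"known unknown: {t} [event {s}]" for t, s in unknowns]
    return lines
```

Because the packet is a pure function of the log, it can be rebuilt after any compaction without relying on what the agent still remembers.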

memory_checkout(..., max_tokens=N)

Token-budgeted packing with explicit elision reporting, and cache-stable packet ordering: consolidated content renders byte-identically across calls so provider prompt caching hits.
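
One way to picture budgeted packing with elision reporting and stable ordering is the sketch below. It is an illustration under stated assumptions, not the `memory_checkout` implementation: `pack` and `count_tokens` are hypothetical names, the whitespace token count stands in for a real tokenizer, and the greedy selection is the simplest strategy that fits.

```python
def count_tokens(text):
    # Assumption: whitespace word count as a stand-in for a real tokenizer.
    return len(text.split())


def pack(items, max_tokens):
    """items: iterable of (id, text, score). Returns packet text plus a budget report."""
    chosen, elided, used = [], [], 0
    # Select greedily by score; ties break on id so selection is deterministic.
    for item in sorted(items, key=lambda it: (-it[2], it[0])):
        cost = count_tokens(item[1])
        if used + cost <= max_tokens:
            chosen.append(item)
            used += cost
        else:
            elided.append(item[0])          # report what was left out, never drop silently
    # Render in id order, independent of score or input order, so identical
    # selections produce byte-identical text across calls.
    chosen.sort(key=lambda it: it[0])
    text = "\n".join(f"[{i}] {t}" for i, t, _ in chosen)
    return {"text": text, "used_tokens": used, "elided": sorted(elided)}
```

Separating selection order (by score) from render order (by id) is what keeps the rendered prefix stable, which is the property a provider-side prompt cache needs in order to hit.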

tool profiles

The default core profile lists 8 tools instead of 47 — an 83.5% smaller listing surface — while every tool stays callable by name. --profile full restores the complete listing.

memory_feeling_of_knowing

Experimental sub-millisecond metamemory pre-check: would checkout likely return something? Lets agents decide whether retrieval is worth the call before paying for it.

zaxy memory mine-procedures

Recurring successful tool sequences become review-pending procedure candidates with trace citations. Nothing becomes authoritative without review — the same gate as every generated abstraction.
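
A minimal sketch of the idea, assuming traces are available as lists of tool names: count fixed-length tool sequences across successful traces and emit the recurring ones as review-pending candidates that cite their source traces. The function name, the n-gram approach, and the trace layout are assumptions for illustration; the actual miner may work differently.

```python
from collections import defaultdict


def mine_procedures(traces, n=3, min_support=2):
    """traces: list of {"id": str, "tools": [str, ...], "success": bool}."""
    support = defaultdict(set)              # tool n-gram -> ids of traces containing it
    for trace in traces:
        if not trace["success"]:
            continue                        # only successful runs count as evidence
        tools = trace["tools"]
        for i in range(len(tools) - n + 1):
            support[tuple(tools[i:i + n])].add(trace["id"])
    return [
        {
            "steps": list(seq),
            "citations": sorted(ids),       # trace ids backing the candidate
            "status": "review_pending",     # nothing is authoritative until reviewed
        }
        for seq, ids in sorted(support.items())
        if len(ids) >= min_support
    ]
```

Support is counted per trace rather than per occurrence, so a single trace that loops over the same sequence does not promote it on its own.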

Coordinate

Worker-local claims are not project truth.

Spawning agents is easy. The hard part is turning isolated investigations into one trustworthy state of work. Zaxy records each worker in its own Eventloom session, reviews findings with evidence, marks stale and conflicting claims, and promotes only accepted facts into the parent mission.

Parent mission

The coordinator owns accepted project history, decisions, handoff, and Memory Checkout state.

Worker sessions

Agents investigate in isolated logs, so exploration does not contaminate authoritative memory.

Approval packets

Human or coordinator-agent review accepts, rejects, defers, or promotes findings with cited provenance.
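
The review gate described above can be sketched as a small state machine. This is a hedged illustration only: `Finding`, `Mission`, and the status strings are hypothetical names, not Zaxy's data model. The point it demonstrates is that worker findings reach parent history only after an explicit, evidence-backed accept.

```python
from dataclasses import dataclass, field

DECISIONS = {"accept": "accepted", "reject": "rejected", "defer": "deferred"}


@dataclass
class Finding:
    id: str
    worker: str
    claim: str
    evidence: list = field(default_factory=list)  # cited worker-log event ids
    status: str = "pending"
    reviewer: str = ""


class Mission:
    def __init__(self, name):
        self.name = name
        self.history = []   # append-only accepted project history

    def review(self, finding, decision, reviewer):
        if decision not in DECISIONS:
            raise ValueError(f"unknown decision: {decision}")
        if decision == "accept" and not finding.evidence:
            raise ValueError("cannot accept a finding without cited evidence")
        finding.status = DECISIONS[decision]
        finding.reviewer = reviewer

    def promote(self, finding):
        # Worker-local claims stay out of parent history until accepted.
        if finding.status != "accepted":
            raise ValueError("only accepted findings can be promoted")
        self.history.append({
            "finding": finding.id,
            "worker": finding.worker,
            "claim": finding.claim,
            "evidence": list(finding.evidence),
            "reviewer": finding.reviewer,
        })
        finding.status = "promoted"
```

Rejected and deferred findings remain in the worker's own log with their status, so the decision trail stays reviewable even for claims that never reach the parent mission.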

Architecture

Eventloom is truth. The graph is a rebuildable projection.

Mission: objective and parent state
Workers: isolated Eventloom sessions
Findings: evidence, confidence, citations
Review: conflicts, stale claims, approvals
Checkout: accepted cited prompt state

The Eventloom log remains the append-only source of truth for the project. The default local runtime is embedded LadybugDB, launched and cleaned by zaxy init and zaxy doctor. Neo4j remains the sidecar control backend; pgGraph, LatticeDB, and Pathlight are advanced integration tracks for teams that need an alternate deployment or observability posture.
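
The "log is truth, graph is a projection" split can be illustrated with a generic hash-chained log. The sketch below is not Eventloom's on-disk format: the entry fields, the SHA-256 chaining scheme, and the `set`/`unset` payloads are assumptions. It shows two properties: tampering with any entry breaks verification, and derived state can always be rebuilt by replaying the log.

```python
import hashlib
import json

GENESIS = "0" * 64   # hypothetical "previous hash" for the first entry


def _digest(prev_hash, payload):
    # Canonical JSON so the same payload always hashes to the same value.
    body = json.dumps(payload, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256((prev_hash + body).encode("utf-8")).hexdigest()


def append(log, payload):
    prev_hash = log[-1]["hash"] if log else GENESIS
    entry = {"seq": len(log), "prev": prev_hash, "payload": payload,
             "hash": _digest(prev_hash, payload)}
    log.append(entry)
    return entry


def verify(log):
    prev_hash = GENESIS
    for i, entry in enumerate(log):
        if (entry["seq"] != i or entry["prev"] != prev_hash
                or entry["hash"] != _digest(prev_hash, entry["payload"])):
            return False
        prev_hash = entry["hash"]
    return True


def project(log):
    """Rebuild derived state from scratch by replaying every entry in order."""
    state = {}
    for entry in log:
        p = entry["payload"]
        if p["op"] == "set":
            state[p["key"]] = p["value"]
        elif p["op"] == "unset":
            state.pop(p["key"], None)
    return state
```

Since `project` reads only the log, the projection can be dropped and rebuilt at any time. Serializing each entry with `json.dumps` on its own line would give an append-only JSONL file of the kind the text describes.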

Purpose control plane

The same evidence can mean different memory for different work.

Zaxy carries purpose-conditioned checkout through retrieval, diagnostics, feedback, compaction, and Coordinate accepted state. This is still framed as project-local agent work memory, not a broad Company Brain claim.

memory_checkout(..., purpose="coding")

Applies deterministic purpose emphasis, recall floors, scoring profile selection, and checkout guidance.
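
A small sketch of what purpose conditioning could look like, under loud assumptions: the `PROFILES` table, the weights, and the reading of "recall floor" as "always return at least this many items" are all hypothetical and chosen only to make the idea concrete. The same scored evidence yields a different checkout depending on purpose, and the result is deterministic.

```python
# Hypothetical purpose profiles: emphasis weights by memory kind, a score
# threshold, and a recall floor (minimum number of items always returned).
PROFILES = {
    "coding": {"emphasis": {"code": 1.5, "decision": 1.2},
               "min_score": 0.55, "recall_floor": 2},
    "review": {"emphasis": {"decision": 1.5, "risk": 1.4},
               "min_score": 0.55, "recall_floor": 1},
}


def checkout(items, purpose):
    """items: iterable of (id, kind, base_score). Returns ids in ranked order."""
    profile = PROFILES[purpose]
    scored = sorted(
        ((score * profile["emphasis"].get(kind, 1.0), mem_id)
         for mem_id, kind, score in items),
        key=lambda pair: (-pair[0], pair[1]),   # ties break on id: deterministic
    )
    kept = [mem_id for s, mem_id in scored if s >= profile["min_score"]]
    floor = profile["recall_floor"]
    if len(kept) < floor:
        # Recall floor: fall back to the top-ranked items rather than return too few.
        kept = [mem_id for _, mem_id in scored[:floor]]
    return kept
```

For example, three memories with the same base score of 0.5 but kinds `code`, `decision`, and `risk` produce `["m1", "m2"]` for `coding` and `["m2", "m3"]` for `review` in this sketch.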

zaxy memory purpose status

Replays active profile, checkout quality, accepted Coordinate state, and feedback posture without graph mutation.

zaxy memory purpose lanes

Shows purpose-specific checkout lanes, cited source groups, and suppression candidates.

zaxy memory purpose feedback

Surfaces positive and negative outcome history so future retrieval can prioritize useful purpose-specific memory.

Interfaces

CLI, MCP, dashboard, and adapters share the same contracts.

coordination_checkout: accepted parent state plus diagnostic worker-local findings
coordination_approval_packet: reviewable accept/reject/defer/promote payloads
memory_checkout: answerability, current_citation_count, salience diagnostics, budget reporting, and memory_feedback guidance
memory_feeling_of_knowing: experimental likely/possible/unlikely pre-check with calibration markers
CoordinationAdapter: dependency-light Python wrapper with LangGraph and CrewAI helper paths
dashboard --enable-coordinate-review: opt-in human review controls over replay-backed state; read-only remains the default

Benchmark evidence

Public claims stay inside the evidence boundary.

Current public benchmark evidence is intentionally narrow: the headline 500-question LongMemEval-compatible checkout diagnostic and the Harvey LAB external legal-agent memory-ablation report. Older backend shootouts, partial slices, suite gates, and debug reports are archived as development history rather than current claims.

Harvey LAB

0.788 mean criterion pass rate

Full ten-task external legal-agent memory-ablation run, +0.184 versus regular/no-memory and 9/10 task wins versus article-best rows.

Headline 500

R@5 1.000 with citations 1.000

Full 500-question LongMemEval-compatible checkout diagnostic: mean 0.956, Answer@5 0.910, Recall@5 1.000.

Claim boundary

Checkout diagnostic, not official LME

The headline 500 is a Zaxy same-harness checkout run, not an official LongMemEval end-to-end assistant score.

Comparison posture

Evidence first, claims second

Archived reports remain useful for engineering history, but public benchmark claims now route through the benchmark hub.

Vector search (internal)

Defaults bounded to the measured envelope

The 2.2 ANN engagement rule ships exactly as far as internal lane evidence extends: HNSW engages at 100k+ vectors up to 64 dimensions, where it measured recall 1.0 with 12.9x faster index builds. Above that envelope, exact search remains the recommendation — the negative results ship in the same paper.
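
Expressed as code, the engagement rule stated above reduces to a two-condition check. The function name and return values are hypothetical; only the two thresholds (100k vectors, 64 dimensions) come from the text.

```python
HNSW_MIN_VECTORS = 100_000   # from the text: HNSW engages at 100k+ vectors
HNSW_MAX_DIMS = 64           # from the text: up to 64 dimensions


def choose_search(n_vectors, dims):
    """Return "hnsw" only inside the measured envelope, otherwise "exact"."""
    if n_vectors >= HNSW_MIN_VECTORS and dims <= HNSW_MAX_DIMS:
        return "hnsw"
    return "exact"
```

Anything outside the envelope, including large collections of high-dimensional vectors, falls through to exact search, which matches the stated recommendation.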

Current Evidence Boundary

These rows are release evidence and disclosure status, not a universal memory leaderboard.

| Artifact | Status | What it supports | What it does not support |
| --- | --- | --- | --- |
| Harvey LAB external memory-ablation | complete | Full 10-task legal-agent benchmark evidence: 0.788 mean criterion pass rate, +0.184 vs regular/no-memory, +0.081 vs article-best rows, 9/10 task wins. | Same-harness full-suite scores for non-Zaxy systems beyond the article-published matrix. |
| LongMemEval-compatible checkout 500 | current headline | Same-harness checkout diagnostic: mean 0.956, Answer@5 0.910, Recall@5 1.000, citation coverage 1.000. | Official LongMemEval end-to-end assistant accuracy or external memory-system leaderboard ranking. |

Install

Initialize a local embedded runtime, then expose memory through MCP.

pipx install zaxy-memory
zaxy init
zaxy memory log --eventloom-path .eventloom --limit 5
zaxy memory bootstrap --eventloom-path .eventloom
zaxy doctor --eventloom-path .eventloom
zaxy coordinate start "ship auth refactor" --mission auth-main
zaxy coordinate worker create --mission auth-main --worker auth-api
zaxy coordinate assign --mission auth-main --worker auth-api "trace failures"
zaxy coordinate brief --mission auth-main
zaxy coordinate checkout --mission auth-main

What happens when you run init

Zaxy writes `.env.local`, records session genesis and heartbeat, checks graph posture, and prints the MCP command or config path.

What stays local

Session history lives in .eventloom/ as append-only JSONL. The graph is a rebuildable projection.

How to prove capture

memory log, memory bootstrap, doctor, and hook-status expose last-checkout, capture, and stale-memory posture.

Documentation

Start with Coordinate and purpose. Keep the rest as operator reference.

Operator and internals reference