Zaxy 2.0 Beta.1 Reasoning-Loop Memory Primitives Implementation Plan

For agentic workers: REQUIRED SUB-SKILL: Use superpowers:subagent-driven-development (recommended) or superpowers:executing-plans to implement this plan task-by-task. Steps use checkbox (- [ ]) syntax for tracking.

Goal: Add production-ready, observable reasoning-loop memory primitives for planning, execution, review, and reflection, without granting authority to generated belief updates.

Architecture: Eventloom remains the source of truth. Beta.1 adds a deterministic reasoning_primitives service layer that composes existing causal, checkout, consolidation, and Skill Memory capabilities into agent-callable primitives, records every primitive call as an observation event, and appends belief updates only as review-pending proposals. Public CLI and MCP surfaces delegate to MemoryFabric so all runtime paths share validation, purpose conditioning, citations, and authority boundaries.

Tech Stack: Python 3.11+, Eventloom JSONL, existing MemoryFabric, causal graph APIs, consolidation candidates, Skill Memory graph contexts, Typer CLI, MCP Python SDK, pytest, ruff.

---

Scope Boundary

Included:

Excluded:

File Structure

Create:

Modify:

---

Task 1: Add Reasoning Primitive Contracts

Files:

Add tests proving:

from zaxy.reasoning_primitives import (
    REASONING_PHASES,
    ReasoningPrimitiveCall,
    build_belief_update_proposal_event,
    phase_purpose_profile,
)


def test_reasoning_phase_taxonomy_is_stable() -> None:
    assert REASONING_PHASES == {"planning", "execution", "review", "reflection"}


def test_phase_purpose_profiles_are_distinct() -> None:
    assert phase_purpose_profile("planning").task == "planning"
    assert phase_purpose_profile("execution").task == "execution"
    assert phase_purpose_profile("review").risk == "high"
    assert phase_purpose_profile("reflection").expected_action == "revise_or_record_learning"


def test_reasoning_call_event_is_observable_and_cited() -> None:
    call = ReasoningPrimitiveCall(
        primitive="explain_outcome",
        phase="planning",
        session_id="agent-1",
        query="Why did the test fail?",
        result_count=2,
        evidence=[{"citation": "eventloom://agent-1/events/42#aaaaaaaaaaaa", "content": "failure cause"}],
        status="succeeded",
    )

    event = call.to_event(actor="zaxy-reasoning")

    assert event["event_type"] == "reasoning.primitive.called"
    assert event["thread"] == "agent-1"
    assert event["payload"]["primitive"] == "explain_outcome"
    assert event["payload"]["phase"] == "planning"
    assert event["payload"]["evidence_count"] == 1


def test_belief_update_proposal_is_never_authoritative() -> None:
    event = build_belief_update_proposal_event(
        actor="agent",
        session_id="agent-1",
        claim="The failure was caused by a stale projection.",
        rationale="Cited causal predecessor indicates stale projection.",
        confidence=0.74,
        source_events=[{"seq": 42, "hash": "a" * 64}],
        phase="reflection",
    )

    assert event["event_type"] == "belief.update.proposed"
    assert event["thread"] == "agent-1"
    assert event["payload"]["authority_status"] == "non_authoritative"
    assert event["payload"]["review_status"] == "pending"

Run:

pytest tests/test_reasoning_primitives.py --no-cov -q

Expected: fail during collection because zaxy.reasoning_primitives does not exist yet.

Create src/zaxy/reasoning_primitives.py with strict validation:
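The validation rules are not spelled out here. As a minimal sketch, assuming plain dataclasses and dict-shaped events, a module that satisfies the Task 1 tests could look like the following. Only the names and values those tests assert come from this plan; the extra payload fields (query, result_count, status), the placeholder purpose-profile values, the status vocabulary, and the 64-hex-character hash check are hypothetical.

```python
# Hypothetical sketch of src/zaxy/reasoning_primitives.py. Names and values asserted
# by the Task 1 tests are fixed; every other field, value, and check is an assumption.
import re
from dataclasses import dataclass, field
from typing import Any

REASONING_PHASES = frozenset({"planning", "execution", "review", "reflection"})
_CALL_STATUSES = frozenset({"succeeded", "failed"})  # assumed status vocabulary
_HASH_RE = re.compile(r"[0-9a-f]{64}")  # assumed: lowercase hex event hash

@dataclass(frozen=True)
class PurposeProfile:
    task: str
    risk: str
    expected_action: str

# Only the values the tests assert are pinned; the others are placeholders.
_PROFILES = {
    "planning": PurposeProfile("planning", "medium", "choose_next_step"),
    "execution": PurposeProfile("execution", "medium", "apply_step"),
    "review": PurposeProfile("review", "high", "accept_or_reject"),
    "reflection": PurposeProfile("reflection", "medium", "revise_or_record_learning"),
}

def _require_phase(phase: str) -> None:
    if phase not in REASONING_PHASES:
        raise ValueError(f"unknown reasoning phase: {phase!r}")

def phase_purpose_profile(phase: str) -> PurposeProfile:
    _require_phase(phase)
    return _PROFILES[phase]

@dataclass(frozen=True)
class ReasoningPrimitiveCall:
    primitive: str
    phase: str
    session_id: str
    query: str
    result_count: int
    evidence: list[dict[str, Any]] = field(default_factory=list)
    status: str = "succeeded"

    def __post_init__(self) -> None:
        # Reject malformed calls before anything reaches the event log.
        _require_phase(self.phase)
        if not self.primitive or not self.session_id or self.result_count < 0:
            raise ValueError("primitive and session_id are required; result_count >= 0")
        if self.status not in _CALL_STATUSES:
            raise ValueError(f"unknown status: {self.status!r}")
        for item in self.evidence:
            if not str(item.get("citation", "")).startswith("eventloom://"):
                raise ValueError("every evidence item needs an eventloom:// citation")

    def to_event(self, *, actor: str) -> dict[str, Any]:
        # Observation event recorded on the session's thread.
        return {
            "event_type": "reasoning.primitive.called",
            "actor": actor,
            "thread": self.session_id,
            "payload": {
                "primitive": self.primitive,
                "phase": self.phase,
                "query": self.query,
                "result_count": self.result_count,
                "status": self.status,
                "evidence_count": len(self.evidence),
            },
        }

def build_belief_update_proposal_event(
    *, actor: str, session_id: str, claim: str, rationale: str,
    confidence: float, source_events: list[dict[str, Any]], phase: str,
) -> dict[str, Any]:
    _require_phase(phase)
    if not claim.strip() or not rationale.strip():
        raise ValueError("claim and rationale must be non-empty")
    if not 0.0 <= confidence <= 1.0:
        raise ValueError("confidence must be within [0, 1]")
    if not source_events:
        raise ValueError("a proposal must cite at least one source event")
    for src in source_events:
        if not isinstance(src.get("seq"), int) or not _HASH_RE.fullmatch(str(src.get("hash"))):
            raise ValueError(f"malformed source event reference: {src!r}")
    # Constant authority/review fields: a proposal can never approve itself.
    return {
        "event_type": "belief.update.proposed",
        "actor": actor,
        "thread": session_id,
        "payload": {
            "claim": claim,
            "rationale": rationale,
            "confidence": confidence,
            "phase": phase,
            "source_events": [dict(src) for src in source_events],
            "authority_status": "non_authoritative",
            "review_status": "pending",
        },
    }
```

Hard-coding authority_status and review_status inside the builder, rather than accepting them as parameters, means no caller can emit a self-approved proposal.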

Run:

pytest tests/test_reasoning_primitives.py --no-cov -q

Expected: all Task 1 tests pass.

---

Task 2: Add MemoryFabric Reasoning Services

Files:

Add async tests proving:

Use embedded/Eventloom-only tests and fake graph query results where needed; do not require Neo4j.

Run:

pytest tests/test_reasoning_primitives.py -k "memory_fabric" --no-cov -q

Expected: fail because MemoryFabric methods do not exist.

Add methods:

async def explain_outcome(self, outcome: str, *, phase: str = "planning", session_id: str = "default", depth: int = 2) -> dict[str, Any]: ...
async def propose_belief_update(self, claim: str, *, rationale: str, confidence: float, source_events: list[dict[str, Any]], phase: str = "reflection", session_id: str = "default", actor: str = "zaxy-reasoning") -> dict[str, Any]: ...
async def get_claim_confidence(self, claim: str, *, phase: str = "review", session_id: str = "default", limit: int = 5) -> dict[str, Any]: ...
async def retrieve_similar_procedures(self, query: str, *, phase: str = "planning", session_id: str = "default", limit: int = 5) -> dict[str, Any]: ...
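One way the four methods could share validation and observation is to route each through a single recording helper. The sketch below is hypothetical: InMemoryEventLog stands in for Eventloom, the injected search callable stands in for the causal and Skill Memory queries, and only two of the four methods are shown. It records an observation event even when the underlying call raises, and it appends belief updates only as pending proposals.

```python
# Hypothetical sketch: one recording path shared by the reasoning methods.
# InMemoryEventLog, ReasoningServiceSketch, and all event shapes are assumptions.
from collections.abc import Awaitable, Callable
from typing import Any

PHASES = frozenset({"planning", "execution", "review", "reflection"})
Search = Callable[[str, int], Awaitable[list[dict[str, Any]]]]


class InMemoryEventLog:
    """Append-only stand-in for Eventloom."""

    def __init__(self) -> None:
        self.events: list[dict[str, Any]] = []

    async def append(self, event: dict[str, Any]) -> int:
        self.events.append(event)
        return len(self.events)  # 1-based sequence number


class ReasoningServiceSketch:
    def __init__(self, log: InMemoryEventLog, search: Search) -> None:
        self._log = log
        self._search = search  # stand-in for causal / Skill Memory retrieval

    async def _observe(
        self, primitive: str, phase: str, session_id: str, query: str,
        call: Callable[[], Awaitable[dict[str, Any]]],
    ) -> dict[str, Any]:
        if phase not in PHASES:
            raise ValueError(f"unknown reasoning phase: {phase!r}")
        result: dict[str, Any] = {}
        status = "failed"
        try:
            result = await call()
            status = "succeeded"
            return result
        finally:
            # Runs on success and on error, so every call leaves an observation event.
            await self._log.append({
                "event_type": "reasoning.primitive.called",
                "thread": session_id,
                "payload": {
                    "primitive": primitive, "phase": phase, "query": query,
                    "status": status, "evidence_count": len(result.get("evidence", [])),
                },
            })

    async def retrieve_similar_procedures(
        self, query: str, *, phase: str = "planning",
        session_id: str = "default", limit: int = 5,
    ) -> dict[str, Any]:
        async def call() -> dict[str, Any]:
            return {"query": query, "evidence": await self._search(query, limit)}

        return await self._observe("retrieve_similar_procedures", phase, session_id, query, call)

    async def propose_belief_update(
        self, claim: str, *, rationale: str, confidence: float,
        source_events: list[dict[str, Any]], phase: str = "reflection",
        session_id: str = "default", actor: str = "zaxy-reasoning",
    ) -> dict[str, Any]:
        if not 0.0 <= confidence <= 1.0 or not source_events:
            raise ValueError("confidence must be in [0, 1] and source_events non-empty")

        async def call() -> dict[str, Any]:
            seq = await self._log.append({
                "event_type": "belief.update.proposed", "actor": actor, "thread": session_id,
                "payload": {
                    "claim": claim, "rationale": rationale, "confidence": confidence,
                    "phase": phase, "source_events": source_events,
                    "authority_status": "non_authoritative", "review_status": "pending",
                },
            })
            return {"proposal_seq": seq, "review_status": "pending", "evidence": source_events}

        return await self._observe("propose_belief_update", phase, session_id, claim, call)
```

Because CLI and MCP delegate to the same methods, putting observation in one helper is what lets every runtime path inherit it without duplicating the recording logic.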

Implementation requirements:

Run:

pytest tests/test_reasoning_primitives.py --no-cov -q

Expected: all reasoning primitive tests pass.

---

Task 3: Add CLI and MCP Reasoning Surfaces

Files:

Add CLI help/delegation tests for:

Add MCP schema/handler/dispatch tests for:

Run:

pytest tests/test_cli.py tests/test_mcp.py -k "reasoning or explain_outcome or belief_update or claim_confidence or similar_procedures" --no-cov -q

Expected: fail because commands/tools are missing.

Add a nested memory_reasoning_app = typer.Typer(...) and wire commands through configured MemoryFabric helpers.

MCP handlers must instantiate MemoryFabric through the same configured path/service pattern used by the consolidation proposal/status handlers, call the corresponding method, close the fabric even if the call raises, and return JSON.
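A minimal sketch of that handler shape, assuming an async fabric with an async close(). FakeFabric and open_configured_fabric are hypothetical stand-ins for the configured MemoryFabric factory the consolidation handlers use, and mapping exceptions to MCP error results is left out.

```python
# Hypothetical sketch of one reasoning MCP handler; FakeFabric and
# open_configured_fabric stand in for the configured MemoryFabric factory.
import json
from typing import Any

OPENED: list["FakeFabric"] = []  # lets the demo inspect which fabrics were closed


class FakeFabric:
    def __init__(self) -> None:
        self.closed = False

    async def explain_outcome(
        self, outcome: str, *, phase: str = "planning",
        session_id: str = "default", depth: int = 2,
    ) -> dict[str, Any]:
        if not outcome.strip():
            raise ValueError("outcome must be non-empty")
        return {"outcome": outcome, "phase": phase, "session_id": session_id,
                "depth": depth, "evidence": []}

    async def close(self) -> None:
        self.closed = True


async def open_configured_fabric() -> FakeFabric:
    fabric = FakeFabric()
    OPENED.append(fabric)
    return fabric


async def handle_explain_outcome(arguments: dict[str, Any]) -> str:
    """Open the fabric, delegate to it, always close it, and return JSON text."""
    fabric = await open_configured_fabric()
    try:
        result = await fabric.explain_outcome(
            arguments["outcome"],
            phase=arguments.get("phase", "planning"),
            session_id=arguments.get("session_id", "default"),
            depth=int(arguments.get("depth", 2)),
        )
    finally:
        # Close on success and on error so no handles leak between tool calls.
        await fabric.close()
    return json.dumps(result, sort_keys=True)
```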

Regenerate docs/examples/mcp-tool-contract.json from zaxy.mcp_server.TOOLS so the published contract lists the new reasoning tools.
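A sketch of a deterministic regeneration step, assuming each TOOLS entry exposes name, description, and inputSchema as mcp.types.Tool does. ToolSpec, render_tool_contract, write_tool_contract, and the {"tools": [...]} file layout are hypothetical.

```python
# Hypothetical sketch: regenerate the tool contract deterministically from TOOLS.
# ToolSpec stands in for mcp.types.Tool; the {"tools": [...]} layout is assumed.
import json
from dataclasses import dataclass, field
from pathlib import Path
from typing import Any


@dataclass
class ToolSpec:
    name: str
    description: str
    inputSchema: dict[str, Any] = field(default_factory=dict)


def render_tool_contract(tools: list[Any]) -> str:
    # Sorting by name and keys keeps the generated file diff-stable across runs.
    entries = [
        {"name": t.name, "description": t.description, "inputSchema": t.inputSchema}
        for t in sorted(tools, key=lambda t: t.name)
    ]
    return json.dumps({"tools": entries}, indent=2, sort_keys=True) + "\n"


def write_tool_contract(tools: list[Any], path: Path) -> bool:
    """Write the contract and return True only if the file content changed."""
    text = render_tool_contract(tools)
    if path.exists() and path.read_text(encoding="utf-8") == text:
        return False
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(text, encoding="utf-8")
    return True
```

Returning whether the file changed lets a CI step fail when the committed contract is stale.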

Run:

pytest tests/test_cli.py tests/test_mcp.py -k "reasoning or explain_outcome or belief_update or claim_confidence or similar_procedures" --no-cov -q

Expected: all beta.1 CLI/MCP tests pass.

---

Task 4: Add Checkout Diagnostics, Docs, and Beta.1 Guardrail

Files:

Tests must prove:

Run:

pytest tests/test_causal_checkout.py tests/test_reasoning_benchmark.py -k "reasoning or belief" --no-cov -q

Expected: fail because the diagnostics and guardrail do not exist yet.

Add deterministic diagnostics:

Guardrail rows should include:

Document:

Run:

python scripts/build-site-docs.py
scripts/validate-docs.sh --root .

Run:

pytest tests/test_causal_checkout.py tests/test_reasoning_benchmark.py -k "reasoning or belief" --no-cov -q

Expected: pass.
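The guardrail rows are not enumerated in this plan. As one hypothetical example of a deterministic authority check the beta.1 guardrail could run, the function below scans events shaped like those in the Task 1 tests and reports any belief proposal that claims authority, skips review, or cites nothing. The rule set and message format are assumptions.

```python
# Hypothetical beta.1 authority guardrail over an Eventloom-style event list.
# Event shapes mirror the Task 1 tests; the rule set is an assumption.
from typing import Any

PHASES = frozenset({"planning", "execution", "review", "reflection"})
REASONING_EVENTS = frozenset({"reasoning.primitive.called", "belief.update.proposed"})


def authority_violations(events: list[dict[str, Any]]) -> list[str]:
    """Return one message per rule break, in event order (empty means pass)."""
    problems: list[str] = []
    for index, event in enumerate(events):
        kind = event.get("event_type")
        if kind not in REASONING_EVENTS:
            continue
        payload = event.get("payload", {})
        if payload.get("phase") not in PHASES:
            problems.append(f"{index}: {kind} has unknown phase {payload.get('phase')!r}")
        if kind == "belief.update.proposed":
            if payload.get("authority_status") != "non_authoritative":
                problems.append(f"{index}: proposal claims authority")
            if payload.get("review_status") != "pending":
                problems.append(f"{index}: proposal skipped review")
            if not payload.get("source_events"):
                problems.append(f"{index}: proposal cites no source events")
    return problems
```

Because the check only reads the append-only log and iterates in order, the same input always yields the same rows.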

---

Final Regression Gate

After all tasks:

pytest \
  tests/test_reasoning_primitives.py \
  tests/test_reasoning_benchmark.py \
  tests/test_causal_checkout.py \
  tests/test_cli.py \
  tests/test_mcp.py \
  -k "reasoning or belief or explain_outcome or claim_confidence or similar_procedures" \
  --no-cov -q

pytest tests/test_checkout.py tests/test_graph.py tests/test_mcp.py --no-cov -q

ruff check \
  src/zaxy/reasoning_primitives.py \
  src/zaxy/reasoning_benchmark.py \
  src/zaxy/core.py \
  src/zaxy/__main__.py \
  src/zaxy/mcp_server.py \
  src/zaxy/checkout.py \
  tests/test_reasoning_primitives.py \
  tests/test_reasoning_benchmark.py \
  tests/test_causal_checkout.py \
  tests/test_cli.py \
  tests/test_mcp.py

scripts/validate-docs.sh --root .

python -m zaxy benchmark-compare \
  reports/benchmarks/longmemeval-500-publish-20260607/live-benchmark.json \
  --backend zaxy-checkout \
  --min-mean-score 0.95 \
  --min-answer-recall-at-5 0.90 \
  --min-recall-at-5 0.99 \
  --min-citation-coverage 1.0 \
  --max-p95-ms 2500 \
  --max-p99-ms 3000
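The flags above read as a pass/fail gate. Assuming --min-* thresholds are inclusive lower bounds and --max-* thresholds are inclusive upper bounds, the check they express can be sketched as follows; the metric key names are hypothetical and not taken from benchmark-compare's report format.

```python
# Hypothetical sketch of the threshold semantics behind the benchmark-compare flags.
# Assumes inclusive bounds; metric key names are illustrative, not the report schema.
from collections.abc import Mapping

THRESHOLDS = {
    "mean_score": ("min", 0.95),
    "answer_recall_at_5": ("min", 0.90),
    "recall_at_5": ("min", 0.99),
    "citation_coverage": ("min", 1.0),
    "p95_ms": ("max", 2500),
    "p99_ms": ("max", 3000),
}


def gate_failures(metrics: Mapping[str, float]) -> list[str]:
    """Return one message per violated or missing metric (empty means the gate passes)."""
    failures: list[str] = []
    for name, (kind, limit) in THRESHOLDS.items():
        value = metrics.get(name)
        if value is None:
            failures.append(f"{name}: missing")
        elif kind == "min" and value < limit:
            failures.append(f"{name}: {value} < {limit}")
        elif kind == "max" and value > limit:
            failures.append(f"{name}: {value} > {limit}")
    return failures
```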

Expected:

Self-Review Notes

Spec coverage:

Known risks: