Zaxy 2.3 Release-Readiness Research

Synthesis of four parallel research tracks completed 2026-06-11. Full findings with evidence: docs/research/artifacts/2-3-research/ (backend-options.md, kuzu-usage-audit.md, promotion-pass.md, external-validation.md, int8 benchmark JSON).

The shape of 2.3

2.3 is a continuity-and-credibility release, not a feature release: replace the frozen storage engine before its countdown expires, repair the external-reproduction story before inviting anyone to use it, and run the promotion pass where evidence (not assertion) says a default should move.

Track 1 — Backend continuity: adopt LadybugDB, design the exit

The countdown is already running. Frozen kuzu 0.11.3 has no macOS or Windows cp314 wheels today, and Python 3.15 (Oct 2026) is a hard cliff — no compatible wheels will ever exist.

The engine is barely load-bearing. On a warm cache under default config, effectively zero request-time retrieval work happens in Kuzu: 2.2 moved everything (traversal BFS, PageRank, BM25, vector selection + rerank) to cached Python/numpy. What remains: durable bitemporal storage, per-event writes, cold-start hydration, one uncached per-call scan (search_causal_neighbors — repair candidate), and maintenance surfaces. Every MATCH in the repo is depth ≤ 1.

LadybugDB is viable, locally verified. Active fork (monthly releases, cp310–cp314 wheels on all platforms, MIT); three of our six documented defects are fixed and verified by running our own reproductions against 0.17.1 (DROP_VECTOR_INDEX corruption, Arrow-COPY segfault, crash-on-unbound parameter — now a silent NULL, so the choke point stays). SET-on-indexed remains rejected; the vector index is unchanged since the fork (numpy stays primary regardless). Storage files are incompatible but projections rebuild by design — blast radius is two lazy import sites plus an exact pin (ladybug==, NOT the stale real-ladybug name). Open counter-signal the acceptance lane must clear: reported write-throughput collapse at ~60k nodes on 0.16.x (LadybugDB#452).

Bus factor ≈ 1 (one maintainer cut every release). Therefore the decision is two-stage: LadybugDB as the 2.3 bridge, with a parallel design study for a Zaxy-owned SQLite + numpy store (~1.2k LOC changed, ~600 deleted, kills the C++-wheel risk class permanently) as the structural destination. A new full graph backend is dominated — nothing left uses a graph-database-distinguishing capability.

Prerequisite artifact for any path: the conformance suite. Today there is no backend-parametrized behavioral parity suite; embedded-only extras degrade via silent getattr; the protocol has drifted in three places. The suite converts every option's risk into mechanical verification.

Track 2 — Promotion pass: one promotion, one demotion, one instrumentation

Track 3 — External validation: fix before inviting

Proposed increment order

  1. 2.3-a: conformance suite + P0 reproduction fixes (lane entrypoint, Harvey runner, artifact regeneration with provenance) + FoK auto-emit instrumentation (starts the data clock) + the two free repairs (causal-neighbors caching, int8 block-wise).
  2. 2.3-b: LadybugDB adoption behind the conformance suite + full lane rerun acceptance (incl. the #452 write-throughput check); encoding-gate lane built and run.
  3. 2.3-c: promotion decisions on lane evidence (gate flip or hold; float32 selection mode); Harvey author invitation with the fixed kit; SQLite-store design study published for 2.4+.
  4. FoK calibration join lands whenever its data matures (~2 weeks after 2.3-a), independent of the release train.

Readiness verdict

Research complete. No unknowns block planning; every 2.3 decision now has either the evidence in hand or a specified instrument to produce it. The two highest-risk discoveries (Harvey kit breakage, FoK zero-data) were both silent failures that would have surfaced at the worst possible moments — found at research cost instead.