Zaxy 2.3 Backend Continuity Options — Research Report

---

1. LadybugDB fork — in depth

1.1 Identity, history, governance

1.2 Releases and wheels

1.3 API compatibility posture

1.4 The defect ledger — fix evidence, one by one

All "locally verified" rows were run against ladybug==0.17.1 (scripts in /tmp/zaxy-23-research/), each case in an isolated subprocess so a segfault would be caught as a signal exit.
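
The isolation approach described above can be sketched as follows. This is a minimal illustration, not the actual scripts in /tmp/zaxy-23-research/: the `run_isolated` helper and its result dictionary are hypothetical names, and the signal detection relies on the POSIX convention that `subprocess` reports a signal-killed child with a negative return code.

```python
import signal
import subprocess
import sys


def run_isolated(case_source: str, timeout: float = 60.0) -> dict:
    """Run one test case in a fresh interpreter and classify how it exited."""
    proc = subprocess.run(
        [sys.executable, "-c", case_source],
        capture_output=True,
        text=True,
        timeout=timeout,
    )
    if proc.returncode < 0:
        # POSIX: a negative return code means the child was killed by a signal.
        return {"outcome": "signal", "detail": signal.Signals(-proc.returncode).name}
    if proc.returncode != 0:
        # Ordinary failure: keep the last stderr line (usually the exception).
        lines = proc.stderr.strip().splitlines()
        return {"outcome": "error", "detail": lines[-1] if lines else ""}
    return {"outcome": "ok", "detail": proc.stdout.strip()}
```

With this shape, a clean exception (row 1) surfaces as an `error` outcome carrying the exception line, while a segfault surfaces as a `signal` outcome and the parent harness keeps running.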

| # | Frozen-Kuzu defect | Status in ladybug 0.17.1 | Evidence |
|---|---|---|---|
| 1 | kuzu#5965: SET on indexed column rejected | NOT fixed (still rejected, cleanly) | Locally verified: RuntimeError: Cannot set property vec in table embeddings because it is used in one or more indexes. Open feature request: <https://github.com/LadybugDB/ladybug/issues/377>. (Cosmetic bug: error message names the wrong table.) Zaxy already builds indexes after bulk load, so this is non-blocking, and the failure mode is an exception, not corruption. |
| 2 | kuzu#6040: DROP_VECTOR_INDEX metadata corruption | FIXED | Fix commit 029d7aef25 (2025-10-28) "fix: DROP_VECTOR_INDEX persistence bug preventing property updates (#22)" <https://github.com/LadybugDB/ladybug/commit/029d7aef25>; tracking issue closed 2025-11-12: <https://github.com/LadybugDB/ladybug/issues/69>. Locally verified: create index → DROP_VECTOR_INDEX → CHECKPOINT → close → reopen → SET on the formerly-indexed column succeeds. Zaxy's drop-free generation-swap could be retired under ladybug (but see §1.5 before doing so). |
| 3 | kuzu#6047: sequential HNSW load can block DB open | No direct evidence of a fix | No matching issue/commit found in the fork (searched "HNSW load", "blocking database open", "6047"). Adjacent but distinct work exists (non-blocking concurrent checkpoint, PRs #332/#371, v0.15.4.2 notes). Treat as inherited until the Zaxy lane measures cold-start at 10^5. Zaxy's threshold-gated cold-start guard stays. |
| 4 | In-memory Arrow COPY FROM fixed-size-list segfault | FIXED / not reproducible | Locally verified: COPY V FROM $tbl where $tbl is an in-memory pyarrow.Table with a pa.list_(pa.float32(), 4) column succeeded (exit 0, rows queryable). This is the exact shape that segfaults kuzu 0.11.3 per the 2.2 E1 finding. No single fix commit identified; related closed fixes: param-binding FLOAT[N] segfault <https://github.com/LadybugDB/ladybug/issues/376> (closed 2026-04-12) and substantial Arrow-path work in 0.15–0.17. The parquet round-trip could be removed under ladybug after lane confirmation. |
| 5 | Unbound $parameter segfault | Crash fixed; semantics now silent | Locally verified: an unbound $missing no longer segfaults; it evaluates to NULL and the query runs "successfully". That converts a crash into a silent-wrong-answer hazard. Zaxy's _QUERY_PARAMETER_RE choke point must stay regardless of backend. |
| 6 | Silent search breakage after in-place mutation under live HNSW | Substantially fixed; one residual hole | Fix lineage: issue #67 "vector index: poor recall after many deletions" (closed 2025-11-13) <https://github.com/LadybugDB/ladybug/issues/67>, fixed by PR #68 "vector index: fix checkpoint corruption" (root cause: OverflowFile::checkpoint() page allocation) <https://github.com/LadybugDB/ladybug/issues/68>. Locally verified (positive): delete + reinsert of all 100 rows one-by-one under a live index, then QUERY_VECTOR_INDEX → 5/5 overlap with exact ground truth. Locally verified (residual): MATCH (v:V) DELETE v (delete-ALL in one statement) then reinsert → the index permanently returns 0 rows on 0.17.1. Zaxy never does delete-all under a live index on the active generation (deltas are pure extensions; rebuilds are fresh generations), so this is avoided by existing design, but it means "empty superseded generation tables" remains the right pattern even though dropping is now safe. |
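
Row 5 is the reason a pre-execution parameter check matters more under ladybug than under frozen Kuzu. The following is a minimal sketch of such a choke point. Zaxy's actual `_QUERY_PARAMETER_RE` is not reproduced in this report, so `PARAM_RE` and `check_bound` are hypothetical stand-ins, and the pattern is deliberately naive: it would also match a `$name` that appears inside a string literal.

```python
import re

# Hypothetical pattern; the real _QUERY_PARAMETER_RE is not shown in this report.
PARAM_RE = re.compile(r"\$([A-Za-z_][A-Za-z0-9_]*)")


def check_bound(query: str, params: dict) -> None:
    """Raise before execution if the query references an unbound parameter."""
    referenced = set(PARAM_RE.findall(query))
    missing = sorted(referenced - set(params))
    if missing:
        raise ValueError(f"unbound query parameters: {missing}")


# A fully bound query passes silently.
check_bound("MATCH (v:V) WHERE v.id = $id RETURN v", {"id": 1})
```

Calling `check_bound` with a query that references `$missing` and no matching key raises `ValueError` in the caller, instead of letting the engine evaluate the parameter to NULL.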

Bonus row — the defect that motivated 2.3 in the first place:

| Defect | Status in ladybug 0.17.1 | Evidence |
|---|---|---|
| The G3 failure (HNSW recall collapse at high dim / large N) | NOT addressed, by the maintainer's own statement | "There have been no significant changes to the vector indexing code since [the fork]" (adsharma, 2026-03-31, in <https://github.com/LadybugDB/ladybug/issues/351>; that issue's ~18%-recall report was traced to a downstream Rust-crate packaging error, not core HNSW). Ladybug ships the same NaviX HNSW that Zaxy's lane measured at 0.60–0.63 recall@10 at 50k×1536. Switching backends does not buy back high-dimension ANN; the numpy paths stay primary. |
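
For reference, the recall@k and "overlap with exact ground truth" figures quoted in this ledger compare an index's result set against a brute-force exact search. The sketch below shows that comparison in pure Python. It is illustrative only: the function names are hypothetical, cosine similarity is an assumed metric, and Zaxy's own exact path is numpy-based rather than written this way.

```python
import heapq
import math


def exact_top_k(vectors, query, k):
    """Brute-force cosine top-k: indices of the k rows most similar to query."""
    def cosine(a, b):
        dot = sum(x * y for x, y in zip(a, b))
        return dot / (math.hypot(*a) * math.hypot(*b))

    scored = ((cosine(v, query), i) for i, v in enumerate(vectors))
    return [i for _, i in heapq.nlargest(k, scored)]


def recall_at_k(approx_ids, exact_ids):
    """Fraction of the exact top-k that the approximate result also returned."""
    exact = set(exact_ids)
    return len(exact & set(approx_ids)) / len(exact)
```

Under this definition, a "5/5 overlap" corresponds to `recall_at_k` returning 1.0 for k=5, and the G3 measurement corresponds to an average of 0.60–0.63 for k=10.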

1.5 Counter-signals (be blunt)

1.6 Migration blast radius for Zaxy (qualitative, locally grounded)

Small. import kuzu appears in exactly two files (src/zaxy/embedded_graph_store.py:241, src/zaxy/doctor.py:362, both lazy imports — verified); the pin is kuzu>=0.11.0 in pyproject.toml. ~51 execute() call sites in the store run a dialect that executed unmodified in my tests. No file-format migration: projections rebuild. The 2.2 defect accommodations (parquet round-trip, generation swap, param choke point, cold-start guard) can all be kept under ladybug at zero cost — retire them only as the lane proves each one unnecessary.
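
Because both import sites are lazy, the switch can be confined to one small loader. The sketch below is hypothetical: `load_backend` is not an existing Zaxy function, and the assumption that the fork's import name is `ladybug` (matching the PyPI name) is not confirmed in this report and should be checked against the installed wheel.

```python
import importlib
from types import ModuleType


def load_backend(candidates: tuple = ("ladybug", "kuzu")) -> ModuleType:
    """Import the first available backend module, lazily, at call time."""
    errors = []
    for name in candidates:
        try:
            # Module names are assumptions; adjust to the real import names.
            return importlib.import_module(name)
        except ImportError as exc:
            errors.append(f"{name}: {exc}")
    raise ImportError("no graph backend available (" + "; ".join(errors) + ")")
```

Keeping the candidate order explicit makes it easy to run the same lane against either backend while the accommodations listed above stay in place.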

---

2. Other Kuzu forks and continuations

| Fork | Status | Verdict |
|---|---|---|
| ryugraph (predictable-labs) <https://github.com/predictable-labs/ryugraph> | 137 stars; last push 2026-01-20; PyPI ryugraph 25.9.2 last published 2025-12-06 (verified) | Stalled ~5 months. Early energy (the LadybugDB DROP_VECTOR_INDEX fix arrived "via ryugraph", per issue #69), now apparently abandoned. Not a candidate. |
| Vela-Engineering/kuzu <https://github.com/Vela-Engineering/kuzu> | Active (pushed 2026-06-07); tagged v0.12.0-vela.* releases (2026-06-07); claims concurrent multi-writer support; MIT | Interesting engineering, but no PyPI distribution (verified: no vela-kuzu/kuzu-vela packages), 37 stars, single-company fork for their own agent-memory product. Watch, don't adopt. |
| Bighorn (Kineviz) <https://github.com/Kineviz/bighorn> | 130 stars; last push 2025-10-11, the day after the archive; not on PyPI (verified) | Dead on arrival. |

LadybugDB is the only fork with releases, wheels, users, and a defect-fix record. (Fork landscape context: <https://gdotv.com/blog/weekly-edge-kuzu-forks-duckdb-graph-cypher-24-october-2025/>.)

---

3. Alternative embedded backends (scoped to Zaxy's profile)

Profile assumed: property-graph entities with temporal validity windows, typed relationships, traversal queries, exact + vector search — where 2.2 proved the numpy paths beat Kuzu HNSW at every measured scale, so the vector workload does not need to live in the graph engine.
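
To make the traversal part of this profile concrete, here is a minimal sketch of typed edges with validity windows queried through a recursive CTE in stdlib sqlite3, the substrate considered in §3.2. The schema, the integer timestamps, and the `reachable` helper are all hypothetical and are not drawn from Zaxy's store.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE edge (
        src TEXT NOT NULL,
        dst TEXT NOT NULL,
        rel TEXT NOT NULL,            -- typed relationship
        valid_from INTEGER NOT NULL,  -- inclusive
        valid_to INTEGER              -- exclusive; NULL means still valid
    );
    INSERT INTO edge VALUES
        ('a', 'b', 'DEPENDS_ON', 0, NULL),
        ('b', 'c', 'DEPENDS_ON', 0, 10),
        ('c', 'd', 'DEPENDS_ON', 5, NULL);
    """
)

# Depth-bounded reachability over edges that are valid at time :at.
REACHABLE = """
WITH RECURSIVE reach(node, depth) AS (
    SELECT :start, 0
    UNION
    SELECT e.dst, r.depth + 1
    FROM reach AS r
    JOIN edge AS e ON e.src = r.node
    WHERE e.rel = :rel
      AND e.valid_from <= :at
      AND (e.valid_to IS NULL OR e.valid_to > :at)
      AND r.depth < :max_depth
)
SELECT node, MIN(depth) FROM reach GROUP BY node ORDER BY MIN(depth), node
"""


def reachable(start, rel, at, max_depth=5):
    """Return (node, hop count) pairs reachable from start at time `at`."""
    args = {"start": start, "rel": rel, "at": at, "max_depth": max_depth}
    return conn.execute(REACHABLE, args).fetchall()
```

In this toy data, `reachable("a", "DEPENDS_ON", 7)` walks all three edges, while the same call at time 12 stops at `b` because the b→c edge expired at 10. The depth bound also guarantees termination on cyclic graphs.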

3.1 DuckDB (+ property-graph approaches)

3.2 SQLite (stdlib sqlite3, JSON + recursive CTEs as graph substrate)

3.3 LanceDB / lance-graph

3.4 Shrink the ask (Zaxy-owned projection store: SQLite/parquet + existing numpy vectors)

---

4. Risk timeline: staying on frozen Kuzu 0.11.3

---

5. Decision-ready comparison and recommendation

| Option | Continuity | Defect posture | Vector fit | Blast radius | Existential risk |
|---|---|---|---|---|---|
| Stay-and-contain (kuzu 0.11.3) | None; expires ~Oct 2026 (3.15), already broken for macOS py3.14 | All 6 defects permanent (all contained by 2.2 engineering) | numpy primary (proven) | Zero | High and rising |
| Adopt LadybugDB | Monthly releases; broader wheel matrix incl. cp314 all-platform | #6040 fixed, Arrow-COPY fixed, param segfault fixed (→NULL), mutation breakage fixed (delete-all residual), #5965 unchanged, #6047 unproven | Same NaviX HNSW, so G3 unchanged; numpy stays primary | Small (2 import sites, pin change, projection rebuild; dialect ran unmodified) | Bus factor ≈ 1 |
| Second backend: SQLite (shrink the ask) | Effectively infinite | Entire defect class eliminated | numpy primary (unchanged) | Large (new store impl) | Lowest possible |
| DuckDB (+DuckPGQ) | DuckDB excellent; DuckPGQ research-grade | n/a | numpy primary | Large (SQL rewrite) + weak link in dialect | Medium (DuckPGQ) |
| LanceDB / lance-graph | LanceDB healthy; lance-graph embryonic | n/a | Redundant with numpy | Large + graph gap | High (immaturity) |

Recommendation (ranked)

  1. Adopt LadybugDB as the 2.3 default backend. Confidence: medium-high. It is the only continuation with releases, full-matrix wheels (incl. the cp314 macOS/Windows gap that frozen Kuzu will never close), a verified drop-in API, a working EXPORT→IMPORT migration path, and fixes for three of the six documented defects (commit-level lineage for two; the Arrow-COPY fix has no single identified commit), plus a crash→silent-NULL conversion on a fourth, all re-verified locally against 0.17.1 in this audit. Conditions that should be non-negotiable: (a) exact-version pin (ladybug==, never the stale real-ladybug name); (b) keep every 2.2 defect accommodation (param choke point, because unbound params now silently evaluate to NULL; generation-swap empties; cold-start guard, because #6047 is unproven) and retire each only on lane evidence; (c) re-run the full vector-scale + graph-scale lanes on ladybug as the formal acceptance bar (the open #452 write-throughput report at exactly Zaxy-relevant scale is the thing the lane must clear); (d) numpy paths remain the primary vector machinery, since the fork does not change NaviX and does not reopen G3. The bus-factor-of-one is real and is the reason for item 2.
  2. Start the shrink-the-ask design study in parallel (2.3 research, 2.4+ delivery). Confidence: high that it removes the risk class; medium on cost. Ladybug fixes the continuity emergency, not the structural one (Zaxy's default path depends on whether one person keeps merging). A Zaxy-owned SQLite/parquet projection store + the existing numpy vector machinery is the only option whose maintenance risk goes to ~zero, and 2.2's strategy-pipeline consolidation already shrank the surface a second backend must implement. The pending Cypher-surface audit decides feasibility; if the traversal surface is shallow, this should become the long-term default.
  3. Stay-and-contain: acceptable only as the fallback if the ladybug lane runs fail. Confidence: high in the assessment. It works today because 2.2 engineered around everything, but it has a dated expiry: no macOS/Windows wheels on Python 3.14 now, no Python 3.15 path in ~4 months, and a frozen vendored-dependency tree with known (since-patched-in-ladybug) vulnerabilities. If the lane rejects ladybug, pin hard, add an upper Python bound, and accelerate option 2.
  4. DuckDB / LanceDB as graph substrate: not recommended for 2.3. Confidence: high. DuckDB itself is the healthiest dependency in this report, but the property-graph story (DuckPGQ) is research-grade with no releases and a 3-month-old last commit; lance-graph is eight months old. Either would trade a verified small migration for a large rewrite onto a weaker link. Reassess lance-graph at 2.4+.