Benchmark Contributions

Benchmark contributions are welcome when they improve reproducibility, coverage, or release confidence. Zaxy benchmark evidence must be replayable from tracked inputs and must not rely on private local state.
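A replay from a clean checkout can only use files that git tracks. The minimal Python sketch below lists benchmark inputs that a clean clone would not contain. It assumes the caller already knows the report's input paths. The helper names `tracked_files` and `untracked_inputs` are hypothetical and are not part of the Zaxy scripts. The only external call is `git ls-files -z`.

```python
import subprocess
from pathlib import Path


def tracked_files(repo_root: Path) -> set[str]:
    """Return repo-relative paths that git currently tracks (hypothetical helper)."""
    out = subprocess.run(
        ["git", "ls-files", "-z"],  # NUL-separated, so odd filenames survive
        cwd=repo_root,
        check=True,
        capture_output=True,
    ).stdout
    return {p.decode() for p in out.split(b"\0") if p}


def untracked_inputs(inputs: list[str], tracked: set[str]) -> list[str]:
    """Return inputs that a replay from a clean checkout could not find."""
    # as_posix() normalizes "./a.json" to "a.json" so it matches git's paths.
    return [p for p in inputs if Path(p).as_posix() not in tracked]
```

For example, `untracked_inputs(report_inputs, tracked_files(Path(".")))` should return an empty list before a report is proposed. Here `report_inputs` is whatever input list your report records.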

Accepted Contribution Types

Required Evidence

Every benchmark report proposed for release use needs:

Reports under reports/backend-shootout/ are checked by scripts/check-backend-shootout.py. Release reports must keep these checks enabled: query diagnostics, fingerprint validation, git-tracked inputs, citation coverage, and latency budgets. Do not promote reports/benchmarks/*-diagnostics.* files as public claims unless the corresponding workload and report contract are also tracked and documented.
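The release contract above can also be expressed as a pre-review check. The sketch below is written under stated assumptions. It assumes a report's settings can be read as a flat dict of booleans. The gate key names are hypothetical, and the authoritative schema is whatever scripts/check-backend-shootout.py enforces. Because a script cannot tell whether a workload is "documented", the sketch flags every cited diagnostics file for manual review.

```python
from fnmatch import fnmatch

# Hypothetical gate names mirroring the list above; not the real report schema.
RELEASE_GATES = (
    "query_diagnostics",
    "fingerprint_validation",
    "git_tracked_inputs",
    "citation_coverage",
    "latency_budgets",
)

DIAGNOSTICS_PATTERN = "reports/benchmarks/*-diagnostics.*"


def release_blockers(settings: dict[str, bool], cited_paths: list[str]) -> list[str]:
    """Return reasons a report cannot yet back a public release claim."""
    # A gate missing from the settings is treated as disabled.
    problems = [f"gate disabled: {g}" for g in RELEASE_GATES if not settings.get(g, False)]
    # Diagnostics files need a tracked workload and report contract first.
    problems += [
        f"needs tracked workload and report contract: {p}"
        for p in cited_paths
        if fnmatch(p, DIAGNOSTICS_PATTERN)
    ]
    return problems
```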

Running Checks

Use the narrow command for the artifact you changed, then run the broader release checks when the result supports a public claim:

python scripts/check-backend-shootout.py reports/backend-shootout/backend-shootout.json \
  --require-report-metadata \
  --require-markdown-report \
  --require-query-results \
  --require-git-tracked-inputs \
  --verify-report-fingerprints

zaxy doctor --beta-readiness
scripts/release-check.sh --root .
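If you script this sequence, one option is to run the narrow check first and to run the release checks only when a public claim is involved, stopping at the first failure. The Python sketch below reuses the commands from this section. Two details are assumptions rather than project tooling. It swaps `python` for `sys.executable`, and the helper names are hypothetical. A command that is not installed raises `FileNotFoundError` instead of returning an index.

```python
import subprocess
import sys
from typing import Optional, Sequence

# Narrow check for the artifact you changed.
NARROW = [
    sys.executable, "scripts/check-backend-shootout.py",
    "reports/backend-shootout/backend-shootout.json",
    "--require-report-metadata", "--require-markdown-report",
    "--require-query-results", "--require-git-tracked-inputs",
    "--verify-report-fingerprints",
]
# Broader checks, needed when the result supports a public claim.
RELEASE = [
    ["zaxy", "doctor", "--beta-readiness"],
    ["scripts/release-check.sh", "--root", "."],
]


def first_failure(commands: Sequence[Sequence[str]]) -> Optional[int]:
    """Run commands in order; return the index of the first non-zero exit, else None."""
    for index, command in enumerate(commands):
        if subprocess.run(list(command)).returncode != 0:
            return index
    return None


def run_checks(public_claim: bool) -> Optional[int]:
    """Hypothetical wrapper: narrow check always, release checks only for public claims."""
    return first_failure([NARROW, *RELEASE] if public_claim else [NARROW])
```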

For Coordinate benchmarks, regenerate from a tracked workload and keep the report limitations visible. For LongMemEval-compatible reports, preserve the same-harness BM25 comparison and state whether hosted embeddings or caches were used.
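One reading of "same-harness" is that the candidate system and the BM25 baseline share one tokenizer, one corpus, one query set, one cutoff k, and one scoring function. The Python sketch below illustrates that setup. It is a generic Okapi BM25 with Lucene-style IDF, and its k1=1.5 and b=0.75 are common defaults. It is not Zaxy's baseline implementation. `recall_at_k` and the whitespace tokenizer are hypothetical stand-ins for the real harness metric.

```python
import math
from collections import Counter
from typing import Callable, Sequence


def tokenize(text: str) -> list[str]:
    # Shared tokenizer: every system in the harness sees identical tokens.
    return text.lower().split()


class BM25:
    """Generic Okapi BM25 baseline; k1 and b are assumed defaults."""

    def __init__(self, docs: Sequence[str], k1: float = 1.5, b: float = 0.75):
        self.k1, self.b = k1, b
        self.tfs = [Counter(tokenize(d)) for d in docs]
        self.lengths = [sum(tf.values()) for tf in self.tfs]
        self.avgdl = sum(self.lengths) / len(self.tfs)
        n = len(self.tfs)
        # Document frequency: iterating a Counter yields each term once per doc.
        df = Counter(term for tf in self.tfs for term in tf)
        # Lucene-style IDF stays non-negative even for very common terms.
        self.idf = {t: math.log(1 + (n - f + 0.5) / (f + 0.5)) for t, f in df.items()}

    def score(self, query: str, i: int) -> float:
        tf, dl = self.tfs[i], self.lengths[i]
        total = 0.0
        for term in tokenize(query):
            f = tf.get(term, 0)
            if f:
                denom = f + self.k1 * (1 - self.b + self.b * dl / self.avgdl)
                total += self.idf[term] * f * (self.k1 + 1) / denom
        return total

    def retrieve(self, query: str, k: int) -> list[int]:
        # Stable sort: ties keep corpus order.
        order = sorted(range(len(self.tfs)), key=lambda i: self.score(query, i), reverse=True)
        return order[:k]


Retriever = Callable[[str, int], list[int]]


def recall_at_k(retrieve: Retriever, cases: Sequence[tuple[str, int]], k: int) -> float:
    """The harness: every system is scored here on the same cases and k."""
    hits = sum(1 for query, gold in cases if gold in retrieve(query, k))
    return hits / len(cases)
```

A fair comparison then calls `recall_at_k` twice with identical `cases` and `k`. One call uses the candidate retriever and the other uses `BM25(corpus).retrieve`.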

Review Criteria

Reviewers should reject benchmark contributions when:

Accepted benchmark changes should update benchmarks.md, testing.md, the changelog, and README.md when the public benchmark story changes. Operational release steps live in runbook.md.
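To catch missing documentation updates before review, a change can be checked against the documents named above. The Python sketch below makes two loud assumptions. "CHANGELOG.md" is an assumed filename for "the changelog". Files are also matched by name only, because the docs may live under a directory this section does not name. Whether the public benchmark story changed remains a human judgment passed in as a flag.

```python
from pathlib import PurePosixPath

# Docs named in this section; "CHANGELOG.md" is an assumed filename.
STORY_DOCS = ("benchmarks.md", "testing.md", "CHANGELOG.md", "README.md")


def missing_doc_updates(changed_files: list[str], story_changed: bool) -> list[str]:
    """Return story docs a change left untouched (hypothetical helper, matches by name)."""
    if not story_changed:
        return []
    changed_names = {PurePosixPath(p).name for p in changed_files}
    return [doc for doc in STORY_DOCS if doc not in changed_names]
```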