`docs/design/search-quality-eval.md` Phase 1 (Class A + B, paired-comparison mode)

Search-quality version diff: v1.1.0 → v1.2.0 (canonical-lookup-V2, independent corpus)

This is a cross-validation corpus. The v1.1.0 → v1.2.0 claim ("v1.2.0 is better") is being independently re-verified with a different fixed query set so the result isn't a quirk of the first corpus. The two corpora share zero queries.

Measured 2026-05-21 · Mixed

Headline result
+8 / 30 queries newly rank-1
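
A minimal sketch of how a paired "newly rank-1" count and Mean Reciprocal Rank could be computed over a fixed query set. It assumes "newly rank-1" means the expected result was not at rank 1 under v1.1.0 and is at rank 1 under v1.2.0. That reading, the function names, and the toy ranks are illustrative assumptions, not taken from the audit.

```python
from typing import Optional, Sequence

# A rank is the 1-based position of the expected result, or None if it was
# not returned at all. (Hypothetical representation, assumed for this sketch.)
Rank = Optional[int]


def mean_reciprocal_rank(ranks: Sequence[Rank]) -> float:
    """Average of 1/rank over all queries; a missing result contributes 0."""
    return sum(0.0 if r is None else 1.0 / r for r in ranks) / len(ranks)


def newly_rank1(old: Sequence[Rank], new: Sequence[Rank]) -> int:
    """Count queries that were not rank 1 in the old version and are in the new one."""
    if len(old) != len(new):
        raise ValueError("paired comparison needs the same queries in both runs")
    return sum(1 for o, n in zip(old, new) if o != 1 and n == 1)


# Toy data for five queries, NOT the audit's measurements.
old_ranks = [1, 3, None, 2, 1]
new_ranks = [1, 1, 2, 1, 1]

gained = newly_rank1(old_ranks, new_ranks)
mrr_old = mean_reciprocal_rank(old_ranks)
mrr_new = mean_reciprocal_rank(new_ranks)
print(f"+{gained} / {len(old_ranks)} queries newly rank-1")
print(f"MRR {mrr_old:.3f} -> {mrr_new:.3f}")
```

Because both versions are scored on the same queries, the per-query rank pairs are also the natural input to a paired test such as the Wilcoxon signed-rank test listed under the sources.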

Read in detail

Each card opens its own page. The headline and charts above are all you need at a glance; the cards are for the why and how.

Sources cited in this measurement

Every metric and method this audit relies on, with a link to the foundational source. Auto-collected from the audit text.

Mean Reciprocal Rank

Voorhees (1999), TREC-8 QA Report

P@k (Precision at k)

Manning, Raghavan, Schütze (2008), IIR §8.4

Wilcoxon signed-rank test

Wilcoxon (1945), Biometrics Bulletin
