Mean Reciprocal Rank
Voorhees (1999), TREC-8 QA Report
This is a cross-validation corpus. The v1.1.0 → v1.2.0 claim ("v1.2.0 is better") is being independently re-verified with a different fixed query set so the result isn't a quirk of the first corpus. The two corpora share zero queries.
Each card opens its own page. The headline and charts above are all you need at a glance; the cards are for the why and how.
If this audit's headline and significance numbers agree directionally with those in search-quality-versiondiff-v1.1.0-to-v1.2.0.md (the original 50-query corpus), the v1.2.0 > v1.1.0 claim is robust to corpus choice.
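One way to make "agree directionally" concrete is sketched below. It assumes each audit can be reduced to an old-version score, a new-version score, and a p-value. The `AuditSummary` type, the `alpha = 0.05` threshold, and the numbers in the usage lines are all hypothetical and are not taken from either audit.

```python
from dataclasses import dataclass


@dataclass
class AuditSummary:
    """Hypothetical reduction of one audit to its headline numbers."""
    score_old: float   # e.g. MRR for v1.1.0
    score_new: float   # e.g. MRR for v1.2.0
    p_value: float     # from whichever significance test the audit reports


def direction(summary: AuditSummary) -> int:
    """Return +1 if the new version scores higher, -1 if lower, 0 if tied."""
    delta = summary.score_new - summary.score_old
    return (delta > 0) - (delta < 0)


def agrees_directionally(a: AuditSummary, b: AuditSummary, alpha: float = 0.05) -> bool:
    """True when both audits move the same way and land on the same side of alpha.

    alpha is an assumed threshold, not one stated in the audits.
    """
    same_direction = direction(a) != 0 and direction(a) == direction(b)
    same_verdict = (a.p_value < alpha) == (b.p_value < alpha)
    return same_direction and same_verdict


# Illustrative values only
original = AuditSummary(score_old=0.60, score_new=0.72, p_value=0.01)
cross_val = AuditSummary(score_old=0.55, score_new=0.66, p_value=0.03)
reversed_run = AuditSummary(score_old=0.70, score_new=0.61, p_value=0.02)
```

Under this reading, `agrees_directionally(original, cross_val)` holds, while a corpus on which the new version scored lower would not count as agreement even if its result were significant.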
flowchart TD Q["30 canonical-lookup query strings (corpus V2)<br/>zero overlap with the original 50-query corpus"]:::input R["per-query regex pattern<br/>accepts canonical + specific-variant URIs"]:::input Q --> H[script…
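The two inputs named in the flowchart (a query string and a per-query regex that accepts the canonical URI plus its specific variants) are enough to sketch how Mean Reciprocal Rank could be scored. This is a minimal sketch, not the audit's script: the function names, the use of a full-string match, and the `ex:` URIs are assumptions made for illustration.

```python
import re


def reciprocal_rank(ranked_uris, accept_pattern):
    """Return 1/rank of the first URI the pattern accepts, or 0.0 if none is accepted."""
    pattern = re.compile(accept_pattern)
    for rank, uri in enumerate(ranked_uris, start=1):
        # Assumption: the pattern must match the whole URI, not a substring.
        if pattern.fullmatch(uri):
            return 1.0 / rank
    return 0.0


def mean_reciprocal_rank(runs):
    """Average the reciprocal rank over (ranked_uris, accept_pattern) pairs, one per query."""
    if not runs:
        raise ValueError("need at least one query")
    return sum(reciprocal_rank(uris, pat) for uris, pat in runs) / len(runs)


# Hypothetical results for three queries; each pattern accepts a canonical
# URI and its versioned variants.
runs = [
    (["ex:widget/v2", "ex:widget"], r"ex:widget(/v\d+)?"),  # accepted at rank 1
    (["ex:other", "ex:gadget"], r"ex:gadget(/v\d+)?"),      # accepted at rank 2
    (["ex:foo", "ex:bar"], r"ex:baz(/v\d+)?"),              # never accepted
]
mrr = mean_reciprocal_rank(runs)
```

Here the three reciprocal ranks are 1, 1/2 and 0, so `mrr` is 0.5. A query with no accepted URI contributes 0 rather than being dropped, which keeps the denominator fixed at the corpus size.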
Every metric and method this audit relies on, with a link to the foundational source. Auto-collected from the audit text.
Voorhees (1999), TREC-8 QA Report
Manning, Raghavan, Schütze (2008), IIR §8.4
Järvelin & Kekäläinen (2002)
Wilcoxon (1945), Biometrics Bulletin
McNemar (1947), Psychometrika
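For readers unfamiliar with the McNemar test cited above, the sketch below shows its exact (binomial) form applied to paired per-query outcomes. The source does not say whether the audit uses the exact or the chi-square variant, nor how it defines a "hit", so both choices here are assumptions, as are the function names and the sample data.

```python
from math import comb


def discordant_counts(hits_old, hits_new):
    """Count queries where exactly one version produced a hit.

    Returns (b, c): b = only the old version hit, c = only the new version hit.
    """
    if len(hits_old) != len(hits_new):
        raise ValueError("outcome lists must be paired per query")
    b = sum(1 for o, n in zip(hits_old, hits_new) if o and not n)
    c = sum(1 for o, n in zip(hits_old, hits_new) if n and not o)
    return b, c


def mcnemar_exact(b, c):
    """Two-sided exact McNemar p-value from the discordant pair counts."""
    n = b + c
    if n == 0:
        return 1.0  # no discordant pairs, so no evidence of a difference
    k = min(b, c)
    # Binomial(n, 0.5) tail probability of seeing k or fewer in the smaller cell
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2 * tail)


# Hypothetical paired outcomes for 12 queries (True = hit)
hits_old = [True, False, False, False, False, False, False, False, False, False, True, False]
hits_new = [False, True, True, True, True, True, True, True, True, True, True, False]
b, c = discordant_counts(hits_old, hits_new)
p_value = mcnemar_exact(b, c)
```

Only the discordant queries carry information in this test; queries where both versions hit or both missed do not affect the p-value.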