Search-quality baseline: cross-source canonical (Phase 1.2, v1.2.0 candidate)

Aggregate

Metric | Value
Queries total | 25
Top-1 matches expected | 19 / 25 (76.0%)
Expected source IS present in top-10 | 19 / 25
Top-1 match conditional on expected being present | 19 / 19 (100%)
Top-1 mismatch conditional on expected being present | 0 / 19
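
As a rough illustration of how these aggregate figures relate to one another, the sketch below recomputes them from per-query records. The `QueryResult` record, the `aggregate` helper, and the placeholder source names `expected-src` and `other-src` are assumptions made for this example; they are not taken from the evaluation harness, and the demo data is synthetic, built only to reproduce the counts in the table.

```python
from dataclasses import dataclass


@dataclass
class QueryResult:
    """Hypothetical per-query record: the expected source and the ranked top-10 sources."""
    expected_source: str
    top10_sources: list[str]  # best-ranked first


def aggregate(results: list[QueryResult]) -> dict:
    """Compute the aggregate counts and rates reported in the table."""
    total = len(results)
    # Queries where the expected source appears anywhere in the top-10.
    present = sum(1 for r in results if r.expected_source in r.top10_sources)
    # Queries where the expected source is ranked first.
    top1 = sum(
        1 for r in results
        if r.top10_sources and r.top10_sources[0] == r.expected_source
    )
    return {
        "total": total,
        "top1_matches": top1,
        "expected_present": present,
        "top1_rate": top1 / total if total else None,
        "top1_given_present": top1 / present if present else None,
        "mismatch_given_present": (present - top1) if present else None,
    }


# Synthetic data shaped like the reported result: 19 queries with the expected
# source at rank 1, and 6 queries where it is absent from the top-10 entirely.
demo = (
    [QueryResult("expected-src", ["expected-src", "apple-docs"]) for _ in range(19)]
    + [QueryResult("expected-src", ["apple-docs", "other-src"]) for _ in range(6)]
)
stats = aggregate(demo)
print(stats)
```

Separating "present in top-10" from "ranked first" is what makes the conditional rows meaningful: every miss in this data is a presence failure, not a ranking failure among the candidates.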

Binomial test on the strict subset (queries where the expected source is present in the top-10): k = 19 of n = 19 trials, one-sided p (top-1 matches expected vs. chance 0.5) = 0.5¹⁹ ≈ 1.9 × 10⁻⁶. The null hypothesis (no source-weight effect) is rejected emphatically.
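
The p-value above can be checked with a few lines of standard-library Python. This is a minimal sketch of a one-sided exact binomial tail under the stated null (success probability 0.5); the function name `binomial_tail` is an assumption for this example and not part of the evaluation tooling.

```python
from math import comb


def binomial_tail(k: int, n: int, p: float = 0.5) -> float:
    """One-sided exact tail P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


# 19 top-1 matches out of the 19 queries where the expected source is present.
p_value = binomial_tail(19, 19)
print(f"{p_value:.2e}")  # prints 1.91e-06
```

With k = n the tail collapses to a single term, 0.5¹⁹, which is why the result is so small despite only 19 trials.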

The 6 mismatches are not cases where the ranker chose the wrong source from the candidates. In each of them the expected source had nothing in the top-10 at all, because apple-docs's source weight outcompetes the lower-weight sources. This is a deeper finding (see below).