Search-quality baseline: v1.2.0 candidate `search.db`

Aggregate

| Metric | Value |
|---|---|
| N queries | 50 |
| Right answer at rank 1 (P@1 perfect) | 46 / 50 (92%) |
| Right answer not in top 10 | 1 / 50 |
| MRR | 0.9467 |
| P@1 | 0.9200 |
| P@5 | 0.3280 |
| NDCG@10 | 1.7385 |
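
The rank-based rows of the table can all be derived from one number per query. The minimal sketch below assumes each query is reduced to the 1-based rank of the first result matching its right-answer regex, with anything below rank 10 counted as "not in top 10". The function name `aggregate` and its dictionary keys are illustrative, not taken from the evaluation harness. With one canonical answer per query, P@1 is the share of queries whose first hit is at rank 1, which is why it matches the 46 / 50 row.

```python
from typing import Dict, Optional, Sequence

CUTOFF = 10  # assumption: a first hit below rank 10 counts as "not in top 10"


def aggregate(first_hit_ranks: Sequence[Optional[int]]) -> Dict[str, float]:
    """first_hit_ranks[i] is the 1-based rank of the first result matching
    query i's right-answer regex, or None if no result matched."""
    ranks = [r if r is not None and r <= CUTOFF else None for r in first_hit_ranks]
    n = len(ranks)
    at_rank_1 = sum(1 for r in ranks if r == 1)
    missing = sum(1 for r in ranks if r is None)
    return {
        "n_queries": n,
        "rank_1": at_rank_1,
        "not_in_top_10": missing,
        # Reciprocal rank of the first hit; queries with no hit contribute 0.
        "mrr": sum(1.0 / r for r in ranks if r is not None) / n,
        # With one canonical answer, P@1 is the share of queries hit at rank 1.
        "p_at_1": at_rank_1 / n,
    }


print(aggregate([1, 1, 2, None]))
```

For example, `aggregate([1, 1, 2, None])` gives an MRR of (1 + 1 + 0.5 + 0) / 4 = 0.625 and a P@1 of 0.5.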

P@5 looks low next to MRR. The reason: in this design each of the 50 queries has exactly one canonical right answer, so when only that answer matches, a query's P@5 is at most 1/5 = 0.2. The observed 0.328 exceeds that ceiling because some queries' right-answer regexes also match lower-ranked sibling URIs (e.g., framework-root patterns like `apple-docs://swiftui($|/[^/]*$)` legitimately match many pages). The metric is reported correctly, but it is not the headline number.
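
The 0.2 ceiling, and how a framework-root pattern breaks through it, can be seen in a small sketch. It assumes the right-answer regex is applied with `re.search` (not anchored at the start) to each of the top 5 URIs and that P@5 always divides the hit count by 5. The function name, the single-page pattern, and the sample URIs are hypothetical; only the framework-root pattern comes from the text above.

```python
import re
from typing import Sequence


def precision_at_k(pattern: str, ranked_uris: Sequence[str], k: int = 5) -> float:
    """Share of the top-k URIs matched by the right-answer regex.

    Assumes the pattern is applied with re.search (not anchored at the start)
    and that the denominator is always k, as in standard P@k.
    """
    rx = re.compile(pattern)
    hits = sum(1 for uri in ranked_uris[:k] if rx.search(uri))
    return hits / k


# Hypothetical top-5 results for a SwiftUI query.
top5 = [
    "apple-docs://swiftui",
    "apple-docs://swiftui/view",
    "apple-docs://swiftui/view/body",
    "apple-docs://swiftui/text",
    "apple-docs://uikit",
]
framework_root = r"apple-docs://swiftui($|/[^/]*$)"  # pattern quoted in the text
single_page = r"apple-docs://swiftui/view/body$"     # hypothetical one-page pattern

print(precision_at_k(single_page, top5))     # one match: 1/5
print(precision_at_k(framework_root, top5))  # three matches: 3/5
```

With these sample results the single-page pattern scores 0.2, while the framework-root pattern matches the root plus two direct children and scores 0.6.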

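NDCG@10 can exceed 1 here for the same reason: a multi-match pattern earns a gain at every matching position in the top 10, while the ideal DCG used for normalization does not account for those extra matches. Per design §8.2 this is a known accounting quirk. The metric remains useful for paired comparisons between runs on the same corpus, but not as an absolute score on the [0, 1] scale.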
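
One accounting that reproduces NDCG@10 > 1 is binary gains with the usual log2 position discount, normalized by an ideal DCG that assumes a single relevant result. The sketch below is an assumption about how such a quirk can arise, not the definition in design §8.2; `ndcg_at_k`, its `n_ideal_relevant` parameter, and the sample URIs are hypothetical.

```python
import math
import re
from typing import Sequence


def dcg_at_k(gains: Sequence[float], k: int = 10) -> float:
    # Position i (0-based) is discounted by log2(i + 2), so rank 1 gets log2(2) = 1.
    return sum(g / math.log2(i + 2) for i, g in enumerate(gains[:k]))


def ndcg_at_k(pattern: str, ranked_uris: Sequence[str], k: int = 10,
              n_ideal_relevant: int = 1) -> float:
    """Binary-gain NDCG@k. The ideal DCG assumes n_ideal_relevant relevant results
    (default 1, i.e. one canonical answer), yet every regex match earns a gain."""
    rx = re.compile(pattern)
    gains = [1.0 if rx.search(uri) else 0.0 for uri in ranked_uris[:k]]
    ideal = dcg_at_k([1.0] * min(n_ideal_relevant, k), k)
    return dcg_at_k(gains, k) / ideal


results = [
    "apple-docs://swiftui",
    "apple-docs://swiftui/view",
    "apple-docs://swiftui/view/body",
    "apple-docs://swiftui/text",
    "apple-docs://uikit",
]
framework_root = r"apple-docs://swiftui($|/[^/]*$)"

print(ndcg_at_k(framework_root, results))                      # above 1
print(ndcg_at_k(framework_root, results, n_ideal_relevant=3))  # at most 1
```

For the framework-root pattern, matches at ranks 1, 2, and 4 give DCG ≈ 1 + 0.631 + 0.431 ≈ 2.06, so NDCG lands above 1 when the ideal counts one relevant result; normalizing against three relevant results brings it back under 1.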

The headline number is MRR = 0.9467. A ranking change must maintain or improve this on the same 50-query corpus to claim no regression.
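
The no-regression rule can be checked per query as well as in aggregate. A minimal sketch, assuming both runs are stored as hypothetical mappings from query ID to the 1-based rank of the first right answer (`None` if it is absent from the top 10); `regression_report` is an illustrative name, not part of any existing tooling.

```python
from typing import List, Mapping, Optional, Tuple


def reciprocal_rank(rank: Optional[int]) -> float:
    # A query whose right answer is missing from the top 10 contributes 0.
    return 0.0 if rank is None else 1.0 / rank


def regression_report(
    baseline: Mapping[str, Optional[int]],
    candidate: Mapping[str, Optional[int]],
) -> Tuple[float, float, List[str]]:
    """Return (baseline MRR, candidate MRR, queries whose reciprocal rank dropped)."""
    if set(baseline) != set(candidate):
        raise ValueError("baseline and candidate must cover the same query set")
    queries = sorted(baseline)  # fixed order so both sums are computed identically
    n = len(queries)
    base_mrr = sum(reciprocal_rank(baseline[q]) for q in queries) / n
    cand_mrr = sum(reciprocal_rank(candidate[q]) for q in queries) / n
    worse = [q for q in queries
             if reciprocal_rank(candidate[q]) < reciprocal_rank(baseline[q])]
    return base_mrr, cand_mrr, worse


baseline = {"q1": 1, "q2": 2, "q3": None}
candidate = {"q1": 1, "q2": 1, "q3": 3}
base_mrr, cand_mrr, worse = regression_report(baseline, candidate)
print(f"baseline MRR={base_mrr:.4f} candidate MRR={cand_mrr:.4f} worse={worse}")
print("no regression" if cand_mrr >= base_mrr else "regression")
```

A candidate passes when `cand_mrr >= base_mrr`. Listing the individual queries that got worse is still worthwhile on a pass, since an unchanged aggregate can hide offsetting wins and losses.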