P@k (Precision at k)
Manning, Raghavan, Schütze (2008) IIR §8.4
This audit tests multi-word natural-language queries, the kind a developer or AI agent would actually issue in a coding session (for example "how to make a type Sendable" or "actor reentrancy semantics"), that have no single canonical URI. The right answer is a small set of relevant documents spread across apple-docs and swift-evolution.
Each card opens its own page. The headline and charts above are all you need at a glance; the cards are for the why and how.
This is the second-worst class baseline after acronym (4/22, 18%). For prose, this number represents an upper bound on the methodology problem and a lower bound on the ranker problem; the truth is somewhere in between.
The harness defined per-query "valid" URIs as a tight enumeration: for "actor reentrancy semantics", valid = apple-docs://swift/actor OR swift-evolution://SE-0306 OR swift-evolution://SE-0327.
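A tight enumeration of this kind can be pictured as an anchored regex over the three URIs quoted above. This is a minimal sketch, not the harness's actual code: the names `VALID_URIS`, `VALID_RE`, and `is_valid` are hypothetical, and exact (full-string) matching is an assumption about how strict the harness was.

```python
import re

# The three URIs the audit lists as valid for "actor reentrancy semantics".
VALID_URIS = [
    "apple-docs://swift/actor",
    "swift-evolution://SE-0306",
    "swift-evolution://SE-0327",
]

# Assumption: the pattern is an escaped alternation matched against the whole URI.
VALID_RE = re.compile("|".join(re.escape(uri) for uri in VALID_URIS))


def is_valid(uri: str) -> bool:
    """Return True only if the URI is exactly one of the enumerated entries."""
    return VALID_RE.fullmatch(uri) is not None


# A made-up neighbouring page (not taken from the audit) is rejected even
# though a human judge might accept it as relevant.
hypothetical_neighbour = "apple-docs://swift/some-related-page"
accepted = is_valid("swift-evolution://SE-0306")
rejected = is_valid(hypothetical_neighbour)
```

Under this assumption, any relevant page outside the list counts as a miss, which is how a tight enumeration can understate ranker quality.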
Several of the "misses" are arguably correct results that the regex rejected:
Cupertino's BM25F + RRF configuration is tuned for canonical-lookup and symbol-identifier queries (classes A, D), where it excels (92% P@1, 100% rank-1 fragment recall).
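For readers unfamiliar with RRF (reciprocal rank fusion, Cormack, Clarke, Büttcher 2009), the general technique can be sketched as follows. This is an illustration of the published formula only, not Cupertino's implementation: the function name `rrf_fuse`, the input rankings, and the constant k = 60 (the value used in the original paper) are assumptions here, since the audit does not state Cupertino's settings.

```python
from collections import defaultdict
from typing import Dict, List, Sequence


def rrf_fuse(rankings: Sequence[Sequence[str]], k: int = 60) -> List[str]:
    """Fuse several ranked lists with reciprocal rank fusion.

    Each document scores sum(1 / (k + rank)) over the lists it appears in,
    with rank starting at 1. Ties are broken by document id for determinism.
    """
    scores: Dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc in enumerate(ranking, start=1):
            scores[doc] += 1.0 / (k + rank)
    return sorted(scores, key=lambda doc: (-scores[doc], doc))


# Two hypothetical rankers that disagree on order.
fused = rrf_fuse([["a", "b", "c"], ["b", "c", "a"]])
```

Because each list contributes only through rank, a document that ranks consistently high across lists wins over one that is first in a single list and low elsewhere.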
Following the feedback_code_changes_as_ideas_for_future rule:
An AI agent issuing a prose query like "how to make a type Sendable" gets appentity and systemcoordinator in the top 3, not the Sendable protocol page.
15 prose queries, each with a regex matching a per-query enumerated valid-URI set. For each query: run cupertino search "<query>" --limit 10, find the rank of the first relevant result, and compute P@3, P@5, and any-match-in-top-3 (binary).
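The three per-query measurements can be sketched as below. This is a minimal illustration rather than the harness itself: it assumes the search output has already been parsed into an ordered list of URIs (the output format of cupertino search is not described here), and the function names and the example result list are hypothetical.

```python
import re
from typing import Optional, Pattern, Sequence


def first_relevant_rank(uris: Sequence[str], valid: Pattern[str]) -> Optional[int]:
    """1-based rank of the first URI matching the valid set, or None if none match."""
    for rank, uri in enumerate(uris, start=1):
        if valid.fullmatch(uri):
            return rank
    return None


def precision_at_k(uris: Sequence[str], valid: Pattern[str], k: int) -> float:
    """Relevant results in the top k, divided by k (even if fewer than k came back)."""
    hits = sum(1 for uri in uris[:k] if valid.fullmatch(uri))
    return hits / k


def any_match_in_top_k(uris: Sequence[str], valid: Pattern[str], k: int = 3) -> bool:
    """Binary indicator: is at least one of the top k results relevant?"""
    return any(valid.fullmatch(uri) for uri in uris[:k])


# Valid set quoted in the audit for "actor reentrancy semantics".
valid = re.compile(r"apple-docs://swift/actor|swift-evolution://SE-03(?:06|27)")

# Made-up ranked results for illustration only.
ranked = [
    "apple-docs://example/unrelated-1",
    "apple-docs://swift/actor",
    "apple-docs://example/unrelated-2",
    "swift-evolution://SE-0306",
    "apple-docs://example/unrelated-3",
]

rank = first_relevant_rank(ranked, valid)   # 2
p_at_3 = precision_at_k(ranked, valid, 3)   # 1 hit in 3 slots
p_at_5 = precision_at_k(ranked, valid, 5)   # 2 hits in 5 slots
top3 = any_match_in_top_k(ranked, valid)    # True
```

Dividing by k rather than by the number of results returned follows the usual definition of P@k; whether the harness does the same for short result lists is not stated.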
Six of eight Phase 1.x classes from §1.4 now have documented baselines. One remains: H (symbol-attribute). Plus Phase 1.7 (anti-hallucination agent-end-to-end).
Every metric and method this audit relies on, with a link to the foundational source. Auto-collected from the audit text.
Manning, Raghavan, Schütze (2008) IIR §8.4
Cormack, Clarke, Büttcher (2009), SIGIR
Voorhees (1999), TREC-8 QA Report