Search-quality baseline: prose / conceptual (Phase 1.5, v1.2.0 candidate)

Reading the Misses Honestly

Several of the "misses" are arguably correct results that the regex rejected:

  • Swift macros returned appkit/macros, driverkit/driverkit-macros, widgetkit/preview-macros. These ARE macro-related pages; a human reading "Swift macros" might want the language-feature SE proposals, but the ranker reasonably surfaced framework-specific macro documentation. The regex required SE-0382/0389/0397 and saw none.
  • NavigationStack programmatic navigation returned understanding-the-composition-of-navigation-stack and swiftui/navigation. Both are highly relevant to the question; the regex was looking for the literal navigationstack/navigationpath URIs and didn't match these.
  • Result Builders proposal returned three SE proposals (SE-0373, SE-0326, SE-0348). None is the expected SE-0289, but the ranker plausibly returned Swift evolution proposals on a Swift evolution topic; a human would say "wrong proposal, but right direction."
  • Combine to async/await migration returned SE-0463 and SE-0458. Both may be migration-related, but neither matched the proposals the strict regex required.
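
The NavigationStack case above can be sketched in a few lines of Python. This is only an illustration of how a literal-URI check can reject pages a human would accept; the pattern, the `strict_hit` helper, and the shortened URI strings are assumptions, not the baseline's actual harness.

```python
import re

# Hypothetical strict pattern: assumes the check wanted the literal
# navigationstack / navigationpath symbol pages.
EXPECTED = re.compile(r"navigationstack|navigationpath")

# The two results named above for "NavigationStack programmatic navigation"
# (shortened URIs as quoted in this report).
returned = [
    "understanding-the-composition-of-navigation-stack",
    "swiftui/navigation",
]

def strict_hit(uris, pattern):
    """Return True if any returned URI matches the expected-URI regex."""
    return any(pattern.search(uri) for uri in uris)

# Neither URI contains the literal token, so the strict check scores a miss
# even though both pages are on topic.
print(strict_hit(returned, EXPECTED))
```

A hyphenated slug such as navigation-stack never matches the unhyphenated token, which is one way a relevant page ends up counted as a miss.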

Other misses are real ranker failures:

  • how to make a type Sendable returned appentity, systemcoordinator, storekit-views, none of which is relevant. The Sendable protocol page sits at rank 8, well outside the top-3.
  • MainActor isolation returned SwiftUI style protocols (tabviewstyle, controlgroupstyle, toolbarcontent) — wholly unrelated. The MainActor docs are not in top-3.
  • error handling Swift async returned three deep manager-method pages — also unrelated to the conceptual question.

A rough human-judgment split: of the 11 misses, perhaps 4-6 are arguably acceptable (the ranker found a defensible page and the regex was too strict), and the remaining 5-7 are genuine ranker misses (the top-3 has nothing useful for the question).

Adding those 4-6 acceptable results to the 4 queries the regex already counted as hits gives an adjusted estimate of "useful in top-3" of 8-10 out of 15 (53-67%), not 26.7%. Still not great, but materially better than the regex-strict number suggests.
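
The arithmetic behind the adjusted range can be checked with a short sketch. The counts come from the figures above (15 queries, 11 regex misses, 4-6 misses judged acceptable); the variable and function names are illustrative only.

```python
TOTAL_QUERIES = 15
REGEX_MISSES = 11
STRICT_HITS = TOTAL_QUERIES - REGEX_MISSES  # 4 queries passed the regex

def top3_rate(useful):
    """Percentage of queries with a useful top-3 result, to one decimal."""
    return round(100 * useful / TOTAL_QUERIES, 1)

strict_rate = top3_rate(STRICT_HITS)        # regex-strict baseline
adjusted_low = top3_rate(STRICT_HITS + 4)   # 4 misses judged acceptable
adjusted_high = top3_rate(STRICT_HITS + 6)  # 6 misses judged acceptable

print(strict_rate, adjusted_low, adjusted_high)
```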

This is exactly the situation that §14.3 of the design (Phase 2 TREC-grade pooling) is meant to address. Honest measurement of prose-query quality requires human judges.
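
For readers unfamiliar with pooling, the general idea is to take the top results from each contributing ranker for a query, merge them into one pool, and have humans judge every document in it. The sketch below is a generic illustration under assumed inputs; the run names, document IDs, and `build_pool` helper are hypothetical and are not taken from §14.3.

```python
def build_pool(runs, depth):
    """Union the top-`depth` results of every run into one judging pool."""
    pool = set()
    for ranked in runs.values():
        pool.update(ranked[:depth])
    # Sort so judges see the pool in an order that does not reveal rank.
    return sorted(pool)

# Hypothetical ranked lists for a single query from two ranker variants.
runs = {
    "ranker_a": ["doc1", "doc2", "doc3", "doc4"],
    "ranker_b": ["doc3", "doc5", "doc1", "doc6"],
}

pool = build_pool(runs, depth=3)
print(pool)
```

Each pooled document then gets a human relevance label, so a defensible page is no longer scored as a miss just because it fails a URI regex.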