MRR
Mean Reciprocal Rank
The reciprocal rank of the first right answer, averaged across queries.
For each query, take 1 / (rank of the first relevant document); a query whose relevant document is never retrieved scores 0. Average across all queries. If the right doc is at position 1 for every query, MRR = 1.0. If it is always at position 2, MRR = 0.5. Used when each query has one canonical right answer: cupertino's canonical-lookup case.
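A minimal sketch of the computation, assuming each query's results arrive as a ranked list of document IDs and its relevant documents as a set of IDs. The function names are hypothetical, not taken from any particular evaluation library.

```python
from typing import Sequence


def reciprocal_rank(ranked_ids: Sequence[str], relevant_ids: set[str]) -> float:
    """1 / rank (1-based) of the first relevant doc; 0.0 if none is retrieved."""
    for rank, doc_id in enumerate(ranked_ids, start=1):
        if doc_id in relevant_ids:
            return 1.0 / rank
    return 0.0


def mean_reciprocal_rank(runs: Sequence[tuple[Sequence[str], set[str]]]) -> float:
    """Average reciprocal rank over (ranking, relevant set) pairs, one per query."""
    if not runs:
        return 0.0
    return sum(reciprocal_rank(ranking, rel) for ranking, rel in runs) / len(runs)


# Two queries: right doc at position 1, then at position 2 -> (1.0 + 0.5) / 2
print(mean_reciprocal_rank([(["a", "b"], {"a"}), (["x", "a"], {"a"})]))  # 0.75
```

Only the first relevant hit matters, so anything relevant further down the list does not change the score.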
Source: Voorhees, E.M. (1999). The TREC-8 Question Answering Track Report. NIST Special Publication.
P@k
Precision at k
Of the top k results, what fraction is relevant.
If the top 5 contains 3 relevant docs, P@5 = 0.6. P@1 is 1 if the very first result is relevant and 0 otherwise. We use P@1, P@5, and P@10 on the dashboards.
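A small sketch, assuming rankings are lists of document IDs with no duplicates and relevance is a set of IDs; `precision_at_k` is a hypothetical name. It divides by k even when fewer than k results come back, treating the missing slots as non-relevant, which is the common convention.

```python
from typing import Sequence


def precision_at_k(ranked_ids: Sequence[str], relevant_ids: set[str], k: int) -> float:
    """Fraction of the top-k results that are relevant (assumes no duplicate IDs)."""
    if k <= 0:
        raise ValueError("k must be positive")
    hits = sum(1 for doc_id in ranked_ids[:k] if doc_id in relevant_ids)
    return hits / k  # short result lists are still divided by k


ranking = ["a", "x", "b", "y", "c"]
relevant = {"a", "b", "c"}
print(precision_at_k(ranking, relevant, 5))  # 0.6
print(precision_at_k(ranking, relevant, 1))  # 1.0
```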
Source: Manning, Raghavan, Schütze (2008), §8.4 "Evaluation of ranked retrieval results."
NDCG@k
Normalized Discounted Cumulative Gain
Like P@k but penalises relevant results that appear lower down.
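Each relevant doc contributes a "gain" (its graded relevance) that is discounted the further down the ranking it appears, typically by a logarithm of its rank. The summed discounted gain (DCG) is then divided by IDCG, the DCG of a perfect ranking, so scores fall between 0 and 1. Used when ranking position matters and relevance can be graded. Standard in modern machine-learning ranking work.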
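A minimal sketch using the widely used log2(rank + 1) discount. This is one common variant; Järvelin & Kekäläinen's original paper defines the discount slightly differently. The function names are hypothetical, and gains are assumed to be non-negative graded relevance values.

```python
import math
from typing import Sequence


def dcg_at_k(gains: Sequence[float], k: int) -> float:
    """Sum of gain / log2(rank + 1) over the top k ranks (ranks are 1-based)."""
    return sum(g / math.log2(rank + 1) for rank, g in enumerate(gains[:k], start=1))


def ndcg_at_k(ranked_gains: Sequence[float], all_gains: Sequence[float], k: int) -> float:
    """DCG of the system ranking divided by IDCG, the DCG of the ideal ranking.

    ranked_gains: graded relevance of each returned doc, in ranked order.
    all_gains:    graded relevance of every judged doc for the query; sorting it
                  in descending order gives the ideal ranking.
    """
    idcg = dcg_at_k(sorted(all_gains, reverse=True), k)
    if idcg == 0:
        return 0.0  # no relevant docs at all: define NDCG as 0
    return dcg_at_k(ranked_gains, k) / idcg


print(ndcg_at_k([3, 2, 0], [3, 2, 0], 3))  # 1.0 (already ideal)
print(ndcg_at_k([0, 3], [3, 0], 2))        # ~0.631 (best doc pushed to rank 2)
```

`all_gains` should cover every judged document, not just the returned ones; otherwise the ideal ranking, and therefore the normalisation, is wrong.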
Source: Järvelin, K., & Kekäläinen, J. (2002). Cumulated gain-based evaluation of IR techniques. ACM Transactions on Information Systems, 20(4).
DOI: 10.1145/582415.582418
MAP
Mean Average Precision
Precision averaged over the ranks of the relevant documents, then averaged across queries.
For each query, compute precision at every rank where a relevant document appears, then average those values over all of the query's relevant documents (a relevant document that is never retrieved contributes a precision of 0). Then average across queries. Known for "especially good discrimination and stability" (Manning IIR §8.4). The default headline metric at TREC.
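A sketch of average precision and MAP, assuming rankings are lists of document IDs without duplicates and relevance is a set of IDs; the names are hypothetical. Dividing by the full relevant-set size is what makes unretrieved relevant docs count as 0.

```python
from typing import Sequence


def average_precision(ranked_ids: Sequence[str], relevant_ids: set[str]) -> float:
    """Mean of P@rank at each rank holding a relevant doc (assumes no duplicate IDs)."""
    if not relevant_ids:
        return 0.0
    hits = 0
    precision_sum = 0.0
    for rank, doc_id in enumerate(ranked_ids, start=1):
        if doc_id in relevant_ids:
            hits += 1
            precision_sum += hits / rank  # precision of the top `rank` results
    # Divide by ALL relevant docs: never-retrieved ones add 0 to the sum.
    return precision_sum / len(relevant_ids)


def mean_average_precision(runs: Sequence[tuple[Sequence[str], set[str]]]) -> float:
    """Average of per-query average precision over (ranking, relevant set) pairs."""
    if not runs:
        return 0.0
    return sum(average_precision(ranking, rel) for ranking, rel in runs) / len(runs)


# Relevant docs at ranks 1 and 3: (1/1 + 2/3) / 2 = 0.833...
print(average_precision(["a", "x", "b"], {"a", "b"}))
# Same ranking, but a third relevant doc "c" is never retrieved: (1 + 2/3 + 0) / 3
print(average_precision(["a", "x", "b"], {"a", "b", "c"}))
```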
Source: Manning, Raghavan, Schütze (2008), §8.4.
R-Precision
R-Precision
P@k where k equals the number of known relevant docs.
Adjusts for queries whose numbers of relevant documents differ widely: a query with 2 relevant docs is judged on its top 2, one with 20 on its top 20. Cleanest when the relevant-set size is known up front; used for prose / conceptual queries where multiple documents jointly answer the question.
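A short sketch, assuming the full relevant set for each query is known and rankings are duplicate-free lists of IDs; `r_precision` is a hypothetical name.

```python
from typing import Sequence


def r_precision(ranked_ids: Sequence[str], relevant_ids: set[str]) -> float:
    """Precision at rank R, where R = number of known relevant docs."""
    r = len(relevant_ids)
    if r == 0:
        return 0.0
    hits = sum(1 for doc_id in ranked_ids[:r] if doc_id in relevant_ids)
    return hits / r


# R = 4: the top 4 is a, x, b, c, so 3 of 4 are relevant
print(r_precision(["a", "x", "b", "c", "d"], {"a", "b", "c", "d"}))  # 0.75
```

Because the cutoff equals the relevant-set size, precision and recall coincide at rank R (both are hits / R).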
Source: Manning, Raghavan, Schütze (2008), §8.4.