Approximate Recall Confidence Intervals
by
William Webber
2012
Abstract
Recall, the proportion of relevant documents retrieved, is an important
measure of effectiveness in information retrieval, particularly in the legal,
patent, and medical domains. Where document sets are too large for exhaustive
relevance assessment, recall can be estimated by assessing a random sample of
documents; but an indication of the reliability of this estimate is also
required. In this article, we examine several methods for estimating two-tailed
recall confidence intervals. We find that the normal approximation in current
use provides poor coverage in many circumstances, even when adjusted to correct
its inappropriate symmetry. Analytic and Bayesian methods based on the ratio of
binomials are generally more accurate, but are inaccurate on small populations.
The method we recommend derives beta-binomial posteriors on retrieved and
unretrieved yield, with fixed hyperparameters, and a Monte Carlo estimate of
the posterior distribution of recall. We demonstrate that this method gives
mean coverage at or near the nominal level, across several scenarios, while
being balanced and stable. We offer advice on sampling design, including the
allocation of assessments to the retrieved and unretrieved segments, and
compare the proposed beta-binomial intervals with the officially reported
normal intervals for recent TREC Legal Track iterations.
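
The recommended approach can be illustrated with a minimal sketch. The abstract states only that beta-binomial posteriors with fixed hyperparameters are placed on the retrieved and unretrieved yield and that the posterior of recall is estimated by Monte Carlo. Everything else below is an assumption made for illustration: the function names, the hyperparameter values a = b = 0.5, the percentile rule for the interval bounds, and the example counts. It should not be read as the author's implementation.

```python
import random


def draw_yield(rng, size, n_sampled, n_relevant, a, b):
    """One posterior draw of the number of relevant documents in a segment."""
    # Beta posterior on the segment's prevalence after observing the sample.
    p = rng.betavariate(a + n_relevant, b + n_sampled - n_relevant)
    # Relevant documents among the unsampled remainder: a binomial draw,
    # written as a sum of Bernoulli trials to stay within the stdlib.
    unseen = sum(1 for _ in range(size - n_sampled) if rng.random() < p)
    return n_relevant + unseen


def recall_interval(ret, unret, a=0.5, b=0.5, level=0.95, draws=2000, seed=0):
    """ret and unret are (segment size, sample size, relevant in sample).

    a and b are assumed hyperparameter values, not taken from the source.
    """
    rng = random.Random(seed)
    recalls = []
    for _ in range(draws):
        r = draw_yield(rng, *ret, a, b)      # retrieved yield
        u = draw_yield(rng, *unret, a, b)    # unretrieved yield
        if r + u > 0:  # recall is undefined when nothing is relevant
            recalls.append(r / (r + u))
    if not recalls:
        raise ValueError("no draw produced a relevant document")
    recalls.sort()
    tail = (1.0 - level) / 2.0
    # Two-tailed interval from the empirical percentiles of the draws.
    lo = recalls[int(tail * (len(recalls) - 1))]
    hi = recalls[int((1.0 - tail) * (len(recalls) - 1))]
    return lo, hi


# Hypothetical counts: 500 retrieved and 1500 unretrieved documents,
# 100 assessed in each segment, with 60 and 5 found relevant.
lo, hi = recall_interval(ret=(500, 100, 60), unret=(1500, 100, 5))
print(f"95% interval for recall: [{lo:.3f}, {hi:.3f}]")
```

Because the retrieved and unretrieved segments are drawn independently and combined as a ratio, the resulting interval need not be symmetric around the point estimate, which is the property the normal approximation lacks.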
arXiv: 1202.2880v1