Score Estimation, Incomplete Judgments, and Significance Testing in IR Evaluation
Sri Devi Ravana
Department of Computer Science and Software Engineering,
The University of Melbourne,
Victoria 3010, Australia.
Alistair Moffat
Department of Computer Science and Software Engineering,
The University of Melbourne,
Victoria 3010, Australia.
Status
Proc. 2010 Asian Information Retrieval Societies Conference,
Taipei, Taiwan, December 2010, pages 97-109,
LNCS volume 6458.
Abstract
Comparative evaluations of information retrieval systems are often
carried out using standard test corpora, and the sample topics and
pre-computed relevance judgments that are associated with them.
To keep experimental costs under control, partial relevance
judgments are used rather than exhaustive ones, admitting a degree
of uncertainty into the per-topic effectiveness scores being
compared.
Here we explore the design options that must be considered when
planning such an experimental evaluation, with emphasis on how
effectiveness scores are inferred from partial information.
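To make the source of this uncertainty concrete, the sketch below assumes rank-biased precision (RBP) with persistence parameter p as the effectiveness metric. That choice is made here for illustration only. Under RBP, every rank i carries weight (1 - p) * p^(i-1). Any rank that holds an unjudged document, and every rank beyond the end of a truncated run, adds to a "residual": the score could rise by at most that amount. The function names `rbp_with_residual` and `rbp_interpolated` are hypothetical. The interpolation rule in the second function is one simple way to infer a point score from partial judgments. It is not necessarily one of the estimation options studied in the paper.

```python
from typing import Dict, List, Tuple


def rbp_with_residual(
    ranking: List[str], qrels: Dict[str, int], p: float = 0.8
) -> Tuple[float, float]:
    """Return (base, residual) RBP for a single ranking.

    base:     score when every unjudged document is assumed non-relevant.
    residual: additional score possible if every unjudged document, and every
              rank beyond the end of the ranking, were relevant.
    """
    base = 0.0
    residual = 0.0
    weight = 1.0 - p  # weight of rank 1; multiplied by p at each later rank
    for doc in ranking:
        if doc in qrels:
            # Judged document: contributes only if judged relevant (grade > 0).
            if qrels[doc] > 0:
                base += weight
        else:
            # Unjudged document: its weight is unknown score mass.
            residual += weight
        weight *= p
    # Ranks beyond the end of the list have total weight p ** len(ranking).
    residual += p ** len(ranking)
    return base, residual


def rbp_interpolated(
    ranking: List[str], qrels: Dict[str, int], p: float = 0.8
) -> float:
    """Hypothetical point estimate: assume unknown positions are relevant at
    the same rate as the judged documents in this ranking."""
    judged = [d for d in ranking if d in qrels]
    rate = (
        sum(1 for d in judged if qrels[d] > 0) / len(judged) if judged else 0.0
    )
    base, residual = rbp_with_residual(ranking, qrels, p)
    return base + rate * residual


if __name__ == "__main__":
    run = ["d1", "d2", "d3", "d4"]
    judgments = {"d1": 1, "d3": 0}  # d2 and d4 are unjudged
    print(rbp_with_residual(run, judgments, p=0.5))  # (0.5, 0.375)
    print(rbp_interpolated(run, judgments, p=0.5))   # 0.6875
```

In this toy example, the true RBP of the run could lie anywhere in the interval [0.5, 0.875]. Two systems whose intervals overlap cannot be separated on this topic without further judgments or an inference rule. Any inference rule, including the interpolation above, then affects the per-topic scores that a significance test consumes.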