Score Estimation, Incomplete Judgments, and Significance Testing in IR Evaluation


Sri Devi Ravana
Department of Computer Science and Software Engineering, The University of Melbourne, Victoria 3010, Australia.

Alistair Moffat
Department of Computer Science and Software Engineering, The University of Melbourne, Victoria 3010, Australia.


Status

Proc. 2010 Asian Information Retrieval Societies Conference, Taipei, Taiwan, December 2010, pages 97-109, LNCS volume 6458.

Abstract

Comparative evaluations of information retrieval systems are often carried out using standard test corpora, and the sample topics and pre-computed relevance judgments that are associated with them. To keep experimental costs under control, partial relevance judgments are used rather than exhaustive ones, admitting a degree of uncertainty into the per-topic effectiveness scores being compared. Here we explore the design options that must be considered when planning such an experimental evaluation, with emphasis on how effectiveness scores are inferred from partial information.
