Strategic System Comparisons via Targeted Relevance Judgments


Alistair Moffat
Department of Computer Science and Software Engineering, The University of Melbourne, Victoria 3010, Australia.

William Webber
Department of Computer Science and Software Engineering, The University of Melbourne, Victoria 3010, Australia.

Justin Zobel
School of Computer Science and Information Technology, RMIT University, Victoria 3001, Australia.


Status

Proc. 30th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, Amsterdam, July 2007, pages 375-382.

Abstract

Relevance judgments are used to compare text retrieval systems. Given a collection of documents and queries, and a set of systems being compared, a standard approach to forming judgments is to manually examine all documents that are highly ranked by any of the systems. However, not all of these relevance judgments provide the same benefit to the final result, particularly if the aim is to identify which systems are best, rather than to fully order them. In this paper we propose new experimental methodologies that can significantly reduce the volume of judgments required in system comparisons. Using rank-biased precision, a recently proposed effectiveness measure, we show that judging around 200 documents for each of 50 queries in a TREC-scale system evaluation containing over 100 runs is sufficient to identify the best systems.
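For readers unfamiliar with the measure, rank-biased precision is defined by Moffat and Zobel as RBP = (1 - p) * sum_{i>=1} r_i * p^(i-1), where r_i is the relevance of the document at rank i and p models user persistence. The short Python sketch below is illustrative only (it is not code from the paper) and shows how RBP is computed for a single judged ranking; the relevance list and the choice of p = 0.8 are assumptions made purely for the example.

# Illustrative sketch: rank-biased precision (RBP) for one ranked list.
# RBP = (1 - p) * sum_{i>=1} r_i * p^(i-1), with r_i the relevance at rank i
# and p the user persistence parameter.

def rank_biased_precision(relevances, p=0.95):
    """Compute RBP for a ranked list of relevance values (binary or graded in [0, 1])."""
    return (1.0 - p) * sum(r * p ** i for i, r in enumerate(relevances))

# Example: a run judged to depth 5, with relevant documents at ranks 1 and 3.
print(rank_biased_precision([1, 0, 1, 0, 0], p=0.8))  # 0.2 * (1 + 0.64) = 0.328

Because the geometric weights p^(i-1) decay with rank, unjudged documents deep in a ranking contribute only a bounded residual to the score, which is what makes evaluation with a limited judgment budget feasible.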

Full text

http://doi.acm.org/10.1145/1277741.1277806.