Strategic System Comparisons via Targeted Relevance Judgments
Alistair Moffat
Department of Computer Science and Software Engineering,
The University of Melbourne,
Victoria 3010, Australia.
William Webber
Department of Computer Science and Software Engineering,
The University of Melbourne,
Victoria 3010, Australia.
Justin Zobel
School of Computer Science and Information Technology,
RMIT University,
Victoria 3001, Australia.
Status
Proc. 30th Annual International ACM SIGIR Conference on
Research and Development in Information Retrieval, Amsterdam,
July 2007, pages 375-382.
Abstract
Relevance judgments are used to compare text retrieval systems.
Given a collection of documents and queries, and a set of systems
being compared, a standard approach to forming judgments is to
manually examine all documents that are highly ranked by any of the
systems.
However, not all of these relevance judgments contribute equally to the
outcome of the comparison, particularly if the aim is to identify which
systems are best rather than to fully order them.
In this paper we propose new experimental methodologies that can
significantly reduce the volume of judgments required in system
comparisons.
Using rank-biased precision, a recently proposed effectiveness
measure, we show that judging around 200 documents for each of 50
queries in a TREC-scale system evaluation containing over 100 runs
is sufficient to identify the best systems.
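For context, rank-biased precision weights the relevance of the document at
rank i by (1 - p) * p^(i-1), where p is a user-persistence parameter, so
unjudged documents contribute only a bounded residual to the score; it is
this bound that makes targeted, reduced-effort judging feasible. The sketch
below is illustrative only and is not taken from the paper: the function
name, the persistence value p = 0.8, and the use of None to mark unjudged
documents are all assumptions.

    def rbp_with_residual(judgments, p=0.8):
        # judgments[i] is 1 (relevant), 0 (not relevant), or None (unjudged)
        # for the document at rank i+1; p is the RBP persistence parameter.
        base = 0.0       # score if every unjudged document were non-relevant
        residual = 0.0   # further score possible if they were all relevant
        for i, r in enumerate(judgments):
            weight = (1 - p) * p ** i
            if r is None:
                residual += weight
            else:
                base += r * weight
        residual += p ** len(judgments)   # documents below the judged depth
        return base, residual

    # Example: ranks 1 and 3 relevant, rank 2 unjudged, judged to depth 4.
    # rbp_with_residual([1, None, 1, 0], p=0.8) returns (0.328, 0.5696).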
Full text
http://doi.acm.org/10.1145/1277741.1277806.