Score Standardization for Inter-Collection Comparison of Retrieval Systems
William Webber and Alistair Moffat
Department of Computer Science and Software Engineering,
The University of Melbourne, Victoria 3010, Australia.
Justin Zobel
NICTA Victoria Laboratory, and
Department of Computer Science and Software Engineering,
The University of Melbourne, Victoria 3010, Australia.
Status
Proceedings of the 31st Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, Singapore, July 2008, pages 51–58.
Abstract
The goal of system evaluation in information retrieval has always been to determine which of a set of systems is superior on a given collection. The tool used to determine system ordering is an evaluation metric such as average precision, which computes relative, collection-specific scores. We argue that a broader goal is achievable. In this paper we demonstrate that, through the use of standardization, scores can be made substantially independent of a particular collection, allowing systems to be compared even when they have been tested on different collections. Compared to current methods, our techniques provide richer information about system performance, improved clarity in outcome reporting, and greater simplicity in reviewing results from disparate sources.
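
The core idea, sketched below under assumptions drawn from the abstract rather than from the full text: a system's raw score on a topic is shifted and scaled by the mean and standard deviation of the scores that a pool of reference systems achieves on the same topic, and the resulting z-score can be mapped through the standard normal CDF to return it to the [0, 1] range. The function names and the numbers in this sketch are illustrative only; the paper's exact transform and its standardization constants are given in the full text linked below.

    import math
    import statistics

    def standardize(score, ref_scores):
        # Shift and scale a raw topic score by the mean and standard
        # deviation of reference-system scores on the same topic, then
        # map the z-score through the standard normal CDF so that the
        # result lies in [0, 1].  Names here are illustrative, not the
        # paper's.
        mean = statistics.mean(ref_scores)
        stdev = statistics.stdev(ref_scores)
        z = (score - mean) / stdev
        return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

    # A system's AP score on one topic, standardized against the AP
    # scores of a pool of reference runs on that topic (made-up values).
    reference_scores = [0.12, 0.18, 0.22, 0.25, 0.27, 0.31]
    print(standardize(0.30, reference_scores))  # about 0.87: well above the pool mean

Because the standardized score is expressed relative to the reference pool for each topic, two systems scored against the same pool can be compared even when they were run on different topic sets, which is the inter-collection comparison the abstract describes.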
Software
Web site giving standardization constants for typical TREC experiments.
Full text
http://doi.acm.org/10.1145/1390334.1390346