Score Standardization for Inter-Collection Comparison of Retrieval Systems


William Webber
Department of Computer Science and Software Engineering, The University of Melbourne, Victoria 3010, Australia.

Alistair Moffat
Department of Computer Science and Software Engineering, The University of Melbourne, Victoria 3010, Australia.

Justin Zobel
NICTA Victoria Laboratory, Department of Computer Science and Software Engineering, The University of Melbourne, Victoria 3010, Australia.


Status

Proc. 31st Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, Singapore, July 2008, pages 51-58.

Abstract

The goal of system evaluation in information retrieval has always been to determine which of a set of systems is superior on a given collection. The tool used to determine system ordering is an evaluation metric such as average precision, which computes relative, collection-specific scores. We argue that a broader goal is achievable. In this paper we demonstrate that, through the use of standardization, scores can be made substantially independent of any particular collection, allowing systems to be compared even when they have been tested on different collections. Compared to current methods, our techniques provide richer information about system performance, improved clarity in outcome reporting, and greater simplicity in reviewing results from disparate sources.
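The abstract does not spell out the standardization itself; as an illustration of the general idea only, the sketch below assumes the common z-score form: each per-topic score is centred and scaled by statistics drawn from a set of reference runs on that topic, then mapped onto [0, 1] through the standard normal CDF. The topic numbers and constants are hypothetical.

    import math

    def standardize(score, topic_mean, topic_stddev):
        # Express the raw per-topic score (e.g. average precision) as a
        # z-score against the mean and standard deviation achieved by a
        # set of reference runs on the same topic.
        z = (score - topic_mean) / topic_stddev
        # Map the z-score onto [0, 1] via the standard normal CDF,
        # computed here with the error function.
        return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

    def standardized_mean(run_scores, topic_stats):
        # run_scores: {topic: raw score}; topic_stats: {topic: (mean, stddev)}.
        # A run's overall score is the mean of its standardized
        # per-topic scores.
        vals = [standardize(s, *topic_stats[t]) for t, s in run_scores.items()]
        return sum(vals) / len(vals)

    # Hypothetical per-topic constants and run scores, for illustration only.
    stats = {401: (0.21, 0.08), 402: (0.35, 0.12)}
    run = {401: 0.30, 402: 0.28}
    print(standardized_mean(run, stats))  # roughly 0.57

Under this form, a standardized score depends on per-topic reference statistics rather than on the raw score distribution of any one collection, which is what makes cross-collection comparison of the kind the paper targets plausible.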

Software

Web site giving standardization constants for typical TREC experiments

Full text

http://doi.acm.org/10.1145/1390334.1390346