EvaluatIR: an online tool for evaluating and comparing IR systems

Timothy G. Armstrong
Department of Computer Science and Software Engineering, The University of Melbourne, Victoria 3010, Australia.

Alistair Moffat
Department of Computer Science and Software Engineering, The University of Melbourne, Victoria 3010, Australia.

William Webber
Department of Computer Science and Software Engineering, The University of Melbourne, Victoria 3010, Australia.

Justin Zobel
Department of Computer Science and Software Engineering, The University of Melbourne, Victoria 3010, Australia.

Status

Proc. 32nd Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, Boston, July 2009, page 833. Demonstration presentation.

Abstract

A fundamental goal of information retrieval research is to develop new retrieval techniques, and to demonstrate that they attain improved effectiveness compared to their predecessors. To quantitatively compare IR techniques, the community has developed a range of standard corpora of documents, queries, and relevance judgements. We have developed a centralized mechanism for authenticating new similarity techniques that have been tested on a standard corpus. Our website provides an independent, permanent, certified measure of effectiveness that can be relied on by both authors and subsequent readers. Researchers seeking a comparison upload runs via the browser-based interface, and the website returns a link to a page with performance results and statistical comparisons to baselines, using measures such as MAP, nDCG, and RBP, and techniques such as longitudinal standardization. By comparing against standard baselines and up-to-date runs submitted by others, researchers can determine whether their methods provide a true improvement over earlier work, and readers and referees can more easily assess claimed results.
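As a rough illustration of the per-topic measures named in the abstract, the sketch below computes average precision (the per-topic component of MAP), nDCG, and RBP for a single ranked list. It is not the EvaluatIR implementation. The function names, the `num_relevant` and `ideal_gains` parameters, the log2 discount for nDCG, and the default persistence p = 0.8 for RBP are all assumptions made for this example.

```python
import math


def average_precision(rels, num_relevant=None):
    """Average precision for one topic.

    rels: binary relevance (1/0) of the ranked documents, top rank first.
    num_relevant: total relevant documents for the topic (hypothetical
    parameter); if omitted, only the relevant documents retrieved are counted.
    """
    hits = 0
    total = 0.0
    for rank, rel in enumerate(rels, start=1):
        if rel:
            hits += 1
            total += hits / rank  # precision at this relevant document
    denom = num_relevant if num_relevant is not None else hits
    return total / denom if denom else 0.0


def rbp(rels, p=0.8):
    """Rank-biased precision with persistence p (0.8 is an assumed default)."""
    return (1 - p) * sum(rel * p ** i for i, rel in enumerate(rels))


def dcg(gains):
    """Discounted cumulative gain with a log2(rank + 1) discount (assumed)."""
    return sum(g / math.log2(rank + 1) for rank, g in enumerate(gains, start=1))


def ndcg(gains, ideal_gains=None):
    """nDCG for one topic.

    ideal_gains: gains of all judged documents (hypothetical parameter);
    if omitted, the ideal ranking is built from the retrieved gains only.
    """
    pool = gains if ideal_gains is None else ideal_gains
    ideal = sorted(pool, reverse=True)[: len(gains)]
    best = dcg(ideal)
    return dcg(gains) / best if best else 0.0


def mean_over_topics(scores):
    """Average a per-topic measure over topics, e.g. AP values to give MAP."""
    return sum(scores) / len(scores) if scores else 0.0


if __name__ == "__main__":
    run = [1, 0, 1, 0]  # toy relevance vector for one topic
    print(average_precision(run), ndcg(run), rbp(run))
```

Averaging the per-topic values over all topics with `mean_over_topics` gives the system-level score. Per-topic scores for two runs on the same topics are also the natural input to a paired significance test.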

Software

EvaluatIR web site.

Full text

http://doi.acm.org/10.1145/1571941.1572153