Principles for Robust Evaluation Infrastructure


Justin Zobel
Department of Computer Science and Software Engineering, The University of Melbourne, Victoria 3010, Australia.

William Webber
Department of Computer Science and Software Engineering, The University of Melbourne, Victoria 3010, Australia.

Mark Sanderson
School of Computer Science and Information Technology, RMIT University, Victoria 3001, Australia.

Alistair Moffat
Department of Computer Science and Software Engineering, The University of Melbourne, Victoria 3010, Australia.


Status

Proc. DESIRE Workshop on Data Infrastructures for Supporting Information Retrieval Evaluation, Glasgow, October 2011, pages 3-6.

Abstract

The standard "Cranfield" approach to the evaluation of information retrieval systems has been used and refined for nearly fifty years, and has been a key element in the development of large-scale retrieval systems. The resources created by such systematic evaluations have enabled thorough retrospective investigation of the strengths and limitations of particular variants of this evaluation approach; over the last few years, such investigation has, for example, led to the identification of serious flaws in some experiments. Knowledge of these flaws can prevent their perpetuation into future work and can inform the design of new experiments and infrastructures. In this position statement we briefly review some aspects of evaluation and, based on our research and observations over the last decade, outline some principles on which we believe new infrastructure should rest.

Full text

http://doi.acm.org/10.1145/2064227.2064247