What Would We Like IR Metrics to Measure?

Alistair Moffat
Department of Computing and Information Systems, The University of Melbourne, Victoria 3010, Australia.


Invited presentation, Proc. 12th NTCIR Conference on Evaluation of Information Access Technologies, Tokyo, June 2016.


The field of Information Retrieval has a long-standing tradition of rigorous evaluation, and an expectation that proposals for new mechanisms and techniques will either be evaluated in batch-mode experiments against realistic test collections, with results derived using standard tools, or will be evaluated through user studies. This emphasis on evidence, and the desire for verification of proposals, has meant that IR effectiveness measurement is an important area of study in its own right. The result has been the development of a complex suite of relevance metrics, each with seemingly different behavior. Well-known examples include Precision, Recall, Average Precision, Normalized Discounted Cumulative Gain, BPref, the Q-Measure, Rank-Biased Precision (RBP), and so on. This presentation returns to the underlying question of what it is that a metric should measure, using a set of desiderata for usefulness as a starting point for examining the existing palette of metrics. Recent work describing a goal-sensitive adaptive metric called INST is then presented.

Published abstract
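
Two of the metrics named in the abstract are easy to state concretely. Precision at depth k is the fraction of the top k documents that are relevant; Rank-Biased Precision (Moffat and Zobel) discounts each rank by a persistence parameter p. The sketch below assumes binary relevance judgments supplied as a list of 0/1 values; the function names are illustrative only.

```python
def precision_at_k(rels, k):
    """Fraction of the top-k ranked documents that are relevant.

    rels -- list of 0/1 relevance judgments in rank order.
    """
    return sum(rels[:k]) / k

def rbp(rels, p=0.8):
    """Rank-Biased Precision with persistence parameter p.

    RBP = (1 - p) * sum_i rels[i] * p**i, where i is the
    zero-based rank. Higher p models a more patient user.
    """
    return (1 - p) * sum(r * p**i for i, r in enumerate(rels))

# Example: ranking with relevant documents at ranks 1, 2, and 4.
judgments = [1, 1, 0, 1]
print(precision_at_k(judgments, 3))  # 2 of the top 3 are relevant
print(rbp(judgments, p=0.5))
```

Note that RBP, unlike Precision@k, does not require a fixed evaluation depth: the geometric discount means documents deep in the ranking contribute progressively less, which is part of the motivation for the user-model view of metrics the presentation discusses.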


Slides from Presentation