What Would We Like IR Metrics to Measure?
Department of Computing and Information Systems,
The University of Melbourne,
Victoria 3010, Australia.
Invited presentation, Proc. 12th NTCIR Conference on Evaluation of Information Access Technologies,
The field of Information Retrieval has a long-standing tradition of
rigorous evaluation, and an expectation that proposals for new
mechanisms and techniques will either be evaluated in batch-mode
experiments against realistic test collections, with results derived
from standard tools; or will be evaluated through the use of user
studies.
This emphasis on evidence, and the desire for verification of
proposals, have meant that IR effectiveness measurement is an
important area of study in its own right.
The result has been the development of a complex suite of relevance
metrics, each with seemingly different behavior.
Well-known examples include Precision, Recall, Average Precision,
Normalized Discounted Cumulative Gain, BPref, the Q-Measure,
Rank-Biased Precision (RBP), and so on.
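As a concrete point of reference, the following is a minimal Python sketch
of three of the metrics named above, computed over a binary relevance
vector. The example ranking, the cutoff k, and the RBP persistence p = 0.8
are hypothetical choices; note also that Average Precision is normalized
here by the number of relevant documents retrieved, whereas the textbook
definition divides by the number of relevant documents in the collection.

    def precision_at_k(rels, k):
        # Fraction of the top-k documents that are relevant.
        return sum(rels[:k]) / k

    def average_precision(rels):
        # Mean of precision@k taken at each rank holding a relevant
        # document; normalized here by retrieved relevant documents
        # (a simplification of the standard definition).
        rel_total = sum(rels)
        if rel_total == 0:
            return 0.0
        return sum(precision_at_k(rels, i + 1)
                   for i, r in enumerate(rels) if r) / rel_total

    def rbp(rels, p=0.8):
        # Rank-Biased Precision: (1 - p) * sum over ranks of r_i * p^(i-1).
        return (1 - p) * sum(r * p ** i for i, r in enumerate(rels))

    ranking = [1, 0, 1, 1, 0, 0, 1, 0, 0, 0]   # hypothetical judgments
    print(precision_at_k(ranking, 5))          # 0.6
    print(average_precision(ranking))          # about 0.747
    print(rbp(ranking))                        # about 0.483

Varying p in RBP models users of differing persistence: small p
concentrates almost all of the weight on the first few ranks, while p
close to 1 rewards relevance found deep in the ranking.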
This presentation returns to the underlying question of what it is
that a metric should measure, using a set of desiderata for
usefulness as a starting point for examining the existing palette of
metrics.
Recent work describing a goal-sensitive adaptive metric called INST
is then presented.
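To illustrate what goal-sensitive and adaptive can mean in practice, the
following sketch builds an INST-like metric in the continuation-probability
style in which such metrics are commonly described: at each rank the
modeled user inspects one more document with probability C(i), which here
depends on a target T (the volume of relevance the user set out to find)
and shrinks as relevance accumulates. The particular form assumed for
C(i), the residual target T_i, and the truncation of the weight
normalization to the observed ranking are all assumptions of this sketch;
the published description of INST is authoritative.

    def inst_like(rels, T=3.0):
        # Assumed continuation probability after rank i:
        #   C(i) = ((i + T + T_i - 1) / (i + T + T_i)) ** 2,
        # with T_i = max(T - relevance accumulated through rank i, 0).
        depth = len(rels)
        cont = []
        gained = 0.0
        for i in range(1, depth + 1):
            gained += rels[i - 1]
            residual = max(T - gained, 0.0)
            cont.append(((i + T + residual - 1) / (i + T + residual)) ** 2)
        # The probability of viewing rank i is the product of the earlier
        # continuation probabilities; normalize over the observed ranking
        # (the exact metric normalizes over an unbounded list).
        views = [1.0]
        for c in cont[:-1]:
            views.append(views[-1] * c)
        total = sum(views)
        return sum((v / total) * r for v, r in zip(views, rels))

    ranking = [1, 1, 0, 1, 0, 0, 0, 0, 0, 0]   # hypothetical judgments
    print(inst_like(ranking, T=1.0))   # shallow goal: early ranks dominate
    print(inst_like(ranking, T=5.0))   # deeper goal: weight spread further down

The adaptivity is visible in the loop: each relevant document found
reduces the residual target, which lowers C(i) and makes the modeled
user more likely to stop, which is the sense in which the metric
adapts to the user's goal.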
Slides from presentation