What Would We Like IR Metrics to Measure?
Alistair Moffat
Department of Computing and Information Systems,
The University of Melbourne,
Victoria 3010, Australia.
Status
Invited presentation, Proc. 12th NTCIR Conference on Evaluation of Information Access Technologies, Tokyo, June 2016.
Abstract
The field of Information Retrieval has a long-standing tradition of
rigorous evaluation, and an expectation that proposals for new
mechanisms and techniques will be evaluated either in batch-mode
experiments against realistic test collections, with results derived
from standard tools, or through user studies.
This emphasis on evidence, and the desire for verification of
proposals, has meant that IR effectiveness measurement is an
important area studied in its own right.
The result has been the development of a complex suite of relevance
metrics, each with seemingly different behavior.
Well-known examples include Precision, Recall, Average Precision,
Normalized Discounted Cumulative Gain, BPref, the Q-Measure,
Rank-Biased Precision (RBP), and so on.
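To make the contrast between these metrics concrete, here is a minimal sketch of two of them, Precision at depth k and Rank-Biased Precision, over a binary relevance vector. The function names and the choice of persistence parameter p = 0.8 are illustrative assumptions, not drawn from the presentation itself; RBP follows the standard formulation of Moffat and Zobel, in which the score is a geometrically weighted sum of per-rank gains.

```python
def precision_at_k(relevances, k):
    """Precision@k: fraction of the top-k items that are relevant.
    relevances: list of 0/1 judgments in rank order."""
    return sum(relevances[:k]) / k

def rbp(relevances, p=0.8):
    """Rank-Biased Precision: models a user who inspects rank i and
    continues to rank i+1 with probability p. The weight on rank i
    (0-based) is (1 - p) * p**i, so weights sum to at most 1."""
    return (1 - p) * sum(r * p**i for i, r in enumerate(relevances))

ranking = [1, 0, 1, 1, 0]
print(precision_at_k(ranking, 3))  # 2 of the top 3 are relevant
print(rbp(ranking, p=0.5))
```

Note how the two metrics embody different user models: Precision@k weights the top k ranks equally and ignores everything deeper, while RBP discounts each rank geometrically, with p controlling how persistent the modeled user is.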
This presentation returns to the underlying question of what it is
that a metric should measure, using a set of desiderata for
usefulness as a starting point for examining the existing palette of
metrics. Recent work describing a goal-sensitive adaptive metric
called INST is then presented.
Published abstract
http://research.nii.ac.jp/ntcir/workshop/OnlineProceedings12/pdf/ntcir/01-NTCIR12-Keynote1-MoffatA.pdf
Slides from Presentation