Exploring Evaluation Metrics: GMAP versus MAP
Sri Devi Ravana
Department of Computer Science and Software Engineering,
The University of Melbourne,
Victoria 3010, Australia.
Alistair Moffat
Department of Computer Science and Software Engineering,
The University of Melbourne,
Victoria 3010, Australia.
Status
Proc. 31st Annual International ACM SIGIR Conference on
Research and Development in Information Retrieval, Singapore,
July 2008, pages 687-688.
Poster presentation.
Abstract
In retrieval experiments, an effectiveness metric is used to
generate a score for each system-topic pair being tested.
It is then usual to average the system-topic scores to obtain a
system score, which is used for the purpose of system comparison.
In this paper we explore the ramifications of using the geometric
mean (GMAP), rather than the arithmetic mean (MAP), when computing an
aggregate system score from a set of system-topic scores.
We find that GMAP does indeed handle variability in topic difficulty
more consistently than does the usual MAP aggregation method.
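
To make the two aggregation methods concrete, the following is a minimal
sketch and not the authors' implementation. The function names, the
per-topic scores, and the small floor `eps` (used so that a zero score
does not force the geometric mean to zero) are all assumptions made for
illustration only.

```python
import math


def arithmetic_mean(scores):
    """MAP-style aggregate: arithmetic mean of per-topic scores."""
    return sum(scores) / len(scores)


def geometric_mean(scores, eps=1e-5):
    """GMAP-style aggregate: geometric mean of per-topic scores.

    eps is an assumed floor applied to each score so that log(0)
    is never taken; the value chosen here is illustrative.
    """
    logs = [math.log(max(s, eps)) for s in scores]
    return math.exp(sum(logs) / len(logs))


# Hypothetical per-topic average precision scores for two systems.
system_a = [0.50, 0.50, 0.50, 0.50]  # uniform across topics
system_b = [0.90, 0.90, 0.19, 0.01]  # strong on two topics, weak on two

for name, scores in (("A", system_a), ("B", system_b)):
    print(name, round(arithmetic_mean(scores), 4),
          round(geometric_mean(scores), 4))
```

In this made-up example both systems have the same arithmetic mean, but
the geometric mean of system B is much lower because its poor scores on
the two difficult topics are penalized more heavily.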
Full text
http://doi.acm.org/10.1145/1390334.1390452