Users Versus Models: What Observation Tells Us About Effectiveness Metrics

Alistair Moffat
Department of Computing and Information Systems, The University of Melbourne, Victoria 3010, Australia.

Paul Thomas
CSIRO and Australian National University, Canberra, Australia

Falk Scholer
School of Computer Science and Information Technology, RMIT University, Victoria 3001, Australia.


Proc. 22nd ACM CIKM Conf. on Information and Knowledge Management, San Francisco, October 2013, pages 659-668.


Retrieval system effectiveness can be measured in two quite different ways: by monitoring the behavior of users and gathering data about the ease and accuracy with which they accomplish certain specified information-seeking tasks; or by using numeric effectiveness metrics to score system runs in reference to a set of relevance judgments. In the second approach, the effectiveness metric is chosen in the belief that user task performance, if it were to be measured by the first approach, should be linked to the score provided by the metric.

This work explores that link by analyzing the assumptions and implications of a number of effectiveness metrics and exploring how these relate to observable user behaviors. Data recorded as part of a user study included user self-assessment of search task difficulty, gaze position, and click activity. Our results show that user behavior is influenced by a blend of many factors, including the extent to which relevant documents are encountered, the stage of the search process, and task difficulty. These insights can be used to guide development of batch effectiveness metrics.
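
As an illustration of the second approach described in the abstract (scoring a system run against a set of relevance judgments), the following is a minimal sketch of two widely used batch metrics, precision at depth k and average precision. It is not taken from the paper: the function names, the binary relevance encoding, and the optional `total_relevant` parameter are all assumptions made for this example.

```python
from typing import Optional, Sequence


def precision_at_k(rels: Sequence[int], k: int) -> float:
    """Fraction of the top-k ranked documents that are relevant.

    `rels` is a hypothetical encoding of a run: rels[i] is 1 if the
    document at rank i+1 is judged relevant, and 0 otherwise.
    """
    if k <= 0:
        raise ValueError("k must be positive")
    return sum(rels[:k]) / k


def average_precision(rels: Sequence[int],
                      total_relevant: Optional[int] = None) -> float:
    """Average of the precision values at each rank holding a relevant document.

    `total_relevant` is the number of relevant documents in the judgments.
    If it is omitted, this sketch assumes every relevant document appears
    in the ranking and uses sum(rels) instead.
    """
    if total_relevant is None:
        total_relevant = sum(rels)
    if total_relevant == 0:
        return 0.0
    hits = 0
    score = 0.0
    for rank, r in enumerate(rels, start=1):
        if r:
            hits += 1
            score += hits / rank  # precision at this rank
    return score / total_relevant
```

For a run with relevant documents at ranks 1 and 3 only, `average_precision([1, 0, 1, 0])` averages the precisions 1/1 and 2/3. Whether a score of this kind tracks observed user behavior is the question the paper examines.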



There is a mistake in Equation 6 on page 3, describing C_{AP}; the subexpression (r_i/i) should actually be (r_j/j), in all three places where that subexpression occurs. (Thanks to Ziying Alicia Yang for spotting this.)