Users Versus Models:
What Observation Tells Us About Effectiveness Metrics
Alistair Moffat
Department of Computing and Information Systems,
The University of Melbourne,
Victoria 3010, Australia.
Paul Thomas
CSIRO and Australian National University,
Canberra, Australia.
Falk Scholer
School of Computer Science and Information Technology,
RMIT University,
Victoria 3001, Australia.
Status
Proc. 22nd ACM CIKM Conf. on Information and Knowledge Management,
San Francisco, October 2013, pages 659-668.
Abstract
Retrieval system effectiveness can be measured in two quite different
ways: by monitoring the behavior of users and gathering data about
the ease and accuracy with which they accomplish certain specified
information-seeking tasks; or by using numeric effectiveness metrics
to score system runs in reference to a set of relevance judgments.
In the second approach, the effectiveness metric is chosen in the
belief that user task performance, if it were to be measured by the
first approach, should be linked to the score provided by the metric.
This work explores that link by analyzing the assumptions and
implications of a number of effectiveness metrics, and exploring how
these relate to observable user behaviors.
Data recorded as part of a user study included user self-assessment
of search task difficulty, gaze position, and click activity.
Our results show that user behavior is influenced by a blend of many
factors, including the extent to which relevant documents are
encountered, the stage of the search process, and task difficulty.
These insights can be used to guide development of batch
effectiveness metrics.
Published paper
http://dx.doi.org/10.1145/2505515.2507665
Errata
Equation 6 on page 3, which describes C_{AP}, contains a mistake: the
subexpression (r_i/i) should be (r_j/j) in all three places where it
occurs.
(Thanks to Ziying Alicia Yang for spotting this.)