Click-Based Evidence for Decaying Weight Distributions in Search Effectiveness Metrics

Yuye Zhang
Department of Computer Science and Software Engineering, The University of Melbourne, Victoria 3010, Australia.

Laurence A. F. Park
Department of Computer Science and Software Engineering, The University of Melbourne, Victoria 3010, Australia.

Alistair Moffat
Department of Computer Science and Software Engineering, The University of Melbourne, Victoria 3010, Australia.

Status

Information Retrieval, 13(1):46-69, February 2010. Online version published 30 June 2009.

Abstract

Search effectiveness metrics are used to evaluate the quality of the answer lists returned by search services, usually based on a set of relevance judgments. One plausible way of calculating an effectiveness score for a system run is to compute the inner-product of the run's relevance vector and a "utility" vector, where the i th element in the utility vector represents the relative benefit obtained by the user of the system if they encounter a relevant document at depth i in the ranking. This paper uses such a framework to examine the user behavior patterns -- and hence utility weightings -- that can be inferred from a web query log. We describe a process for extrapolating user observations from query log clickthroughs, and employ this user model to measure the quality of effectiveness weighting distributions. Our results show that for measures with static distributions (that is, utility weighting schemes for which the weight vector is independent of the relevance vector), the geometric weighting model employed in the rank-biased precision effectiveness metric offers the closest fit to the user observation model. In addition, using past TREC data as to indicate likelihood of relevance, we also show that the distributions employed in the BPref and MRR metrics are the best fit out of the measures for which static distributions do not exist.