Click-Based Evidence for Decaying Weight Distributions
in Search Effectiveness Metrics
Yuye Zhang
Department of Computer Science and Software Engineering,
The University of Melbourne,
Victoria 3010, Australia.
Laurence A. F. Park
Department of Computer Science and Software Engineering,
The University of Melbourne,
Victoria 3010, Australia.
Alistair Moffat
Department of Computer Science and Software Engineering,
The University of Melbourne,
Victoria 3010, Australia.
Status
Information Retrieval, 13(1):46-69, February 2010.
Online version published 30 June 2009.
Abstract
Search effectiveness metrics are used to evaluate the quality of the
answer lists returned by search services, usually based on a set of
relevance judgments.
One plausible way of calculating an effectiveness score for a system
run is to compute the inner-product of the run's relevance vector
and a "utility" vector, where the i-th element in the utility vector
represents the relative benefit obtained by the user of the system if
they encounter a relevant document at depth i in the ranking.
This paper uses such a framework to examine the user behavior
patterns -- and hence utility weightings -- that can be inferred from
a web query log.
We describe a process for extrapolating user observations from query
log clickthroughs, and employ this user model to measure the quality
of effectiveness weighting distributions.
Our results show that for measures with static distributions (that
is, utility weighting schemes for which the weight vector is
independent of the relevance vector), the geometric weighting model
employed in the rank-biased precision effectiveness metric offers the
closest fit to the user observation model.
In addition, using past TREC data to indicate likelihood of
relevance, we show that the distributions employed in the BPref
and MRR metrics are the best fit out of the measures for which static
distributions do not exist.
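
As a minimal sketch of the inner-product framework described above, the
following computes an effectiveness score from a relevance vector and a
static utility vector. The geometric weights follow the form used by
rank-biased precision, (1 - p) * p^(i - 1) at depth i. The function
names, the binary relevance vector, and the persistence value p = 0.5
are assumptions chosen for illustration and are not taken from the paper.

```python
def geometric_weights(p, depth):
    """Geometric utility weights: (1 - p) * p**(i - 1) for ranks i = 1..depth.

    p is the persistence parameter (0 < p < 1); the value used below is
    an illustrative assumption.
    """
    return [(1 - p) * p ** i for i in range(depth)]


def effectiveness(relevance, weights):
    """Inner product of a run's relevance vector and a utility vector."""
    return sum(r * w for r, w in zip(relevance, weights))


# Hypothetical run: relevant documents at depths 1 and 3.
relevance = [1, 0, 1, 0, 0]
weights = geometric_weights(0.5, len(relevance))
score = effectiveness(relevance, weights)
print(score)
```

Because the weight vector here depends only on depth and not on the
relevance vector, it is a static distribution in the sense used in the
abstract; swapping in a different weight function changes the metric
without changing the scoring step.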