Pooled Evaluation Over Query Variations: Users are as Diverse as Systems

Alistair Moffat
Department of Computing and Information Systems, The University of Melbourne, Victoria 3010, Australia.

Falk Scholer
School of Computer Science and Information Technology, RMIT University, Victoria 3001, Australia.

Paul Thomas
CSIRO and Australian National University, Canberra, Australia

Peter Bailey
Microsoft, Australia


Proc. 24th ACM CIKM Int. Conf. on Information and Knowledge Management, Melbourne, October 2015, pages 1759-1762.


Evaluation of information retrieval systems with test collections makes use of a suite of fixed resources: a document corpus; a set of topics; and associated judgments of the relevance of each document to each topic. With large modern collections, exhaustive judging is not feasible. An approach called pooling is therefore typically used, in which the set of documents to be judged for a topic is formed by, for example, taking the union of the documents returned in the top positions of the answer lists generated by a range of systems.
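The depth-k pooling process described above can be sketched in a few lines; the run data here is hypothetical and purely for illustration:

```python
def depth_k_pool(runs, k):
    """Form a judging pool from the top k documents of each run.

    runs: a list of ranked document-id lists, one per system.
    k:    the pool depth.
    """
    pool = set()
    for ranked_list in runs:
        # The pool is the union of the top-k documents across all runs.
        pool.update(ranked_list[:k])
    return pool

# Hypothetical answer lists from three systems for one topic:
runs = [
    ["d1", "d2", "d3", "d4"],
    ["d2", "d5", "d1", "d6"],
    ["d7", "d2", "d8", "d1"],
]

pool = depth_k_pool(runs, k=2)
# Only the distinct documents appearing in the top 2 of any run are judged.
```

Only the documents in the resulting pool are judged for relevance; anything outside the pool is typically treated as not relevant.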

Conventionally, pooling uses system variations to provide diverse documents to be judged for a topic; different user queries are not considered. We explore the ramifications of user query variability on pooling, and demonstrate that conventional test collections do not cover this source of variation. The effect of user query variation on the size of the judging pool is just as strong as the effect of retrieval system variation. We conclude that user query variation should be incorporated early in test collection construction, and cannot be considered effectively post hoc.
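The comparison drawn above, pool growth from adding systems for a fixed query versus adding query variations for a fixed system, can be sketched as follows; all run data is hypothetical, and this is an illustration of the kind of measurement involved, not the paper's actual experimental setup:

```python
def pool_size(ranked_lists, k):
    """Size of the union of the top-k documents across the given runs."""
    pool = set()
    for ranked_list in ranked_lists:
        pool.update(ranked_list[:k])
    return len(pool)

# runs[s][q] = hypothetical ranked list from system s for query variation q.
runs = {
    "sysA": {"q1": ["d1", "d2", "d3"], "q2": ["d4", "d1", "d5"]},
    "sysB": {"q1": ["d2", "d6", "d1"], "q2": ["d7", "d4", "d8"]},
}

k = 2
# Vary the system, holding the query variation fixed:
across_systems = pool_size([runs[s]["q1"] for s in runs], k)
# Vary the query variation, holding the system fixed:
across_queries = pool_size([runs["sysA"][q] for q in runs["sysA"]], k)
```

Comparing how quickly each of these unions grows is one way to gauge whether query variation contributes as much novel material to the pool as system variation does.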

Full text


Data Resource

http://dx.doi.org/10.4225/08/55D0B6A098248