UQV: A Test Collection with Query Variability
Department of Computing and Information Systems,
The University of Melbourne,
Victoria 3010, Australia.
School of Computer Science and Information Technology,
Victoria 3001, Australia.
Proc. 39th Ann. Int. ACM SIGIR Conf. on
Research and Development in Information Retrieval,
Pisa, Italy, July 2016, pages 725-728.
We describe a test collection (UQV100) that is designed to
incorporate variability from users.
One hundred topics (or specific sub-topics) from the 2013 and 2014
TREC Web Tracks were re-purposed via information-need statements
(backstories).
Crowd workers were then asked to read the backstories and provide
the queries they would use, plus corresponding effort estimates of
the number of useful documents needed to satisfy the information
need.
A total of 10,835 queries were collected from 263 workers.
After normalization and spell-correction, 5,764 unique variations
remained; these were then used to construct a document pool via
Indri-BM25 over the ClueWeb12 corpus.
Relevance judgments were made by qualified crowd workers relative to
the backstories, using a relevance scale similar to the original TREC
judging approach, first to a pool depth of ten, and then for a
further set of targeted documents.
The backstories, query variations, spell-corrected queries, effort
estimates, run outputs, and relevance judgments are made available
collectively as the UQV100 test collection, together with the judging
guidelines and the gold hits we used for crowd-worker qualification
and anti-spam detection.
We believe this test collection will open up opportunities for
new investigations and analyses, including for problems such as
task-intent retrieval performance and consistency (independent of
query variation), query clustering, query difficulty prediction, and
relevance feedback, among others.
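
The two mechanical steps mentioned above, reducing raw queries to unique variations and pooling run outputs to a fixed depth, can be illustrated with a minimal sketch. It is not the procedure used to build UQV100. The normalization shown (lowercasing and collapsing whitespace) is only an assumption, spell-correction is omitted, uniqueness is assumed to be counted per topic, and the names normalize, unique_variations, and build_pool are hypothetical.

```python
from collections import defaultdict


def normalize(query):
    # Assumed normalization only: lowercase and collapse runs of whitespace.
    return " ".join(query.lower().split())


def unique_variations(raw_queries):
    # raw_queries: iterable of (topic_id, query_text) pairs.
    # Returns the distinct (topic_id, normalized_query) pairs in first-seen order.
    seen = set()
    result = []
    for topic, text in raw_queries:
        key = (topic, normalize(text))
        if key not in seen:
            seen.add(key)
            result.append(key)
    return result


def build_pool(runs, depth=10):
    # runs: dict mapping (topic_id, query) to a ranked list of document ids.
    # The pool for a topic is the union of the top `depth` documents
    # retrieved for each of that topic's query variations.
    pool = defaultdict(set)
    for (topic, _query), ranking in runs.items():
        pool[topic].update(ranking[:depth])
    return dict(pool)
```

In this sketch, two workers who type "Raspberry  Pi" and "raspberry pi" for the same topic contribute a single variation, and each variation's ranking adds at most `depth` documents to that topic's judging pool.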