SIGIR'98 papers: Efficient Construction of Large Test Collections
Gordon V. Cormack
Department of Computer Science,
University of Waterloo,
Waterloo, Ontario N2L 3G1,
Canada.
Christopher R. Palmer
Department of Computer Science,
University of Waterloo,
Waterloo, Ontario N2L 3G1,
Canada.
Charles L. A. Clarke
Department of Electrical and Computer Engineering,
10 King's College Road,
University of Toronto,
Toronto, Ontario M5S 3G4,
Canada.
Abstract
Test collections with a million or more documents are needed for the
evaluation of modern information retrieval systems.
Yet their construction requires a great deal of effort.
Judgements must be rendered as to whether or not documents are relevant to
each of a set of queries.
Exhaustive judging, in which every document is examined and a judgement
rendered, is infeasible for collections of this size.
Current practice is represented by the "pooling method", as used in the TREC
conference series, in which only the first k documents from each of a
number of sources are judged.
We propose two methods, Interactive Searching and Judging and
Move-to-Front Pooling, that yield effective test collections while
requiring many fewer judgements.
Interactive Searching and Judging selects documents to be judged using
an interactive search system, and may be used by a small research team
to develop an effective test collection using minimal resources.
Move-to-Front Pooling directly improves on the standard pooling method by
using a variable number of documents from each source depending on its
retrieval performance.
Move-to-Front Pooling would be an appropriate replacement for the standard
pooling method in future collection development efforts involving many
independent groups.
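
The contrast between fixed-depth and variable-depth pooling can be illustrated with a minimal Python sketch. The abstract does not spell out the selection rule used by Move-to-Front Pooling, so `adaptive_pool` below rests on an assumption: a source stays at the front of a priority list while its documents are judged relevant and is demoted to the back on its first non-relevant document. The function names, the `is_relevant` callback (standing in for a human assessor), and the `budget` parameter are all hypothetical and are not taken from the paper.

```python
from typing import Callable, Dict, List, Set


def depth_k_pool(runs: Dict[str, List[str]], k: int) -> Set[str]:
    """Standard pooling: the union of the first k documents from each source."""
    pool: Set[str] = set()
    for ranking in runs.values():
        pool.update(ranking[:k])
    return pool


def adaptive_pool(
    runs: Dict[str, List[str]],
    is_relevant: Callable[[str], bool],
    budget: int,
) -> Dict[str, bool]:
    """Variable-depth pooling under an ASSUMED priority rule (see lead-in).

    Returns a mapping from each judged document to its relevance judgement.
    """
    position = {name: 0 for name in runs}  # next unread rank in each source
    queue = list(runs)                     # front of the list = highest priority
    judged: Dict[str, bool] = {}

    while queue and len(judged) < budget:
        name = queue[0]
        ranking = runs[name]

        # Skip documents that were already judged via another source.
        while position[name] < len(ranking) and ranking[position[name]] in judged:
            position[name] += 1

        # A source with nothing left to offer is dropped entirely.
        if position[name] >= len(ranking):
            queue.pop(0)
            continue

        doc = ranking[position[name]]
        position[name] += 1
        judged[doc] = is_relevant(doc)     # one human judgement is spent here

        # Assumed rule: demote the source after a non-relevant document.
        if not judged[doc]:
            queue.append(queue.pop(0))

    return judged
```

On a toy input with two sources, `{"A": ["d1", "d2", "d3", "d4"], "B": ["d5", "d6", "d1", "d7"]}`, where only `d1`, `d2` and `d3` are relevant, `depth_k_pool` with `k=2` spends four judgements and finds two relevant documents, while `adaptive_pool` with `budget=4` spends the same four judgements and finds all three. This is only an illustration of the idea that better-performing sources contribute more documents; it says nothing about the results reported in the paper.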
SIGIR'98
24-28 August 1998
Melbourne, Australia.
sigir98@cs.mu.oz.au.