SIGIR'98 papers: Efficient Construction of Large Test Collections

Efficient Construction of Large Test Collections


Gordon V. Cormack
Department of Computer Science, University of Waterloo, Waterloo, Ontario N2L 3G1, Canada.

Christopher R. Palmer
Department of Computer Science, University of Waterloo, Waterloo, Ontario N2L 3G1, Canada.

Charles L. A. Clarke
Department of Electrical and Computer Engineering, 10 King's College Road, University of Toronto, Toronto, Ontario M5S 3G4, Canada.


Abstract

Test collections with a million or more documents are needed for the evaluation of modern information retrieval systems. Yet their construction requires a great deal of effort. Judgements must be rendered as to whether or not documents are relevant to each of a set of queries. Exhaustive judging, in which every document is examined and a judgement rendered, is infeasible for collections of this size. Current practice is represented by the "pooling method", as used in the TREC conference series, in which only the first k documents from each of a number of sources are judged. We propose two methods, Interactive Searching and Judging and Move-to-Front Pooling, that yield effective test collections while requiring many fewer judgements. Interactive Searching and Judging selects documents to be judged using an interactive search system, and may be used by a small research team to develop an effective test collection using minimal resources. Move-to-Front Pooling directly improves on the standard pooling method by using a variable number of documents from each source depending on its retrieval performance. Move-to-Front Pooling would be an appropriate replacement for the standard pooling method in future collection development efforts involving many independent groups.
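The contrast between fixed-depth and variable-depth pooling can be illustrated with a small sketch. The first function follows the standard pooling method as described above (the first k documents from each source). The second is only a hypothetical illustration of the variable-depth idea: the queue-based demotion rule, the `budget` parameter, and all function names are assumptions made for this example, not the procedure defined in the paper.

```python
from typing import Callable, Dict, List, Set


def depth_k_pool(runs: Dict[str, List[str]], k: int) -> Set[str]:
    """Standard pooling: the union of the first k documents from every source."""
    pool: Set[str] = set()
    for ranking in runs.values():
        pool.update(ranking[:k])
    return pool


def variable_depth_pool(
    runs: Dict[str, List[str]],
    is_relevant: Callable[[str], bool],
    budget: int,
) -> Dict[str, bool]:
    """Judge at most `budget` documents, favouring sources that keep
    supplying relevant ones.

    ASSUMED rule (illustrative only): sources wait in a queue. The source at
    the front supplies its next unjudged document. It keeps its place while
    its documents are judged relevant and is sent to the back otherwise.
    """
    order = list(runs)                      # queue of source names, front first
    position = {name: 0 for name in runs}   # next rank to look at, per source
    judged: Dict[str, bool] = {}

    while order and len(judged) < budget:
        name = order[0]
        ranking = runs[name]
        # Skip documents that were already judged (possibly via another source).
        while position[name] < len(ranking) and ranking[position[name]] in judged:
            position[name] += 1
        if position[name] >= len(ranking):
            order.pop(0)                    # this source is exhausted
            continue
        doc = ranking[position[name]]
        judged[doc] = is_relevant(doc)
        if not judged[doc]:
            order.append(order.pop(0))      # demote the source after a miss
    return judged


if __name__ == "__main__":
    # Toy data: two sources, three relevant documents.
    runs = {"A": ["d1", "d2", "d3"], "B": ["d2", "d4", "d5"]}
    relevant = {"d1", "d2", "d3"}
    print(sorted(depth_k_pool(runs, 2)))
    print(variable_depth_pool(runs, lambda d: d in relevant, budget=4))
```

In this toy setting, the fixed-depth pool with k = 2 contains d1, d2 and d4, so it misses the relevant document d3, while the variable-depth sketch reaches d3 within four judgements because source A keeps its place as long as its documents are judged relevant.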


SIGIR'98
24-28 August 1998
Melbourne, Australia.
sigir98@cs.mu.oz.au.