SIGIR'98 papers: How Reliable are the Results of Large-Scale Information Retrieval Experiments?
Justin Zobel
Department of Computer Science,
RMIT, GPO Box 2476V, Melbourne 3001, Australia.
Abstract
Two stages in the measurement of information retrieval techniques are
the gathering of documents for relevance assessment and the use of those
assessments to evaluate effectiveness numerically. We consider both
stages in the context of the TREC experiments, to determine
whether they lead to measurements that are trustworthy and fair. Our
detailed empirical investigation of the TREC results shows that the
measured relative performance of systems appears to be reliable, but
that recall is overestimated: it is likely that many relevant documents
have not been found. We propose a new pooling strategy that can
significantly increase the number of relevant documents found for given
effort, without compromising fairness.
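
To make the recall concern concrete, the following is a minimal sketch of conventional depth-k pooling, in which only the top-ranked documents from each system are judged. It is not the new pooling strategy proposed in the paper, whose details the abstract does not give. The function names, the two toy runs, and the set of truly relevant documents are all hypothetical and invented for illustration.

```python
def build_pool(runs, depth):
    """Union of the top-`depth` documents from each system's ranked run."""
    pool = set()
    for ranking in runs.values():
        pool.update(ranking[:depth])
    return pool


def recall(ranking, relevant):
    """Fraction of the given relevant set that appears in the ranking."""
    if not relevant:
        return 0.0
    return len(set(ranking) & relevant) / len(relevant)


# Hypothetical toy runs: each system returns a ranked list of document ids.
runs = {
    "sysA": ["d1", "d2", "d3", "d4"],
    "sysB": ["d2", "d5", "d1", "d6"],
}

# Hypothetical ground truth, which in a real experiment is unknown.
truly_relevant = {"d1", "d2", "d4", "d6"}

# Only pooled documents are judged; unjudged documents count as non-relevant.
pool = build_pool(runs, depth=2)
judged_relevant = pool & truly_relevant

measured = recall(runs["sysA"], judged_relevant)  # recall against judged set
actual = recall(runs["sysA"], truly_relevant)     # recall against ground truth

print(sorted(pool), measured, actual)
```

In this toy case the relevant documents d4 and d6 fall outside the pool, so the measured recall for sysA is higher than its recall against the full relevant set. This is the kind of overestimation the abstract describes.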
SIGIR'98
24-28 August 1998
Melbourne, Australia.
sigir98@cs.mu.oz.au.