Evaluating Database Selection Techniques: A Testbed and Experiments


James C. French
Department of Computer Science, University of Virginia, Charlottesville, VA.

Allison L. Powell
Department of Computer Science, University of Virginia, Charlottesville, VA.

Charles L. Viles
School of Information and Library Science, University of North Carolina, Chapel Hill, NC.

Travis Emmitt
Department of Computer Science, University of Virginia, Charlottesville, VA.

Kevin J. Prey
Department of Computer Science, University of Virginia, Charlottesville, VA.


Abstract

We describe a testbed for database selection techniques and an experiment conducted using this testbed. The testbed is a decomposition of the TREC/TIPSTER data that allows analysis of the data along multiple dimensions, including collection-based and temporal-based analysis. We characterize the subcollections in this testbed in terms of the number of documents, the queries against which the documents have been evaluated for relevance, and the distribution of relevant documents. We then present initial results from a study, conducted using this testbed, that examines the effectiveness of the gGlOSS approach to database selection. The databases in our testbed were ranked using the gGlOSS techniques and compared to the gGlOSS Ideal(l) baseline and to a baseline derived from TREC relevance judgments. We examined the degree to which several gGlOSS estimate functions approximate these baselines. Our initial results suggest that the gGlOSS estimators are excellent predictors of the Ideal(l) ranks, but that the Ideal(l) ranks do not estimate the relevance-based ranks well.
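To illustrate the Ideal(l) baseline mentioned above, the following is a minimal sketch of the gGlOSS-style ranking idea: for a given query, each database is scored by the sum of the similarities of its documents whose similarity to the query exceeds a threshold l, and databases are ranked by descending score. The database names and similarity values here are fabricated for illustration and are not drawn from the testbed.

```python
# Hypothetical sketch of Ideal(l)-style database ranking.
# Database names and similarity values below are illustrative only.

def ideal_goodness(sims, l):
    """Ideal(l) goodness of one database for one query: the sum of
    document similarities that exceed the threshold l."""
    return sum(s for s in sims if s > l)

def rank_databases(db_sims, l):
    """Rank database names by descending Ideal(l) goodness."""
    scores = {db: ideal_goodness(sims, l) for db, sims in db_sims.items()}
    return sorted(scores, key=scores.get, reverse=True)

# Toy per-database document-similarity lists for a single query (fabricated).
db_sims = {
    "AP88": [0.9, 0.4, 0.1],
    "WSJ87": [0.6, 0.5, 0.5],
    "FR88": [0.2, 0.1],
}

print(rank_databases(db_sims, l=0.3))  # → ['WSJ87', 'AP88', 'FR88']
```

An estimator can then be evaluated, as in the experiments described, by comparing the ranking it produces against a baseline ranking such as this one.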


SIGIR'98
24-28 August 1998
Melbourne, Australia.
sigir98@cs.mu.oz.au