Evaluating Database Selection Techniques: A Testbed and Experiments
James C. French
Department of Computer Science,
University of Virginia,
Charlottesville, VA.
Allison L. Powell
Department of Computer Science,
University of Virginia,
Charlottesville, VA.
Charles L. Viles
School of Information and Library Science,
University of North Carolina,
Chapel Hill, NC.
Travis Emmitt
Department of Computer Science,
University of Virginia,
Charlottesville, VA.
Kevin J. Prey
Department of Computer Science,
University of Virginia,
Charlottesville, VA.
Abstract
We describe a testbed for database selection techniques and an
experiment conducted using this testbed. The testbed
is a decomposition of the TREC/TIPSTER data that allows analysis of the
data along multiple dimensions, including collection-based and
temporal-based analysis. We characterize the subcollections in this
testbed in terms of number of documents, queries against which
the documents have been evaluated for relevance, and distribution of
relevant documents. We then present initial results from a study
conducted using this testbed that examines the effectiveness of
the gGlOSS approach to database selection. We ranked the databases in our
testbed using the gGlOSS techniques and compared the resulting rankings
to the gGlOSS Ideal(l) baseline and to a baseline derived
from TREC relevance judgements. We examined the degree to which
several gGlOSS estimate functions approximate these baselines.
Our initial results suggest that the gGlOSS estimators are excellent
predictors of the Ideal(l) ranks but that the Ideal(l) ranks do
not estimate relevance-based ranks well.
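The Ideal(l) baseline referenced above can be illustrated with a short sketch. Following the gGlOSS definition of Gravano and Garcia-Molina, a database's goodness for a query q at threshold l is the sum of the similarities sim(q, d) of its documents that exceed l; databases are then ranked by this score. The database names and similarity values below are hypothetical, chosen only for illustration:

```python
# Sketch of the gGlOSS Ideal(l) goodness measure and the resulting
# database ranking. Data is hypothetical; the goodness definition
# follows the gGlOSS literature: sum of sim(q, d) over documents d
# in the database with sim(q, d) > l.

def ideal_goodness(similarities, l):
    """Sum of per-document similarities above the threshold l."""
    return sum(s for s in similarities if s > l)

# Hypothetical per-document similarity scores for one query,
# grouped by database.
db_sims = {
    "db1": [0.9, 0.5, 0.1],
    "db2": [0.4, 0.3, 0.3],
    "db3": [0.15, 0.1],
}

# Rank databases by Ideal(0.2) goodness, best first.
ranking = sorted(db_sims,
                 key=lambda db: ideal_goodness(db_sims[db], 0.2),
                 reverse=True)
print(ranking)
```

An estimate function can then be evaluated, as in the study above, by how closely the ranking it induces matches this Ideal(l) ordering.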
SIGIR'98
24-28 August 1998
Melbourne, Australia.
sigir98@cs.mu.oz.au.