Effective Retrieval with Distributed Collections
Jinxi Xu
Department of Computer Science, University of Massachusetts, Amherst, MA
01003, USA.
Jamie Callan
Department of Computer Science, University of Massachusetts, Amherst, MA
01003, USA.
Abstract
This paper evaluates the retrieval effectiveness of distributed information
retrieval systems in realistic environments. We find that when a large number of
collections are available, retrieval effectiveness is significantly worse than
that of centralized systems, mainly because typical queries are not adequate for
choosing the right collections. We propose two techniques to address the problem.
One is to use phrase information in the collection selection index; the other
is query expansion. Both techniques enhance the discriminatory
power of typical queries for choosing the right collections and hence
significantly improve retrieval results. Query expansion, in particular,
brings the effectiveness of searching a large set of distributed collections
close to that of searching a centralized collection.
SIGIR'98
24-28 August 1998
Melbourne, Australia.
sigir98@cs.mu.oz.au.