Methodologies for Distributed Information Retrieval
Owen de Kretser
Department of Computer Science,
The University of Melbourne,
Parkville 3052, Australia.
Alistair Moffat
Department of Computer Science,
The University of Melbourne,
Parkville 3052, Australia.
Tim Shimmin
Department of Computer Science,
RMIT, GPO Box 2476V,
Melbourne 3001, Australia.
Justin Zobel
Department of Computer Science,
RMIT, GPO Box 2476V,
Melbourne 3001, Australia.
Status
Proc. 18th International Conference on Distributed Computing Systems,
Amsterdam, May 1998, pages 66-73.
Abstract
Text collections have traditionally been located at a single site and
managed as a monolithic whole.
Content-based retrieval from such a collection is straightforward, as
a central index can be used to direct the search, and, for ranked
queries, collection-wide weights can be assigned to query terms.
However, it is now common for a collection to be spread over several
hosts and for these hosts to be geographically separated.
In this paper we examine several alternative approaches to
distributed text retrieval.
We report on our experience with a full implementation of these
methods, and give retrieval efficiency and retrieval effectiveness
results for collections distributed over both a local area network
and a wide area network.
We conclude that distributed information retrieval systems can be
fast and effective, making them a valuable tool for managing
large amounts of textual data, but that they are not efficient.