Methodologies for Distributed Information Retrieval


Owen de Kretser
Department of Computer Science, The University of Melbourne, Parkville 3052, Australia.

Alistair Moffat
Department of Computer Science, The University of Melbourne, Parkville 3052, Australia.

Tim Shimmin
Department of Computer Science, RMIT, GPO Box 2476V, Melbourne 3001, Australia.

Justin Zobel
Department of Computer Science, RMIT, GPO Box 2476V, Melbourne 3001, Australia.


Status

Proc. 18th International Conference on Distributed Computing Systems, Amsterdam, May 1998, pages 66-73.

Abstract

Text collections have traditionally been located at a single site and managed as a monolithic whole. Content-based retrieval from such a collection is straightforward: a central index can be used to direct the search and, for ranked queries, collection-wide weights can be assigned to query terms. However, it is now common for a collection to be spread over several hosts, and for these hosts to be geographically separated. In this paper we examine several alternative approaches to distributed text retrieval. We report on our experience with a full implementation of these methods, and give retrieval efficiency and effectiveness results for collections distributed over both a local area network and a wide area network. We conclude that distributed information retrieval systems can be fast and effective, making them a valuable tool for managing large amounts of textual data, but that they are not efficient.