Load Balancing for Term-Distributed Parallel Retrieval
Alistair Moffat
Department of Computer Science and Software Engineering,
The University of Melbourne,
Victoria 3010, Australia.
William Webber
Department of Computer Science and Software Engineering,
The University of Melbourne,
Victoria 3010, Australia.
Justin Zobel
School of Computer Science and Information Technology,
RMIT University,
Victoria 3001, Australia.
Status
Proc. 29th Annual International ACM SIGIR Conference on
Research and Development in Information Retrieval, Seattle,
August 2006, pages 348-355.
Abstract
Large-scale web and text retrieval systems deal with
amounts of data that greatly exceed the capacity of any
single machine. To handle the necessary data volumes
and query throughput rates, parallel systems are used,
in which the document and index data are split across
tightly-clustered distributed computing systems. The index
data can be distributed either by document or by term.
In this paper we examine methods for load balancing in
term-distributed parallel architectures, and propose a
suite of techniques for reducing net querying costs.
In combination, the techniques we describe allow a
30% improvement in query throughput when tested on an
eight-node parallel computer system.
Full text
http://doi.acm.org/10.1145/1148170.1148232.