Load-Balancing in Distributed Selective Search
Yubin Kim
Language Technologies Institute,
Carnegie Mellon University,
Pittsburgh, PA 15213, USA
James P. Callan
Language Technologies Institute,
Carnegie Mellon University,
Pittsburgh, PA 15213, USA
Shane Culpepper
School of Computer Science and Information Technology,
RMIT University,
Victoria 3001, Australia.
Alistair Moffat
Department of Computing and Information Systems,
The University of Melbourne,
Victoria 3010, Australia.
Status
Proc. 39th Ann. Int. ACM SIGIR Conf. on
Research and Development in Information Retrieval,
Pisa, Italy, July 2016, pages 905-908.
Abstract
Simulation and analysis have shown that selective search can reduce
the cost of large-scale distributed information retrieval.
By partitioning the collection into small topical shards, and then
using a resource ranking algorithm to choose a subset of shards to
search for each query, fewer postings are evaluated.
Here we extend the study of selective search using a fine-grained
simulation investigating: selective search efficiency in a parallel
query processing environment; the difference in efficiency when
term-based and sample-based resource selection algorithms are used;
and the effect of two policies for assigning index shards to
machines.
Results obtained for two large datasets and four large query logs
confirm that selective search is significantly more efficient than
conventional distributed search.
In particular, we show that selective search is capable of both
higher throughput and lower latency in a parallel environment than is
exhaustive search.
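
As a rough illustration of the idea summarized in the abstract, the following minimal Python sketch ranks shards for a query and evaluates postings only in the top-ranked ones. It is not the simulator or any resource selection algorithm from the paper: the shard contents, the `rank_shards` scoring rule (a simple count of matching postings per shard), and the parameter `k` are all hypothetical and chosen only to make the example self-contained.

```python
from collections import Counter

# Hypothetical toy collection: shard id -> {term -> list of doc ids}.
SHARDS = {
    "sports": {"match": ["d1", "d2"], "score": ["d1"]},
    "finance": {"market": ["d3", "d4"], "score": ["d4"]},
    "cooking": {"recipe": ["d5"]},
}


def rank_shards(shards, query_terms):
    """Order shards by a toy term-based score (assumption, not the paper's):
    the number of postings each shard holds for the query terms."""
    scores = {
        shard_id: sum(len(index.get(term, ())) for term in query_terms)
        for shard_id, index in shards.items()
    }
    # Highest score first; ties broken by shard id for determinism.
    return sorted(scores, key=lambda s: (-scores[s], s))


def selective_search(shards, query_terms, k):
    """Search only the k top-ranked shards.

    Returns the ranked doc ids and the number of postings evaluated,
    so the cost can be compared against searching every shard.
    """
    selected = rank_shards(shards, query_terms)[:k]
    doc_scores = Counter()
    postings_evaluated = 0
    for shard_id in selected:
        for term in query_terms:
            for doc_id in shards[shard_id].get(term, ()):
                doc_scores[doc_id] += 1  # toy score: matched query terms
                postings_evaluated += 1
    ranked = sorted(doc_scores, key=lambda d: (-doc_scores[d], d))
    return ranked, postings_evaluated


if __name__ == "__main__":
    query = ["match", "score"]
    print(selective_search(SHARDS, query, k=1))            # one shard
    print(selective_search(SHARDS, query, k=len(SHARDS)))  # exhaustive
```

Setting `k` to the number of shards reduces the sketch to exhaustive search, which makes it easy to compare the postings evaluated under each strategy on the same toy data.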
Full text
http://doi.acm.org/10.1145/2911451.2914689