Efficient Distributed Selective Search
Yubin Kim
Language Technologies Institute,
Carnegie Mellon University,
Pittsburgh, PA 15213, USA
James P. Callan
Language Technologies Institute,
Carnegie Mellon University,
Pittsburgh, PA 15213, USA
Shane Culpepper
School of Computer Science and Information Technology,
RMIT University,
Victoria 3001, Australia.
Alistair Moffat
Department of Computing and Information Systems,
The University of Melbourne,
Victoria 3010, Australia.
Status
Information Retrieval Journal,
20(3):221-252, 2017.
Parts of this paper appeared in preliminary form in the
Proceedings of the 2016 ACM SIGIR International Conference on Research and
Development in Information Retrieval.
Abstract
Simulation and analysis have shown that selective search can reduce
the cost of large-scale distributed information retrieval.
By partitioning the collection into small topical shards, and then
using a resource ranking algorithm to choose a subset of shards to
search for each query, fewer postings are evaluated.
In this paper we extend the study of selective search into new areas
using a fine-grained simulation, examining the difference in
efficiency when term-based and sample-based resource selection
algorithms are used; measuring the effect of two policies for
assigning index shards to machines; and exploring the benefits of
index-spreading and mirroring as the number of deployed machines is
varied.
Results obtained for two large datasets and four large query logs
confirm that selective search is significantly more efficient than
conventional distributed search architectures and can handle higher
query rates.
Furthermore, we demonstrate that selective search can be tuned to
avoid bottlenecks, and thus maximize usage of the underlying computer
hardware.
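
The mechanism summarized above (partition the collection into topical shards, rank the shards for each query, then search only the top-ranked ones) can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the toy collection, the shard names, the function names, and the crude term-frequency shard score, which is a simple stand-in for a term-based resource selection algorithm and not the method evaluated in the paper.

```python
from collections import Counter

# Hypothetical toy collection, already partitioned into topical shards.
SHARDS = {
    "sports": {"d1": "football match score goal", "d2": "tennis match serve"},
    "cooking": {"d3": "pasta sauce recipe", "d4": "bread recipe oven"},
    "travel": {"d5": "flight hotel booking", "d6": "train travel match schedule"},
}


def build_index(docs):
    """Build an inverted index: term -> list of (doc_id, term frequency)."""
    index = {}
    for doc_id, text in docs.items():
        for term, tf in Counter(text.split()).items():
            index.setdefault(term, []).append((doc_id, tf))
    return index


INDEXES = {shard: build_index(docs) for shard, docs in SHARDS.items()}


def rank_shards(query_terms):
    """Assumed stand-in for resource ranking: score each shard by the
    total frequency of the query terms within that shard."""
    scores = {
        shard: sum(tf for t in query_terms for _, tf in index.get(t, []))
        for shard, index in INDEXES.items()
    }
    # Highest score first; ties broken by shard name for determinism.
    return sorted(scores, key=lambda s: (-scores[s], s))


def search(query, shards):
    """Evaluate the query on the given shards only.

    Returns (ranked doc ids, number of postings evaluated)."""
    doc_scores = Counter()
    postings = 0
    for shard in shards:
        for term in query.split():
            for doc_id, tf in INDEXES[shard].get(term, []):
                doc_scores[doc_id] += tf
                postings += 1
    ranked = sorted(doc_scores, key=lambda d: (-doc_scores[d], d))
    return ranked, postings


def selective_search(query, k=1):
    """Search only the k top-ranked shards for this query."""
    return search(query, rank_shards(query.split())[:k])


def exhaustive_search(query):
    """Baseline: search every shard."""
    return search(query, list(INDEXES))
```

In this toy setting, `selective_search("match score", k=1)` touches only the "sports" shard and evaluates fewer postings than `exhaustive_search("match score")`, at the price of missing a matching document held in another shard. The posting count here ignores the cost of the shard-ranking step itself, which a realistic efficiency comparison would need to include.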
Full text
http://dx.doi.org/10.1007/s10791-016-9290-6