Does Selective Search Benefit from WAND Optimization?

Yubin Kim
Language Technologies Institute, Carnegie Mellon University, Pittsburgh, PA 15213, USA

James P. Callan
Language Technologies Institute, Carnegie Mellon University, Pittsburgh, PA 15213, USA

Shane Culpepper
School of Computer Science and Information Technology, RMIT University, Victoria 3001, Australia

Alistair Moffat
Department of Computing and Information Systems, The University of Melbourne, Victoria 3010, Australia

Status

Proc. 38th European Conf. on Information Retrieval, Padova, Italy, April 2016, pages 145-158.

Abstract

Selective search is a distributed retrieval technique that reduces the computational cost of large-scale information retrieval. By partitioning the collection into topical shards, and using a resource selection algorithm to identify a subset of shards to search, selective search allows retrieval effectiveness to be maintained while evaluating fewer postings, often reducing querying cost by 90% or more. However, only limited attention has been given to the interaction between dynamic pruning algorithms and topical index shards. We demonstrate that the WAND dynamic pruning algorithm is more effective on topical index shards than it is on randomly-organized index shards, and that the savings generated by selective search and WAND are additive. We also compare two methods for applying WAND to topical shards: searching each shard with a separate top-k heap and threshold; and sequentially passing a shared top-k heap and threshold from one shard to the next, in the order established by a resource selection mechanism. Separate top-k heaps provide low query latency, whereas a shared top-k heap provides higher throughput.
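
The difference between the two heap strategies can be illustrated with a minimal Python sketch. It is not WAND itself and not the authors' implementation: real WAND derives upper bounds per term and pivots over posting lists, whereas this sketch assumes each document already carries a single hypothetical upper bound with score <= upper_bound. The function names, the tuple layout, and the sample data are all assumptions made for illustration. The sketch only shows why a threshold carried from one shard to the next can skip more documents than a threshold that restarts for every shard.

```python
import heapq


def search_shard(shard, k, heap=None):
    """Scan one shard and return (heap, number of documents fully scored).

    shard: list of (doc_id, upper_bound, score), with score <= upper_bound
           (a hypothetical layout, used only for this sketch).
    heap:  min-heap of (score, doc_id); pass one in to reuse its threshold.
    """
    heap = [] if heap is None else heap
    scored = 0
    for doc_id, upper_bound, score in shard:
        # The threshold is the k-th best score so far, once k results exist.
        threshold = heap[0][0] if len(heap) == k else float("-inf")
        if upper_bound <= threshold:
            continue  # cannot enter the top k, so skip full scoring
        scored += 1
        if len(heap) < k:
            heapq.heappush(heap, (score, doc_id))
        elif score > heap[0][0]:
            heapq.heapreplace(heap, (score, doc_id))
    return heap, scored


def separate_heaps(shards, k):
    """Each shard gets its own heap and threshold; results are merged after."""
    merged, total = [], 0
    for shard in shards:
        heap, scored = search_shard(shard, k)
        merged.extend(heap)
        total += scored
    return heapq.nlargest(k, merged), total


def shared_heap(shards, k):
    """One heap and threshold passed from shard to shard in the given order."""
    heap, total = [], 0
    for shard in shards:
        heap, scored = search_shard(shard, k, heap)
        total += scored
    return sorted(heap, reverse=True), total


# Hypothetical data: two shards, listed in resource-selection order.
shards = [
    [("a1", 9.0, 8.0), ("a2", 7.0, 6.5), ("a3", 5.0, 4.0)],
    [("b1", 6.0, 5.0), ("b2", 3.0, 2.5), ("b3", 8.5, 7.0)],
]
print(separate_heaps(shards, 2))  # ([(8.0, 'a1'), (7.0, 'b3')], 5)
print(shared_heap(shards, 2))     # ([(8.0, 'a1'), (7.0, 'b3')], 3)
```

Both strategies return the same top-2 results on this toy data, but the shared heap fully scores 3 documents instead of 5. The per-shard searches in separate_heaps are independent of one another and could run in parallel, which is consistent with the lower latency reported above, while shared_heap must visit shards one after another but does less total work.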

Full text

http://dx.doi.org/10.1007/978-3-319-30671-1_11