On the Cost of Phrase-Based Ranking


Matthias Petri
Department of Computing and Information Systems, The University of Melbourne, Victoria 3010, Australia.

Alistair Moffat
Department of Computing and Information Systems, The University of Melbourne, Victoria 3010, Australia.


Status

Proc. 38th Ann. Int. ACM SIGIR Conf. on Research and Development in Information Retrieval, Santiago, August 2015, pages 931-934.

Abstract

Effective postings list compression techniques, and the efficiency of postings list processing schemes such as WAND, have significantly improved the practical performance of ranked document retrieval using inverted indexes. Recently, suffix array-based index structures have been proposed as a complementary tool, to support phrase searching. The relative merits of these alternative approaches to ranked querying using phrase components are, however, unclear. Here we provide: (1) an overview of existing phrase indexing techniques; (2) a description of how to incorporate recent advances in list compression and processing; and (3) an empirical evaluation of state-of-the-art suffix-array and inverted file-based phrase retrieval indexes using a standard IR test collection.

Full text

http://dx.doi.org/10.1145/2766462.2767769

Software

https://github.com/mpetri/pos-cmp