Compact Set Representation for Information Retrieval


J. Shane Culpepper
NICTA Victoria Laboratory, Department of Computer Science and Software Engineering, The University of Melbourne, Victoria 3010, Australia.

Alistair Moffat
Department of Computer Science and Software Engineering, The University of Melbourne, Victoria 3010, Australia.


Status

Proc. 14th Int. Symp. String Processing and Information Retrieval, Santiago, Chile, October 2007, pages 137-148. LNCS volume 4726.

Abstract

Conjunctive Boolean queries are a fundamental operation in web search engines. These queries can be reduced to the problem of intersecting ordered sets of integers, where each set represents the documents containing one of the query terms. But there is tension between the desire to store the lists effectively, in a compressed form, and the desire to carry out intersection operations efficiently, using non-sequential processing modes. In this paper we evaluate intersection algorithms on compressed sets, comparing them to the best non-sequential array-based intersection algorithms. By adding a simple, low-cost, auxiliary index, we show that compressed storage need not hinder efficient and high-speed intersection operations.

Full text

http://dx.doi.org/10.1007/10.1007/978-3-540-75530-2_13