Compact Set Representation for Information Retrieval
J. Shane Culpepper
NICTA Victoria Laboratory,
Department of Computer Science and Software Engineering,
The University of Melbourne,
Victoria 3010, Australia.
Alistair Moffat
Department of Computer Science and Software Engineering,
The University of Melbourne,
Victoria 3010, Australia.
Status
Proc. 14th Int. Symp. String Processing and Information Retrieval,
Santiago, Chile, October 2007, pages 137-148.
LNCS volume 4726.
Abstract
Conjunctive Boolean queries are a fundamental operation in web search
engines.
These queries can be reduced to the problem of intersecting ordered
sets of integers, where each set represents the documents containing
one of the query terms.
But there is tension between the desire to store the lists
effectively, in a compressed form, and the desire to carry out
intersection operations efficiently, using non-sequential processing
modes.
In this paper we evaluate intersection algorithms on compressed sets,
comparing them to the best non-sequential array-based intersection
algorithms.
By adding a simple, low-cost, auxiliary index, we show that compressed
storage need not hinder efficient and high-speed intersection
operations.
Full text
http://dx.doi.org/10.1007/10.1007/978-3-540-75530-2_13