Hybrid Bitvector Index Compression
Alistair Moffat
Department of Computer Science and Software Engineering,
The University of Melbourne,
Victoria 3010, Australia.
J. Shane Culpepper
NICTA Victoria Laboratory,
Department of Computer Science and Software Engineering,
The University of Melbourne,
Victoria 3010, Australia.
Status
Proc. 12th Australasian Document Computing Symposium,
Melbourne, Australia, December 2007, pages 25-31.
Abstract
Bitvector index representations provide fast resolution of
conjunctive Boolean queries, but require a great deal of storage
space.
On the other hand, compressed index representations are
space-efficient, but query evaluation tends to be slower than
bitvector evaluation, because of the need for sequential or pseudo-random
access into
the compressed index lists.
Here we investigate a simple hybrid mechanism that stores only a
small fraction of the inverted lists as bitvectors and has no or
negligible effect on compressed index size compared to the use of
byte codes, but improves query processing throughput compared to both
byte coded representations and entirely-bitvector arrangements.
Note that Figure 2 and its caption were incorrect in the CD
conference proceedings, and have been altered.