Hybrid Bitvector Index Compression

Alistair Moffat
Department of Computer Science and Software Engineering, The University of Melbourne, Victoria 3010, Australia.

J. Shane Culpepper
NICTA Victoria Laboratory, Department of Computer Science and Software Engineering, The University of Melbourne, Victoria 3010, Australia.

Status

Proc. 12th Australasian Document Computing Symposium, Melbourne, Australia, December 2007, pages 25-31.

Abstract

Bitvector index representations provide fast resolution of conjunctive Boolean queries, but require a great deal of storage space. On the other hand, compressed index representations are space-efficient, but query evaluation tends to be slower than bitvector evaluation, because of the need for sequential or pseudo-random access into the compressed index lists. Here we investigate a simple hybrid mechanism that stores only a small fraction of the inverted lists as bitvectors and has no or negligible effect on compressed index size compared to the use of byte codes, but improves query processing throughput compared to both byte coded representations and entirely-bitvector arrangements.

Full text

Note that Figure 2 and its caption were incorrect in the CD conference proceedings, and have been altered.