Binary Interpolative Coding for Effective Index Compression
Alistair Moffat
Department of Computer Science and Software Engineering,
The University of Melbourne,
Parkville 3052, Australia.
Lang Stuiver
Department of Computer Science and Software Engineering,
The University of Melbourne,
Parkville 3052, Australia.
Status
Information Retrieval, 3(1):25-47, July 2000.
Abstract
Information retrieval systems contain large volumes of text, and
currently have typical sizes into the gigabyte range.
Inverted indexes are one important method for providing search
facilities into these collections, but unless compressed
require a great deal of space.
In this paper we introduce a new method for compressing inverted
indexes that yields excellent compression, fast decoding, and
exploits clustering -- the tendency for words to appear relatively
frequently in some parts of the collection and infrequently in
others.
We also describe two other quite separate applications for the same
compression method: representing the MTF list positions generated by
the Burrows-Wheeler
Block Sorting transformation; and transmitting the codebook for
semi-static block-based minimum-redundancy coding.