Inverted Index Compression using Word-Aligned Binary Codes
Vo Ngoc Anh
Department of Computer Science and Software Engineering,
The University of Melbourne,
Victoria 3010, Australia.
Alistair Moffat
Department of Computer Science and Software Engineering,
The University of Melbourne,
Victoria 3010, Australia.
Status
Information Retrieval, 8(1):151-166, January 2005.
Abstract
We examine index representation techniques for document-based
inverted files, and present a new mechanism for compressing them
using word-aligned binary codes.
The new mechanism allows extremely fast decoding of inverted lists
during query processing, while providing compression rates better
than other high-throughput representations.
Results are given for several large text collections in support of
these claims, both for compression effectiveness and query
efficiency.
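As a rough illustration of what "word-aligned" means here, the sketch below packs small integers into 32-bit words, each carrying a 4-bit selector and 28 data bits split into equal-width fields. It is a simplified scheme chosen for illustration and is not the Carryover-12 mechanism; the layout table, the function names, and the use of document-number gaps as input are all assumptions rather than details taken from this page.

```python
# Illustrative only: a simplified word-aligned code, NOT Carryover-12.
# Each 32-bit word holds a 4-bit selector (top bits) and 28 data bits.
# LAYOUTS[selector] = (how many values the word holds, bits per value).
LAYOUTS = [(28, 1), (14, 2), (9, 3), (7, 4), (5, 5),
           (4, 7), (3, 9), (2, 14), (1, 28)]


def to_gaps(doc_ids):
    """Turn a strictly increasing list of document numbers into gaps (assumed input form)."""
    previous = 0
    gaps = []
    for doc_id in doc_ids:
        gaps.append(doc_id - previous)
        previous = doc_id
    return gaps


def from_gaps(gaps):
    """Inverse of to_gaps: running sums rebuild the document numbers."""
    total = 0
    doc_ids = []
    for gap in gaps:
        total += gap
        doc_ids.append(total)
    return doc_ids


def encode(values):
    """Greedily pack values into 32-bit words, trying the densest layout first."""
    words = []
    i = 0
    while i < len(values):
        for selector, (count, bits) in enumerate(LAYOUTS):
            chunk = values[i:i + count]
            # A layout is usable only if it is completely filled and every value fits.
            if len(chunk) == count and all(0 <= v < (1 << bits) for v in chunk):
                word = selector << 28
                for j, v in enumerate(chunk):
                    word |= v << (j * bits)
                words.append(word)
                i += count
                break
        else:
            raise ValueError("value does not fit in 28 bits: %r" % values[i])
    return words


def decode(words):
    """Unpack each word with shifts and masks; no bit-by-bit parsing is needed."""
    values = []
    for word in words:
        count, bits = LAYOUTS[word >> 28]
        mask = (1 << bits) - 1
        for j in range(count):
            values.append((word >> (j * bits)) & mask)
    return values


doc_ids = [3, 4, 7, 8, 9, 15, 300, 301, 5000]
words = encode(to_gaps(doc_ids))
assert from_gaps(decode(words)) == doc_ids
```

Because every word is decoded with one table lookup followed by a fixed pattern of shifts and masks, decoding avoids the per-bit branching of bit-aligned codes, which is the kind of trade-off between speed and compression the abstract refers to.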
Full text
http://dx.doi.org/10.1023/B:INRT.0000048490.99518.5c
Software
An implementation of the Carryover-12 integer coding mechanism
described in this paper is available for download from
http://people.eng.unimelb.edu.au/ammoffat/carry/.