Inverted Index Compression using Word-Aligned Binary Codes


Vo Ngoc Anh
Department of Computer Science and Software Engineering, The University of Melbourne, Victoria 3010, Australia.

Alistair Moffat
Department of Computer Science and Software Engineering, The University of Melbourne, Victoria 3010, Australia.


Status

Information Retrieval, 8(1):151-166, January 2005.

Abstract

We examine index representation techniques for document-based inverted files, and present a new mechanism for compressing them using word-aligned binary codes. The new mechanism allows extremely fast decoding of inverted lists during query processing, while providing compression rates better than other high-throughput representations. Results are given for several large text collections in support of these claims, both for compression effectiveness and query efficiency.
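As a rough illustration of what a word-aligned binary code looks like, the sketch below packs a list of small integers (for example, document gaps from an inverted list) into 32-bit words. Each word holds a 4-bit selector and 28 data bits. The layout table, the function names encode and decode, and the greedy choice of layout are assumptions made for this example; it is a simplified scheme in the same spirit, not the Carryover-12 mechanism described in the paper.

```python
# Hypothetical layout table: each entry is (count, bits) with count * bits <= 28.
# The 4-bit selector stored in the top of each word is an index into this table.
LAYOUTS = [(28, 1), (14, 2), (9, 3), (7, 4), (5, 5), (4, 7), (3, 9), (2, 14), (1, 28)]


def encode(values):
    """Pack integers in the range 0 .. 2**28 - 1 into a list of 32-bit words."""
    words = []
    pos = 0
    while pos < len(values):
        # Greedy choice: try the layout holding the most values first.
        for selector, (count, bits) in enumerate(LAYOUTS):
            chunk = values[pos:pos + count]
            if all(0 <= v < (1 << bits) for v in chunk):
                word = selector << 28
                for slot, v in enumerate(chunk):
                    word |= v << (slot * bits)
                words.append(word)
                pos += len(chunk)  # a short final chunk leaves zero padding
                break
        else:
            raise ValueError("values must be in the range 0 .. 2**28 - 1")
    return words


def decode(words, n):
    """Unpack the first n integers from words produced by encode."""
    out = []
    for word in words:
        count, bits = LAYOUTS[word >> 28]
        mask = (1 << bits) - 1
        for slot in range(count):
            out.append((word >> (slot * bits)) & mask)
    return out[:n]  # drop padding slots from the last word


gaps = [1, 3, 2, 1, 1, 5, 200, 1, 1, 70000]
packed = encode(gaps)
restored = decode(packed, len(gaps))
```

Decoding a word needs only one table lookup on the selector followed by fixed shifts and masks, with no bit-by-bit parsing across word boundaries, which is the property that makes this family of codes fast to decode.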

Full text

http://dx.doi.org/10.1023/B:INRT.0000048490.99518.5c

Software

An implementation of the Carryover-12 integer coding mechanism described in this paper is available for download from http://people.eng.unimelb.edu.au/ammoffat/carry/.