Binary Codes for Non-Uniform Sources
Alistair Moffat
Department of Computer Science and Software Engineering,
The University of Melbourne,
Victoria 3010, Australia.
Vo Ngoc Anh
Department of Computer Science and Software Engineering,
The University of Melbourne,
Victoria 3010, Australia.
Status
Proc. IEEE Data Compression Conference, Snowbird, Utah,
March 2005, pages 133-142.
Abstract
In many applications of compression, decoding speed is at least as
important as compression effectiveness.
For example, the large inverted indexes associated with text
retrieval mechanisms are best stored compressed, but a working system
must also process queries at high speed.
Here we present two coding methods that make use of fixed binary
representations.
They have all of the consequent benefits in terms of decoding
performance, but are also sensitive to localized variations in the
source data, and in practice give excellent compression.
The methods are validated by applying them to an 18 GB document
collection.
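As a rough illustration of how a fixed binary representation can still respond to local variation in the source, the sketch below codes each block of integers with the smallest fixed width that fits that block's largest value. It is a generic illustration only, not the RBUC-B mechanism or either method from the paper; the function names, the block size, and the restriction to non-negative integers are assumptions made for this example, and the cost of storing the per-block widths is ignored.

```python
def bits_needed(value):
    # Minimum number of binary digits for a non-negative integer (at least 1).
    return max(1, value.bit_length())


def encode_blocks(values, block_size=4):
    # Hypothetical helper: split the input into blocks and code every value
    # in a block with the same fixed width, chosen from the block's maximum.
    blocks = []
    for start in range(0, len(values), block_size):
        block = values[start:start + block_size]
        width = max(bits_needed(v) for v in block)
        codes = [format(v, "0{}b".format(width)) for v in block]
        blocks.append((width, codes))
    return blocks


def decode_blocks(blocks):
    # Decoding is a fixed-width read per value; no bit-by-bit code parsing.
    return [int(code, 2) for _, codes in blocks for code in codes]


def payload_bits(blocks):
    # Total bits spent on values only (per-block width overhead not counted).
    return sum(width * len(codes) for width, codes in blocks)


if __name__ == "__main__":
    data = [1, 2, 1, 3, 100, 90, 120, 77]
    per_block = encode_blocks(data, block_size=4)
    one_block = encode_blocks(data, block_size=len(data))
    print(payload_bits(per_block), payload_bits(one_block))
    print(decode_blocks(per_block) == data)
```

In this toy input the small values cluster together, so choosing the width per block spends fewer payload bits than a single global width, while each value is still read back with a simple fixed-width extraction.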
Software
An implementation of the RBUC-B integer coding mechanism described in
this paper is available for download from
http://people.eng.unimelb.edu.au/ammoffat/rbuc/.