Offline Dictionary-Based Compression


N. Jesper Larsson
Department of Computer Science, Lund University, Sweden.

Alistair Moffat
Department of Computer Science and Software Engineering, The University of Melbourne, Parkville 3052, Australia.


Status

Proc. IEEE, 88(11):1722-1732, November 2000.

Abstract

Dictionary-based modelling is a mechanism used in many practical compression schemes. In most implementations of dictionary-based compression the encoder operates online, incrementally inferring its dictionary of available phrases from previous parts of the message. An alternative approach is to use the full message to infer a complete dictionary in advance, and include an explicit representation of the dictionary as part of the compressed message. In this investigation we develop a compression scheme which is a combination of a simple but powerful phrase derivation method and a compact dictionary encoding. The scheme is highly efficient, particularly in decompression, and has characteristics that make it a favourable choice when compressed data is to be searched directly. We describe data structures and algorithms that allow our mechanism to operate in linear time and space.

Software

Software implementing Re-Pair, Des-Pair, and Re-Store is available from http://www.bic.kyoto-u.ac.jp/pathway/rwan/software/restore.html.