Practical Length-Limited Coding for Large Alphabets

Andrew Turpin
Department of Computer Science, The University of Melbourne, Parkville 3052, Australia.

Alistair Moffat
Department of Computer Science, The University of Melbourne, Parkville 3052, Australia.

Status

The Computer Journal, 38(5):339-347, 1995.

Abstract

The use of minimum-cost coding for economical representation of a stream of symbols drawn from a defined source alphabet is widely known. However, for large-scale compression, minimum-cost coding has the drawback that the generated codewords may be longer than a machine word, limiting the usefulness of both software and hardware implementations on word-based architectures. The solution is to generate length-limited codes, and to accept the consequent loss of compression effectiveness in order to preserve the simplicity and speed of the encoding and decoding software. Here we re-examine the package-merge algorithm for generating minimum-cost length-limited prefix-free codes, and show that with a considered reorganisation of the key steps it can run quickly in significantly less memory than was required by previous implementations, while retaining asymptotic efficiency. As evidence of the practical usefulness of the improved method we describe experiments on an alphabet of over one million symbols, for which length-limited codes can be constructed in 11 megabytes of memory and about 20 seconds of CPU time.
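
For readers unfamiliar with package-merge, the following is a minimal sketch of the textbook form of the algorithm, not the memory-efficient reorganisation described in the paper. The function name `length_limited_code_lengths` and its parameters are hypothetical, chosen for illustration only. Each item carries the full tuple of symbols it contains, which is the kind of memory cost that a practical implementation would need to avoid for large alphabets.

```python
def length_limited_code_lengths(weights, max_len):
    """Return prefix-free codeword lengths, none longer than max_len.

    Naive package-merge sketch: every item stores the tuple of symbol
    indices it covers, so memory use grows with both alphabet size and
    max_len. Illustrative only; all names here are hypothetical.
    """
    n = len(weights)
    if n == 0:
        return []
    if n == 1:
        return [1]  # assumption: a lone symbol still gets a 1-bit codeword
    if (1 << max_len) < n:
        raise ValueError("max_len is too small for this many symbols")

    # Leaves sorted by weight; each item is (weight, symbols covered).
    order = sorted(range(n), key=lambda i: weights[i])
    leaves = [(weights[i], (i,)) for i in order]

    current = leaves
    for _ in range(max_len - 1):
        # Package: combine adjacent pairs, dropping an unpaired last item.
        packages = [
            (current[k][0] + current[k + 1][0],
             current[k][1] + current[k + 1][1])
            for k in range(0, len(current) - 1, 2)
        ]
        # Merge: the stable sort keeps leaves ahead of equal-weight packages.
        current = sorted(leaves + packages, key=lambda item: item[0])

    # The 2n - 2 cheapest items decide the lengths: a symbol's codeword
    # length is the number of selected items that contain it.
    lengths = [0] * n
    for _, symbols in current[: 2 * n - 2]:
        for s in symbols:
            lengths[s] += 1
    return lengths


if __name__ == "__main__":
    print(length_limited_code_lengths([1, 1, 2, 4], 3))
    print(length_limited_code_lengths([1, 1, 2, 4], 2))
```

With weights 1, 1, 2, 4, a limit of 3 bits leaves the unrestricted minimum-cost lengths unchanged, while a limit of 2 bits forces every codeword to two bits, raising the total cost from 14 to 16 bits. This illustrates the loss of compression effectiveness that length limiting accepts.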