Practical Length-Limited Coding for Large Alphabets
Andrew Turpin
Department of Computer Science,
The University of Melbourne,
Parkville 3052, Australia.
Alistair Moffat
Department of Computer Science,
The University of Melbourne,
Parkville 3052, Australia.
Status
The Computer Journal, 38(5):339-347, 1995.
Abstract
The use of minimum-cost coding for economical representation of a stream
of symbols drawn from a defined source alphabet is widely known.
However, for large-scale compression, minimum-cost coding has the drawback
that the codewords it generates may be longer than a machine word, limiting
the usefulness of both software and hardware implementations on
word-based architectures.
The solution is to generate length-limited codes, and accept
the consequent loss of compression effectiveness in order to preserve
the simplicity and speed of the encoding and decoding software.
Here we re-examine the package-merge algorithm for generating
minimum-cost length-limited prefix-free codes, and show that, with a considered
reorganisation of its key steps, it can run quickly and
in significantly less memory than previous implementations
required, while retaining its asymptotic efficiency.
As evidence of the practical usefulness of the improved method we
describe experiments on an alphabet of over one million symbols, for
which length-limited codes can be constructed in 11 megabytes
of memory and about 20 seconds of CPU time.
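To make the package-merge step concrete, here is a minimal Python sketch of the textbook package-merge procedure (the "coin collector" formulation). It is not the memory-reduced reorganisation described in this paper. The function name `package_merge` and its parameters are illustrative assumptions. Because every item carries the full list of symbols it contains, this version uses considerably more memory than the approach the abstract describes.

```python
from typing import List, Tuple

# Hypothetical item representation: (weight, symbols), where `symbols`
# lists every leaf (with repetition) that the item contains.
Item = Tuple[int, List[int]]


def package_merge(weights: List[int], max_len: int) -> List[int]:
    """Return codeword lengths, each at most max_len, minimising sum(w * len).

    Naive textbook package-merge: every list and every package's symbol
    list is held in memory, so this is not a memory-efficient version.
    """
    n = len(weights)
    if n < 2:
        raise ValueError("need at least two symbols")
    if 2 ** max_len < n:
        raise ValueError("max_len too small: 2**max_len must be >= n")

    # The list for the deepest level (max_len) holds only the leaves,
    # sorted by weight.
    leaves: List[Item] = sorted(
        ((w, [i]) for i, w in enumerate(weights)), key=lambda it: it[0]
    )
    current = leaves
    for _ in range(max_len - 1):
        # Package: pair adjacent items; an unpaired last item is discarded.
        packages = [
            (current[k][0] + current[k + 1][0], current[k][1] + current[k + 1][1])
            for k in range(0, len(current) - 1, 2)
        ]
        # Merge: combine the packages with a fresh copy of the leaves,
        # giving the list for the next level up.
        current = sorted(leaves + packages, key=lambda it: it[0])

    # Select the 2n - 2 cheapest items at level 1; a symbol's codeword
    # length is the number of times it occurs among the selected items.
    lengths = [0] * n
    for _, symbols in current[: 2 * n - 2]:
        for s in symbols:
            lengths[s] += 1
    return lengths
```

For example, `package_merge([1, 1, 2, 4], 3)` yields the unrestricted minimum-cost lengths `[3, 3, 2, 1]`, while tightening the limit with `package_merge([1, 1, 2, 4], 2)` forces `[2, 2, 2, 2]`, trading a higher total cost for shorter maximum codewords.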