Block Merging for Off-Line Compression
Raymond Wan
Department of Computer Science and Software Engineering,
The University of Melbourne,
Victoria 3010, Australia.
Alistair Moffat
Department of Computer Science and Software Engineering,
The University of Melbourne,
Victoria 3010, Australia.
Status
Journal of the American Society for
Information Science and Technology,
58(1):3-14, January 2007.
Preliminary version presented at the
International Conference on Combinatorial Pattern Matching,
Fukuoka, Japan, July 2002.
Abstract
To bound memory consumption, most compression systems provide a
facility that controls the amount of data that may be processed at
once, usually as a block size, but sometimes as a direct megabyte
limit.
In this work we consider the Re-Pair mechanism of Larsson and Moffat
(2000), which processes large messages as disjoint blocks to limit
memory consumption.
We show that the blocks emitted by Re-Pair can be postprocessed to
yield further savings, and describe techniques that allow files of
500 MB or more to be compressed in a holistic manner using less than
that much main memory.
The block merging process we describe has the additional advantage of
allowing new text to be appended to the end of the compressed file.
Full text
http://dx.doi.org/10.1002/asi.20515
Software
Software implementing Re-Pair, Des-Pair, and Re-Store is available
from
http://www.bic.kyoto-u.ac.jp/pathway/rwan/software/restore.html.