Text Compression for Dynamic Document Databases
Alistair Moffat
Department of Computer Science,
The University of Melbourne,
Parkville 3052, Australia.
Justin Zobel
Department of Computer Science,
RMIT, GPO Box 2476V,
Melbourne 3001, Australia.
Neil Sharman
Department of Computer Science,
The University of Melbourne,
Parkville 3052, Australia.
Status
IEEE Trans. Knowledge and Data Engineering,
9(2):302-313, March/April 1997.
Abstract
For compression of text databases semi-static word-based methods
provide good performance in terms of both speed and disk space, but two
problems arise.
First, the memory requirements for the compression model during decoding
can be unacceptably high.
Second, the need to handle document insertions means that the
collection must be periodically recompressed if compression
efficiency is to be maintained on dynamic collections.
Here we show that with careful management the impact
of both of these drawbacks
can be kept small.
Experiments with a word-based model and over 500~Mb of text show that
excellent
compression rates can be retained even in the presence of severe memory
limitations on the decoder, and after significant expansion
in the amount of stored text.