Adding Compression to a Full-Text Retrieval System
Justin Zobel
Department of Computer Science,
RMIT, GPO Box 2476V,
Melbourne 3001, Australia.
Alistair Moffat
Department of Computer Science,
The University of Melbourne,
Parkville 3052, Australia.
Status
Software: Practice and Experience, 25(8):891-903, August 1995.
Abstract
We describe the implementation of a data compression scheme as an
integral and transparent layer within a full-text retrieval system.
Using a semi-static word-based compression model, the space needed
to store the text is under 30% of the original
requirement.
The model is used in conjunction with canonical Huffman coding, and
together these two paradigms provide fast decompression.
Experiments with 500 Mb of newspaper articles show that in
full-text retrieval environments compression not only saves space
but can also yield faster query processing, a win-win situation.
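
As a rough illustration of the two ideas named above, the following Python sketch pairs a semi-static word-based model with canonical Huffman codes. It is a minimal sketch under stated assumptions, not the implementation described in the paper: the function names (code_lengths, canonical_codes, compress, decompress) are hypothetical, the tokenizer is a simple regular expression, a single shared model covers both words and the separators between them, and the output is a string of '0'/'1' characters rather than packed bits. "Semi-static" here means the text is read twice: once to count token frequencies, and once to encode with the resulting fixed code.

```python
import heapq
import re
from collections import Counter


def code_lengths(freqs):
    """Return Huffman codeword lengths per symbol (standard heap construction)."""
    if len(freqs) == 1:
        # A lone symbol still needs one bit so that it can be decoded.
        return {next(iter(freqs)): 1}
    # Heap entries: (weight, unique tie-breaker, symbols in this subtree).
    heap = [(w, i, [s]) for i, (s, w) in enumerate(sorted(freqs.items()))]
    heapq.heapify(heap)
    lengths = dict.fromkeys(freqs, 0)
    counter = len(heap)
    while len(heap) > 1:
        w1, _, s1 = heapq.heappop(heap)
        w2, _, s2 = heapq.heappop(heap)
        # Every symbol under the merged node moves one level deeper.
        for s in s1 + s2:
            lengths[s] += 1
        heapq.heappush(heap, (w1 + w2, counter, s1 + s2))
        counter += 1
    return lengths


def canonical_codes(lengths):
    """Assign canonical codewords from lengths alone.

    Assumed convention: symbols sorted by (length, symbol) receive
    consecutive integer codes, shortest codewords first.
    """
    codes = {}
    code = 0
    prev_len = 0
    for sym, length in sorted(lengths.items(), key=lambda kv: (kv[1], kv[0])):
        code <<= (length - prev_len)
        codes[sym] = format(code, "0{}b".format(length))
        code += 1
        prev_len = length
    return codes


def compress(text):
    """Pass 1 counts token frequencies; pass 2 encodes with the fixed code."""
    tokens = re.findall(r"\w+|\W+", text)  # words and the separators between them
    lengths = code_lengths(Counter(tokens))
    codes = canonical_codes(lengths)
    bits = "".join(codes[t] for t in tokens)
    return bits, lengths


def decompress(bits, lengths):
    """Rebuild the code from the stored lengths and decode bit by bit."""
    decode = {c: s for s, c in canonical_codes(lengths).items()}
    out, cur = [], ""
    for b in bits:
        cur += b
        if cur in decode:
            out.append(decode[cur])
            cur = ""
    return "".join(out)
```

Only the codeword lengths need to be kept alongside the compressed text, because the canonical assignment lets the decoder regenerate the same codewords. A practical decoder would typically work on packed bits with per-length lookup tables rather than the dictionary probe used here.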