Adding Compression to a Full-Text Retrieval System

Justin Zobel
Department of Computer Science, RMIT, GPO Box 2476V, Melbourne 3001, Australia.

Alistair Moffat
Department of Computer Science, The University of Melbourne, Parkville 3052, Australia.


Software---Practice and Experience, 25(8):891-903, August 1995.


We describe the implementation of a data compression scheme as an integral and transparent layer within a full-text retrieval system. Using a semi-static word-based compression model, the space needed to store the text is under 30\% of the original requirement. The model is used in conjunction with canonical Huffman coding, and together these two techniques provide fast decompression. Experiments with 500~Mb of newspaper articles show that in full-text retrieval environments compression not only saves space but can also yield faster query processing, a win-win situation.
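
As a rough illustration of how a semi-static word-based model can be combined with canonical Huffman coding, the following Python sketch makes one pass over the text to count token frequencies and a second pass to encode it. It is a minimal sketch under stated assumptions, not the implementation described in the paper: the tokenizer, the single shared alphabet for words and separators, the particular canonical ordering, and all function names (tokenize, code_lengths, canonical_codes, encode, decode) are hypothetical choices made for the example. Bits are kept as character strings for readability, and the dictionary-based decoder does not reproduce the fast table-driven decoding that canonical codes permit.

```python
import heapq
import re
from collections import Counter


def tokenize(text):
    # Assumption: tokens are maximal runs of word or non-word characters.
    return re.findall(r"\w+|\W+", text)


def code_lengths(freqs):
    # Standard Huffman construction, keeping only each symbol's code length.
    if len(freqs) == 1:
        return {sym: 1 for sym in freqs}
    # The integer in the middle breaks ties so the lists are never compared.
    heap = [(f, i, [sym]) for i, (sym, f) in enumerate(freqs.items())]
    heapq.heapify(heap)
    lengths = dict.fromkeys(freqs, 0)
    tie = len(heap)
    while len(heap) > 1:
        f1, _, s1 = heapq.heappop(heap)
        f2, _, s2 = heapq.heappop(heap)
        for sym in s1 + s2:
            lengths[sym] += 1  # every merge adds one bit to these symbols
        heapq.heappush(heap, (f1 + f2, tie, s1 + s2))
        tie += 1
    return lengths


def canonical_codes(lengths):
    # Canonical assignment: sort by (length, symbol) and hand out
    # consecutive integers, shifting left whenever the length grows.
    # Only the lengths are needed to rebuild the codes at decode time.
    codes = {}
    code = 0
    prev_len = 0
    for sym, length in sorted(lengths.items(), key=lambda kv: (kv[1], kv[0])):
        code <<= length - prev_len
        codes[sym] = format(code, "0{}b".format(length))
        code += 1
        prev_len = length
    return codes


def encode(tokens, codes):
    return "".join(codes[t] for t in tokens)


def decode(bits, codes):
    # Simple prefix matching; correct because the code is prefix-free.
    inverse = {c: s for s, c in codes.items()}
    out, cur = [], ""
    for b in bits:
        cur += b
        if cur in inverse:
            out.append(inverse[cur])
            cur = ""
    return "".join(out)


# Semi-static use: pass 1 gathers statistics, pass 2 encodes with a fixed model.
text = "the cat sat on the mat the end"
tokens = tokenize(text)
lengths = code_lengths(Counter(tokens))
codes = canonical_codes(lengths)
bits = encode(tokens, codes)
restored = decode(bits, codes)
```

Because the model is fixed after the first pass, any token boundary in the compressed text can be decoded with the same code table, which is what makes this style of model a natural fit for retrieving individual documents.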