Adding Compression to a Full-Text Retrieval System


Justin Zobel
Department of Computer Science, RMIT, GPO Box 2476V, Melbourne 3001, Australia.

Alistair Moffat
Department of Computer Science, The University of Melbourne, Parkville 3052, Australia.


Status

Software---Practice and Experience, 25(8):891-903, August 1995.

Abstract

We describe the implementation of a data compression scheme as an integral and transparent layer within a full-text retrieval system. Using a semi-static word-based compression model, the space needed to store the text is under 30\% of the original requirement. The model is used in conjunction with canonical Huffman coding, and together these two techniques provide fast decompression. Experiments with 500~Mb of newspaper articles show that in full-text retrieval environments compression not only saves space but can also yield faster query processing, a win-win situation.
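
To make the two ingredients named above concrete, the following is a minimal sketch of a semi-static word-based model paired with canonical Huffman code assignment. It is illustrative only and is not the implementation described in the paper. The function names (`code_lengths`, `canonical_codes`, `compress`, `decode`), the regular-expression tokenizer, the use of a single shared model for words and separators, and the representation of codes as strings of `0` and `1` are all assumptions made for brevity. The code assignment follows the common textbook convention in which shorter codes receive numerically smaller values.

```python
import heapq
import re
from collections import Counter
from itertools import count


def code_lengths(freqs):
    """Return {symbol: Huffman code length} for a frequency table."""
    if len(freqs) == 1:
        # A lone symbol still needs one bit so that it can be emitted.
        return {sym: 1 for sym in freqs}
    tiebreak = count()  # keeps heap entries comparable when frequencies tie
    heap = [(f, next(tiebreak), [sym]) for sym, f in freqs.items()]
    heapq.heapify(heap)
    lengths = dict.fromkeys(freqs, 0)
    while len(heap) > 1:
        f1, _, syms1 = heapq.heappop(heap)
        f2, _, syms2 = heapq.heappop(heap)
        # Merging two subtrees pushes every symbol in them one level deeper.
        for sym in syms1 + syms2:
            lengths[sym] += 1
        heapq.heappush(heap, (f1 + f2, next(tiebreak), syms1 + syms2))
    return lengths


def canonical_codes(lengths):
    """Assign canonical codes from code lengths alone (textbook variant)."""
    codes = {}
    code = 0
    prev_len = 0
    # Symbols are ordered by (length, symbol); codes increase numerically.
    for sym, length in sorted(lengths.items(), key=lambda kv: (kv[1], kv[0])):
        code <<= length - prev_len
        codes[sym] = format(code, "0{}b".format(length))
        code += 1
        prev_len = length
    return codes


def compress(text):
    """Semi-static scheme: pass 1 gathers statistics, pass 2 encodes."""
    # Assumed tokenizer: alternating runs of word and non-word characters.
    tokens = re.findall(r"\w+|\W+", text)
    codes = canonical_codes(code_lengths(Counter(tokens)))
    bits = "".join(codes[t] for t in tokens)
    return tokens, codes, bits


def decode(bits, codes):
    """Decode a bit string; prefix-freeness makes greedy matching safe."""
    inverse = {c: s for s, c in codes.items()}
    out, cur = [], ""
    for b in bits:
        cur += b
        if cur in inverse:
            out.append(inverse[cur])
            cur = ""
    return "".join(out)


if __name__ == "__main__":
    sample = "the cat sat on the mat, and the cat ran"
    _, table, encoded = compress(sample)
    print(len(encoded), "bits for", len(sample) * 8, "bits of input")
    print(decode(encoded, table) == sample)
```

Because a canonical code is determined entirely by the code lengths and a fixed symbol ordering, a decoder needs only the lengths to rebuild the table. The dictionary lookup in `decode` is used here only to keep the sketch short; a practical decoder would work on packed bits rather than character strings.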