Compressed Inverted Files with Reduced Decoding Overheads


Vo Ngoc Anh
Department of Computer Science, The University of Melbourne, Parkville 3052, Australia.

Alistair Moffat
Department of Computer Science, The University of Melbourne, Parkville 3052, Australia.


Status

Proc. 21st Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, Melbourne, Australia, August 1998, pages 290-297.

Abstract

Compressed inverted files are the most compact way of indexing large text databases, typically occupying around 10% of the space of the collection they index. The drawback of compression is the need to decompress the index lists during query processing. Here we describe an improved implementation of compressed inverted lists that eliminates almost all redundant decoding and allows extremely fast processing of conjunctive Boolean queries and ranked queries. We also describe a pruning method to reduce the number of candidate documents considered during the evaluation of ranked queries. Experimental results with a database of 510~Mb show that the new mechanism can reduce the CPU and elapsed time for Boolean queries of 4--10 terms to one tenth and one fifth respectively of the standard technique. For ranked queries, the new mechanism reduces both CPU and elapsed time to one third and memory usage to less than one tenth of the standard algorithm, with no degradation in retrieval effectiveness.