Compressed Inverted Files with Reduced Decoding Overheads
Vo Ngoc Anh
Department of Computer Science,
The University of Melbourne,
Parkville 3052, Australia.
Alistair Moffat
Department of Computer Science,
The University of Melbourne,
Parkville 3052, Australia.
Status
Proc. 21st Annual International ACM SIGIR Conference on
Research and Development in Information Retrieval,
Melbourne, Australia, August 1998, pages 290-297.
Abstract
Compressed inverted files are the most compact way of indexing
large text databases, typically occupying around 10% of the space of
the collection they index.
The drawback of compression is the need to decompress the index lists
during query processing.
Here we describe an improved implementation of compressed inverted
lists that eliminates almost all redundant
decoding and allows extremely fast processing of conjunctive Boolean
queries and ranked queries.
We also describe a pruning method to reduce the number of candidate
documents considered during the evaluation of ranked queries.
Experimental results with a database of 510~Mb show that the new
mechanism can reduce the CPU and elapsed time for Boolean queries
of 4--10 terms
to one tenth and one fifth respectively of the standard technique.
For ranked queries, the new mechanism reduces both CPU and elapsed time to one
third and memory usage to less than one tenth
of the standard algorithm,
with no degradation in retrieval effectiveness.