SIGIR'98 papers: Compressed Inverted Files with Reduced Decoding Overheads
Compressed Inverted Files with Reduced Decoding Overheads
Vo Ngoc Anh
Department of Computer Science,
The University of Melbourne,
Parkville 3052, Australia.
Alistair Moffat
Department of Computer Science,
The University of Melbourne,
Parkville 3052, Australia.
Abstract
Compressed inverted files are the most compact way of indexing large
text databases, typically occupying around 10% of the space of the
collection they index.
The drawback of compression is the need to decompress the index lists
during query processing.
Here we describe an improved implementation of compressed inverted
lists that eliminates almost all redundant decoding and allows
extremely fast processing of conjunctive Boolean queries and ranked
queries.
We also describe a pruning method to reduce the number of candidate
documents considered during the evaluation of ranked queries.
Experimental results with a database of 510 Mb show that the new
mechanism can reduce the CPU and elapsed time for Boolean queries of
4-10 terms to one tenth and one fifth respectively of the standard
technique.
For ranked queries, the new mechanism reduces both CPU and elapsed
time to one third and memory usage to less than one tenth of the
standard algorithm, with no degradation in retrieval effectiveness.
SIGIR'98
24-28 August 1998
Melbourne, Australia.
sigir98@cs.mu.oz.au.