Random Access Compressed Inverted Files
Vo Ngoc Anh
Department of Computer Science,
The University of Melbourne,
Parkville 3052, Australia.
Alistair Moffat
Department of Computer Science,
The University of Melbourne,
Parkville 3052, Australia.
Status
Proc. 9th Australasian Database Conference,
Perth, Australia, February 1998, 1-12.
Abstract
Compressed inverted indexes are used for information retrieval in
full-text databases because of their relatively small size, about
6-10% of the source texts.
However, compressed inverted indexes allow only sequential decoding
within each inverted list,
whereas uncompressed lists can be searched in a random-access manner.
Here we reexamine the problem of adding random access to compressed
inverted lists.
By dividing a compressed inverted list into blocks of the same length
we achieve random access at block level; and then, by applying an
appropriate compression method within each block, we can have nearly
random access to any element of the list.
As a result, by spending some extra space for the index (but
still keeping it under 15% of the text being stored) we have an
implementation of compressed
inverted files that allows faster processing than previous variants,
yet is still straightforward to implement.
Experimental results with a database of 500~Mb show that
the new mechanism can reduce processing time for Boolean queries
of 5-10 terms to about 20% of the standard technique.
The model also promises improved performance for the evaluation
of ranked queries.