Random Access Compressed Inverted Files


Vo Ngoc Anh
Department of Computer Science, The University of Melbourne, Parkville 3052, Australia.

Alistair Moffat
Department of Computer Science, The University of Melbourne, Parkville 3052, Australia.


Status

Proc. 9th Australasian Database Conference, Perth, Australia, February 1998, 1-12.

Abstract

Compressed inverted indexes are used for information retrieval in full-text databases because of their relatively small size, about 6-10% of the source texts. However, compressed inverted indexes allow only sequential decoding within each inverted list, whereas uncompressed lists can be searched in a random-access manner. Here we reexamine the problem of adding random access to compressed inverted lists. By dividing a compressed inverted list into blocks of the same length we achieve random access at block level; and then, by applying an appropriate compression method within each block, we can have nearly random access to any element of the list. As a result, by spending some extra space for the index (but still keeping it under 15% of the text being stored) we have an implementation of compressed inverted files that allows faster processing than previous variants, yet is still straightforward to implement. Experimental results with a database of 500~Mb show that the new mechanism can reduce processing time for Boolean queries of 5-10 terms to about 20% of the standard technique. The model also promises improved performance for the evaluation of ranked queries.