Improved Inverted File Processing for Large Text Databases

Alistair Moffat
Department of Computer Science, The University of Melbourne, Parkville 3052, Australia.

Justin Zobel
Department of Computer Science, RMIT, Melbourne 3001, Australia.

Shmuel T. Klein
Department of Mathematics and Computer Science, Bar-Ilan University, Ramat-Gan 52900, Israel.

Status

Presented at the 6th Australasian Database Conference, Adelaide, January 1995.

Abstract

Compressed inverted files provide a space-economical indexing method for full-text databases. For typical document collections, a document-level index requires a space overhead of just 5--10\%. Compression does mean that the cost of processing queries against such a database is dominated by the time required to decompress inverted lists. However, it has been demonstrated that inserting ``skipping'' information into the inverted lists substantially reduces the cost of answering typical queries, so much so that the compressed index provides faster query response than an equivalent uncompressed index. Here we describe an alternative ``blocking'' mechanism that further reduces the cost of query processing. Blocking has roughly the same space overhead as skipping, but provides faster query processing and has the advantage that the CPU cost---as distinct from the total cost, including the time for disk transfer---can be held low.