Improved Inverted File Processing for Large Text Databases
Alistair Moffat
Department of Computer Science,
The University of Melbourne,
Parkville 3052, Australia.
Justin Zobel
Department of Computer Science,
RMIT,
Melbourne 3001, Australia.
Shmuel T. Klein
Department of Mathematics and Computer Science,
Bar-Ilan University,
Ramat-Gan 52900, Israel.
Status
Presented at the 6th Australasian Database Conference,
Adelaide, January 1995.
Abstract
Compressed inverted files provide a space-economical indexing method
for full-text databases. For typical document collections, a
document-level index requires a space overhead of just 5--10\%.
Compression does mean that the cost of processing queries against such
a database is dominated by the time required to decompress inverted
lists. However, it has been demonstrated that inserting ``skipping''
information into the inverted lists substantially reduces the cost of
answering typical queries, so much so that the compressed index
provides faster query response than an equivalent uncompressed index.
Here we describe an alternative ``blocking'' mechanism that further
reduces the cost of query processing. Blocking has roughly the same
space overhead as skipping, but provides faster query processing and
has the advantage that the CPU cost---as distinct from the total cost,
including the time for disk transfer---can be held low.