Inverted Files Versus Signature Files for Text Indexing


Justin Zobel
Department of Computer Science, RMIT, GPO Box 2476V, Melbourne 3001, Australia.

Alistair Moffat
Department of Computer Science, The University of Melbourne, Parkville 3052, Australia.

Kotagiri Ramamohanarao
Department of Computer Science, The University of Melbourne, Parkville 3052, Australia.


Status

ACM Trans. Database Systems, 23(4):453-490, December 1998.

Abstract

Two well-known indexing methods are inverted files and signature files. We have undertaken a detailed comparison of these two approaches in the context of text indexing, paying particular attention to query evaluation speed and space requirements. We have examined their relative performance using both experimentation and a refined approach to modelling of signature files, and demonstrate that inverted files are distinctly superior to signature files. Not only can inverted files be used to evaluate typical queries in less time than can signature files, but inverted files require less space and provide greater functionality. Our results also show that a synthetic text database can provide a realistic indication of the behaviour of an actual text database. The tools used to generate the synthetic database have been made publicly available.

Full text

http://doi.acm.org/10.1145/296854.277632 .