Inverted Files Versus Signature Files for Text Indexing
Justin Zobel
Department of Computer Science,
RMIT, GPO Box 2476V,
Melbourne 3001, Australia.
Alistair Moffat
Department of Computer Science,
The University of Melbourne,
Parkville 3052, Australia.
Kotagiri Ramamohanarao
Department of Computer Science,
The University of Melbourne,
Parkville 3052, Australia.
Status
ACM Trans. Database Systems, 23(4):453-490, December 1998.
Abstract
Two well-known indexing methods are inverted files and signature
files.
We have undertaken a detailed comparison of these two approaches in
the context of text indexing, paying particular attention to query
evaluation speed and space requirements.
We have examined their relative
performance using both experimentation and a
refined approach to modelling of signature files, and demonstrate
that inverted files are distinctly
superior to signature files.
Not only can inverted files be used to evaluate typical queries in
less time than can signature files, but inverted files require less
space and provide greater functionality.
Our results also show that a synthetic text database
can provide a realistic indication of the
behaviour of an actual text database.
The tools used to generate the synthetic database have
been made publicly available.
Full text
http://doi.acm.org/10.1145/296854.277632
.