Fast On-Line Index Construction by Geometric Partitioning
Nicholas Lester
School of Computer Science and Information Technology,
RMIT University,
Victoria 3001, Australia.
Alistair Moffat
Department of Computer Science and Software Engineering,
The University of Melbourne,
Victoria 3010, Australia.
Justin Zobel
School of Computer Science and Information Technology,
RMIT University,
Victoria 3001, Australia.
Status
Proc. 2005 ACM CIKM Int. Conf. Information and
Knowledge Management, Bremen, Germany, November 2005,
pages 776-783.
Abstract
Inverted index structures are the mainstay of modern text retrieval
systems.
They can be constructed quickly using off-line merge-based methods,
and provide efficient support for a variety of querying modes.
In this paper we examine the task of on-line index
construction -- that is, how to build an inverted index when the
underlying data must be continuously queryable, and the documents
must be indexed and available for search as soon they are inserted.
When straightforward approaches are used, document insertions become
increasingly expensive as the size of the database grows.
This paper describes a mechanism based on controlled partitioning
that can be adapted to suit different balances of insertion and
querying operations, and is faster and scales better than previous
methods.
Using experiments on 100 GB of web data we demonstrate the
efficiency of our methods in practice, showing that they dramatically
reduce the cost of on-line index construction.