Efficient Online Index Construction for Text Databases


Nicholas Lester
School of Computer Science and Information Technology, RMIT University, Victoria 3001, Australia.

Alistair Moffat
Department of Computer Science and Software Engineering, The University of Melbourne, Victoria 3010, Australia.

Justin Zobel
School of Computer Science and Information Technology, RMIT University, Victoria 3001, Australia.


Status

ACM Trans. Database Systems, 22(3):19.1-19.33, August 2008. Parts of this paper appeared in preliminary form in the 2005 ACM CIKM Int. Conf. Information and Knowledge Management, Bremen, Germany, November 2005, pages 776-783.

Abstract

Inverted index structures are a core element of current text retrieval systems. They can be constructed quickly using offline approaches, in which one or more passes are made over a static set of input data, and, at the completion of the process, an index is available for querying. However, there are search environments in which even a small delay in timeliness cannot be tolerated, and the index must always be queryable and up to date. Here we describe and analyze a geometric partitioning mechanism for online index construction that provides a range of tradeoffs between costs, and can be adapted to different balances of insertion and querying operations. Detailed experimental results are provided that show the extent of these tradeoffs, and that these new methods can yield substantial savings in online indexing costs.

Full text

http://doi.acm.org/10.1145/1386118.1386125 .