Structured Index Organizations for High-Throughput Text Querying


Vo Ngoc Anh
Department of Computer Science and Software Engineering, The University of Melbourne, Victoria 3010, Australia.

Alistair Moffat
Department of Computer Science and Software Engineering, The University of Melbourne, Victoria 3010, Australia.


Status

Proc. 13th Int. Symp. String Processing and Information Retrieval, Glasgow, Scotland, October 2006, pages 304-315. LNCS volume 4209.

Abstract

Inverted indexes are the preferred mechanism for supporting content-based queries in text retrieval systems, with the various data items usually stored compressed in some way. But different query modalities require that different information be held in the index. For example, phrase querying requires that word offsets be held as well as document numbers. In this study we describe an inverted index organization that provides efficient support for all of conjunctive Boolean queries, ranked queries, and phrase queries. Experimental results on a 426 GB document collection show that the methods we describe provide fast evaluation of all three querying modes.

Full text

http://dx.doi.org/10.1007/11880561_25