Locality-Based Information Retrieval


Owen de Kretser
Department of Computer Science and Software Engineering, The University of Melbourne, Parkville 3052, Australia.

Alistair Moffat
Department of Computer Science and Software Engineering, The University of Melbourne, Parkville 3052, Australia.


Status

Proc. 10th Australasian Database Conference, Auckland, January 1999, 177-188.

Abstract

Information retrieval mechanisms have largely been designed and tested using a document-based paradigm, in which the unit of retrieval is a document, and similarity is assessed between the query and each document as a whole. This emphasis is reinforced by the methodology employed by the TREC collaboration, in which systems are scored based upon document-level relevance judgements. In this paper we argue for a more seamless model for retrieval, in which the text is regarded as being continuous, the "answers" to a query are locations in the text where there is local similarity to the query, and similarity is assessed by a mechanism that employs as one of its parameters the distance between words. This paradigm has several advantages: it allows tightly focussed presentation of answers to the user of the system; it avoids the need for long texts to be segmented into artificial documents; and it obviates the need for document-length weighting factors. The drawback of the seamless approach is that more index information must be manipulated and querying requires more resources, but with the use of appropriate techniques these costs are manageable.
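
The idea of scoring locations rather than documents can be illustrated with a minimal sketch. It is not the mechanism defined in the paper: the triangular influence function, the unit weight given to every query term, and the names locality_scores, best_location, and spread are all assumptions made for illustration. Each occurrence of a query term contributes a score to nearby word positions, the contribution falls off linearly with distance in words, and the highest-scoring position is reported as an answer.

```python
def locality_scores(words, query_terms, spread=5):
    """Score every word position in a continuous text.

    Hypothetical scheme: each occurrence of a query term adds
    1 - d/spread to every position at distance d < spread words.
    """
    query = {t.lower() for t in query_terms}
    scores = [0.0] * len(words)
    for pos, word in enumerate(words):
        if word.lower() not in query:
            continue
        # Positions at distance >= spread receive no contribution.
        lo = max(0, pos - spread + 1)
        hi = min(len(words), pos + spread)
        for x in range(lo, hi):
            scores[x] += 1.0 - abs(x - pos) / spread
    return scores


def best_location(words, query_terms, spread=5):
    """Return (position, score) of the highest-scoring location."""
    scores = locality_scores(words, query_terms, spread)
    best = max(range(len(scores)), key=scores.__getitem__)
    return best, scores[best]


if __name__ == "__main__":
    text = ("information x x x x x x x x x "
            "information retrieval x x").split()
    print(best_location(text, ["information", "retrieval"], spread=3))
```

In this sketch no document boundaries or document-length weights appear anywhere: an isolated query term scores lower than a position where several query terms occur close together, which is the behaviour the seamless model relies on. The cost noted in the abstract is also visible, since word positions, not just document identifiers, must be available at query time.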