Locality-Based Information Retrieval
Owen de Kretser
Department of Computer Science and Software Engineering,
The University of Melbourne,
Parkville 3052, Australia.
Alistair Moffat
Department of Computer Science and Software Engineering,
The University of Melbourne,
Parkville 3052, Australia.
Status
Proc. 10th Australasian Database Conference, Auckland, January 1999,
pages 177-188.
Abstract
Information retrieval mechanisms have largely been
designed and tested using a document-based paradigm, in which the
unit of retrieval is a document, and the similarity of a query is
judged against whole documents.
This emphasis is reinforced by the methodology employed by the
TREC collaboration, in which systems are scored based upon
document-level relevance judgements.
In this paper we argue for a more seamless model for retrieval, in
which the text is regarded as being continuous, the "answers"
to a query are locations in the text where there is local similarity
to the query, and similarity is assessed by a mechanism that employs
as one of its parameters the distance between words.
This paradigm has several advantages: it allows tightly focussed
presentation of answers to the user of the system;
it avoids the
need for long texts to be segmented into artificial documents; and it
obviates the need for document-length weighting factors.
The drawback of the seamless approach is that more index information
must be manipulated and querying requires more resources,
but with the use of appropriate
techniques these costs are manageable.
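
The idea of scoring locations rather than documents can be illustrated with a small sketch. It is not the similarity mechanism evaluated in the paper, which the abstract does not specify. It assumes that each occurrence of a query term exerts an influence that falls off linearly with word distance, and that the score of a location is the sum of the influences reaching it. The function names, the triangular decay, the `spread` parameter, and the absence of any term weighting are all assumptions made for illustration.

```python
from typing import List, Tuple


def locality_scores(words: List[str], query: List[str], spread: int = 4) -> List[float]:
    """Score every word position by its proximity to query-term occurrences.

    Assumption (not from the paper): each occurrence contributes 1.0 at its
    own position, decreasing linearly to 0.0 at `spread` words away.
    """
    query_terms = {q.lower() for q in query}
    scores = [0.0] * len(words)
    for pos, word in enumerate(words):
        if word.lower() not in query_terms:
            continue
        # Only positions strictly closer than `spread` words are affected.
        lo = max(0, pos - spread + 1)
        hi = min(len(words), pos + spread)
        for i in range(lo, hi):
            scores[i] += 1.0 - abs(i - pos) / spread
    return scores


def best_locations(words: List[str], query: List[str],
                   spread: int = 4, top: int = 1) -> List[Tuple[int, float]]:
    """Return the `top` highest-scoring word positions as (position, score)."""
    scores = locality_scores(words, query, spread)
    # Ties are broken in favour of the earlier position.
    ranked = sorted(range(len(words)), key=lambda i: (-scores[i], i))
    return [(i, scores[i]) for i in ranked[:top]]


if __name__ == "__main__":
    text = "the cat sat on the mat while the dog chased the cat".split()
    # The text is treated as one continuous word sequence; no document
    # boundaries and no document-length normalisation are involved.
    print(best_locations(text, ["dog", "cat"]))
```

In this sketch the isolated "cat" near the start of the text scores lower than the region where "dog" and "cat" occur close together, so the answer returned is a position in the text rather than a document identifier. The nested loop also hints at the cost noted above: every occurrence of every query term must be processed, which requires word-position information in the index.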