Effective Document Presentation with a Locality-Based Similarity Heuristic


Owen de Kretser
Department of Computer Science and Software Engineering, The University of Melbourne, Parkville 3052, Australia.

Alistair Moffat
Department of Computer Science and Software Engineering, The University of Melbourne, Parkville 3052, Australia.


Status

Proc. 22nd Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, San Francisco, August 1999, 113-120.

Abstract

Information retrieval systems have traditionally used document-based heuristics that judge similarity holistically, over entire documents. In this work we present a locality-based paradigm for information retrieval, in which every word location in each document is scored. The locality-based similarity heuristic provides retrieval effectiveness as good as that of the document-based technique, and has the additional advantage of allowing the matching section or sections of each retrieved document to be shown to users as they sift the results of their query. This is a considerable improvement upon the conventional presentation mechanism, in which the user must manually search each document for the passage (if any such passage exists at all) that suggested to the retrieval mechanism that the document is an answer. We also describe an improved index representation that supports the required operations.
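
To make the idea of scoring every word location concrete, the following is a minimal sketch and not the method of the paper. It assumes, purely for illustration, that each occurrence of a query term adds a triangular contribution to nearby word positions, that a document's score is its highest position score, and that the passage to display is a fixed window around that peak. The function names and the `spread` and `context` parameters are hypothetical.

```python
def locality_scores(doc_words, query_terms, spread=3):
    """Return one score per word position in doc_words.

    Assumption (not from the paper): each query-term occurrence adds a
    triangular contribution that is 1.0 at the occurrence itself and
    decays linearly over `spread` positions on either side.
    """
    scores = [0.0] * len(doc_words)
    query = {term.lower() for term in query_terms}
    for pos, word in enumerate(doc_words):
        if word.lower() not in query:
            continue
        first = max(0, pos - spread)
        last = min(len(doc_words) - 1, pos + spread)
        for i in range(first, last + 1):
            scores[i] += 1.0 - abs(i - pos) / (spread + 1)
    return scores


def best_passage(doc_words, query_terms, spread=3, context=3):
    """Return (peak score, words around the highest-scoring position).

    Assumption: the document score is the peak position score, and the
    passage shown to the user is `context` words either side of the peak.
    """
    scores = locality_scores(doc_words, query_terms, spread)
    if not scores or max(scores) == 0.0:
        return 0.0, []  # no query term occurs anywhere in the document
    peak = max(range(len(scores)), key=scores.__getitem__)
    start = max(0, peak - context)
    end = min(len(doc_words), peak + context + 1)
    return scores[peak], doc_words[start:end]
```

Under these assumptions, a call such as `best_passage(text.split(), ["locality", "index"])` returns both a score for ranking the document and the region of text that produced it, which is the property the abstract highlights: the passage responsible for the match can be shown directly instead of being left for the user to find.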