Collection-Independent Document-Centric Impacts
Vo Ngoc Anh
Department of Computer Science and Software Engineering,
The University of Melbourne,
Victoria 3010, Australia.
Alistair Moffat
Department of Computer Science and Software Engineering,
The University of Melbourne,
Victoria 3010, Australia.
Status
Proc. Australian Document Computing Symposium,
Melbourne, December 13, 2004, pages 25-32.
Abstract
An information retrieval system employs a similarity heuristic to
estimate the probability that documents and queries match each other.
The heuristic is usually formulated in the context of a collection,
so that the relationship between each document and the collection
that contains it affects the scoring used to provide the ranked set
of answers in response to a query.
In this paper we continue our study of document-centric similarity
measures, but seek to eliminate the reliance on collection statistics
in setting the document-related components of the measure.
Doing so has a direct implementation benefit: impact-sorted inverted
indexes can be built with just a single parse of the source text.