Simplified Similarity Scoring Using Term Ranks
Vo Ngoc Anh
Department of Computer Science and Software Engineering,
The University of Melbourne,
Victoria 3010, Australia.
Alistair Moffat
Department of Computer Science and Software Engineering,
The University of Melbourne,
Victoria 3010, Australia.
Status
Proc. 28th Annual International ACM SIGIR Conference on
Research and Development in Information Retrieval,
Salvador, Brazil, August 2005, pages 226-233.
Abstract
We propose a method for document ranking that combines a simple
document-centric view of text with fast evaluation strategies
developed in connection with the vector space model.
The new method defines the importance of a term within a document
qualitatively rather than quantitatively, and in doing so reduces
the need for tuning parameters.
In addition, the method supports very fast query processing: most
of the computation is carried out on small integers, and dynamic
pruning is an effective option.
Experiments on a wide range of TREC data show that the new method
provides retrieval effectiveness as good as or better than that of
the Okapi BM25 formulation and of variants of language models.
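The abstract describes term importance as qualitative (where a term ranks within its document) rather than a quantitative weight, and query evaluation as arithmetic on small integers. The Python sketch below illustrates that general idea only. The banding rule (terms sorted by within-document frequency, then split into bands of size 1, 2, 4, ...), the default of k = 8 impact levels, the alphabetical tie-breaking, the unweighted query terms, and the names rank_impacts and score are all hypothetical choices made for illustration. They are not the paper's specification.

```python
from collections import Counter
from typing import Dict, Iterable, List


def rank_impacts(tokens: List[str], k: int = 8) -> Dict[str, int]:
    """Assign each distinct term of a document an integer impact in 1..k.

    Hypothetical banding rule (an assumption, not the paper's exact rule):
    terms are sorted by descending within-document frequency, ties broken
    alphabetically; band b (0-based) holds rank positions 2**b - 1 up to
    2**(b + 1) - 2, so band sizes are 1, 2, 4, ...; the last band absorbs
    any remaining terms. A term in band b receives impact k - b.
    """
    if k < 1:
        raise ValueError("k must be at least 1")
    counts = Counter(tokens)
    # Only the rank order of terms matters from here on, not the raw counts.
    ordered = sorted(counts, key=lambda t: (-counts[t], t))
    impacts: Dict[str, int] = {}
    for position, term in enumerate(ordered):
        # floor(log2(position + 1)) via bit_length, capped at the last band.
        band = min((position + 1).bit_length() - 1, k - 1)
        impacts[term] = k - band
    return impacts


def score(query_terms: Iterable[str], impacts: Dict[str, int]) -> int:
    """Sum the document impacts of the distinct query terms.

    Assumption: every query term counts equally, so scoring is a plain
    sum of small integers; terms absent from the document contribute 0.
    """
    return sum(impacts.get(term, 0) for term in set(query_terms))
```

For the document "the cat sat on the mat the cat" with k = 8, rank_impacts gives "the" impact 8, "cat" and "mat" impact 7, and "on" and "sat" impact 6. The query ["cat", "dog"] then scores 7. Because every stored value is a small integer in 1..k, scores can be accumulated with integer additions only, in the spirit of the abstract's remark about computation on small integers.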