Vector Space Ranking: Can We Keep it Simple?


Vo Ngoc Anh
Department of Computer Science and Software Engineering, The University of Melbourne, Victoria 3010, Australia.

Alistair Moffat
Department of Computer Science and Software Engineering, The University of Melbourne, Victoria 3010, Australia.


Status

Proc. Australian Document Computing Symposium, Sydney, December 16, 2002, pages 7-12.

Abstract

The vector-space model is widely used for document retrieval, based upon the TF-IDF rule for calculating similarity scores between a set of documents and a query. One of the drawbacks of this approach is the need to select a specific formulation for the similarity computation. Here we present an initial attempt to simplify the heuristic by hiding the various detailed calculations and evaluating term importance qualitatively rather than quantitatively. A new technique, called local reordering, is introduced. Local reordering still relies on the vector-space model, as it employs a scalar vector product for calculating similarity scores, but there is no longer a requirement for precise values of the document or query vectors to be determined. Initial experiments on two data sets show that it is highly competitive in terms of retrieval effectiveness. As a useful side effect, the method allows extremely fast query processing.
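For readers unfamiliar with the baseline, the following is a minimal sketch of conventional TF-IDF ranking with a scalar product, under stated assumptions. It is not the local reordering technique, whose details the abstract does not give. The function names (`tfidf_vectors`, `rank`) are hypothetical, and the particular weighting formulas (logarithmic term frequency, `log(1 + N/f)` for inverse document frequency, document length normalization) are just one choice among many possible formulations, which is the drawback the abstract refers to.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Build one sparse TF-IDF vector (term -> weight) per tokenized document.

    Assumed weighting, one of many possible formulations:
    (1 + ln tf) * ln(1 + N / df).
    """
    n = len(docs)
    # Document frequency: number of documents containing each term.
    df = Counter(term for doc in docs for term in set(doc))
    idf = {t: math.log(1 + n / f) for t, f in df.items()}
    vectors = [
        {t: (1 + math.log(tf)) * idf[t] for t, tf in Counter(doc).items()}
        for doc in docs
    ]
    return vectors, idf


def rank(query, docs):
    """Return document indices ordered by decreasing similarity to the query."""
    vectors, idf = tfidf_vectors(docs)
    # Query terms that occur in no document contribute nothing and are dropped.
    q = {
        t: (1 + math.log(tf)) * idf[t]
        for t, tf in Counter(query).items()
        if t in idf
    }
    scores = []
    for i, d in enumerate(vectors):
        norm = math.sqrt(sum(w * w for w in d.values()))
        # Scalar (inner) product of the query and document vectors.
        dot = sum(w * d.get(t, 0.0) for t, w in q.items())
        scores.append((dot / norm if norm else 0.0, i))
    # Highest score first; ties broken by document index.
    return [i for _, i in sorted(scores, key=lambda p: (-p[0], p[1]))]


docs = [
    ["vector", "space", "model"],
    ["fast", "query", "processing"],
    ["vector", "space", "ranking", "vector"],
]
ranking = rank(["vector", "ranking"], docs)
print(ranking)
```

Changing any of the assumed weighting formulas can change the resulting order, which illustrates why a method that does not depend on precise vector values is attractive.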