Statistical Phrases for Vector-Space Information Retrieval
Andrew Turpin
Department of Computer Science and Software Engineering,
The University of Melbourne,
Parkville 3052, Australia.
Alistair Moffat
Department of Computer Science and Software Engineering,
The University of Melbourne,
Parkville 3052, Australia.
Status
Proc. 22nd Annual International ACM SIGIR Conference on
Research and Development in Information Retrieval,
San Francisco, August 1999, 309-310.
Abstract
When employing a vector-space model to evaluate a query against a
document collection, several choices must be made.
A fundamental design decision is the definition of the terms that form
the dimensions of the space.
Should the terms be single words, pairs of words, linguistic phrases,
entire sentences, or some other combination of textual units?
It seems intuitive that when calculating a measure of similarity
between a natural language query text and natural language documents,
some respect should be paid to word ordering.
Complex terms such as phrases should, therefore, increase the
precision of retrieval results.
Recent work has, however, shown that phrases do not improve precision in this way.
In this abstract we describe experiments
that further confirm that observation.
Note that we are solely concerned with statistical phrases;
that is, phrases derived using techniques other than NLP.
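
To make the choice of terms concrete, the following is a minimal sketch, not the method used in the experiments. It assumes that a statistical phrase is simply a pair of adjacent words, that term weights are raw frequencies, and that similarity is the cosine of the angle between the query and document vectors. The function names terms and cosine, and the use_pairs flag, are hypothetical and chosen only for illustration.

```python
import math
from collections import Counter


def terms(text, use_pairs=False):
    """Return the terms of a text: lowercase words, plus adjacent word pairs if requested."""
    words = text.lower().split()
    result = list(words)
    if use_pairs:
        # Assumption: a "statistical phrase" here is just two adjacent words.
        result += [f"{a} {b}" for a, b in zip(words, words[1:])]
    return result


def cosine(query, document, use_pairs=False):
    """Cosine similarity between raw term-frequency vectors of query and document."""
    q = Counter(terms(query, use_pairs))
    d = Counter(terms(document, use_pairs))
    dot = sum(q[t] * d[t] for t in q)
    q_len = math.sqrt(sum(v * v for v in q.values()))
    d_len = math.sqrt(sum(v * v for v in d.values()))
    norm = q_len * d_len
    # An empty query or document has no direction in the space; score it as zero.
    return dot / norm if norm else 0.0


if __name__ == "__main__":
    query = "information retrieval"
    docs = ["information retrieval systems", "retrieval of information"]
    for doc in docs:
        print(doc, cosine(query, doc), cosine(query, doc, use_pairs=True))
```

With single-word terms the two example documents receive the same score, because word order is invisible to the model. Once word pairs are added as extra dimensions, the document that preserves the query's word order scores higher. This illustrates the intuition described in the abstract; it does not by itself show that phrases improve retrieval precision on a real collection.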