SIGIR'98 papers: Improving Automatic Query Expansion

Improving Automatic Query Expansion


Mandar Mitra
Cornell University, Ithaca, NY 14853.

Amit Singhal
AT&T Labs--Research, Florham Park, NJ 07932.

Chris Buckley
Sabir Research Inc., Gaithersburg, MD 20878.


Abstract

Most casual users of IR systems type short queries. Recent research has shown that adding new words to these queries via ad hoc feedback improves their retrieval effectiveness. We investigate ways to improve this query expansion process by refining the set of documents used in feedback. We start by using manually formulated Boolean filters along with proximity constraints, an approach similar to the one proposed by Hearst [HEARST96]. Next, we investigate a completely automatic method that uses term co-occurrence information to estimate word correlation. Experimental results show that refining the set of documents used in query expansion often prevents the query drift caused by blind expansion and yields substantial improvements in retrieval effectiveness, both in terms of average precision and precision in the top twenty documents. More importantly, the fully automatic approach developed in this study performs competitively with the best manual approach and requires little computational overhead.
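
The idea of refining the feedback set before expansion can be illustrated with a small sketch. This is not the system evaluated in the paper: the ranker below is a plain term-overlap count standing in for a real weighting scheme, the filter is a simple "all required terms within a token window" test, and every name (DOCS, rank, passes_filter, expand) and the toy collection are assumptions made for illustration only.

```python
from collections import Counter

# Toy collection (hypothetical data, for illustration only).
DOCS = {
    "d1": "jaguar car engine speed",
    "d2": "jaguar cat jungle habitat",
    "d3": "jaguar car dealer engine",
}

def rank(query_terms, docs):
    """Rank doc ids by raw query-term counts (a stand-in for a real ranker)."""
    scores = {
        doc_id: sum(text.lower().split().count(t) for t in query_terms)
        for doc_id, text in docs.items()
    }
    # Highest score first; ties broken by doc id for determinism.
    return sorted(docs, key=lambda d: (-scores[d], d))

def passes_filter(tokens, required, window):
    """True if every required term occurs inside some span of `window` tokens."""
    needed = set(required)
    return any(needed <= set(tokens[i:i + window]) for i in range(len(tokens)))

def expand(query_terms, docs, top_k, n_new, required=None, window=None):
    """Add the n_new most frequent new terms from the (optionally filtered) top_k docs."""
    feedback = rank(query_terms, docs)[:top_k]
    if required:
        # Refine the feedback set: keep only documents satisfying the filter.
        feedback = [
            d for d in feedback
            if passes_filter(docs[d].lower().split(), required, window)
        ]
    counts = Counter()
    for d in feedback:
        # Count each candidate term once per feedback document.
        counts.update(t for t in set(docs[d].lower().split()) if t not in query_terms)
    best = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))[:n_new]
    return list(query_terms) + [term for term, _ in best]

blind = expand(["jaguar", "car"], DOCS, top_k=3, n_new=2)
refined = expand(["jaguar", "car"], DOCS, top_k=3, n_new=2,
                 required=["jaguar", "car"], window=2)
print(blind)
print(refined)
```

In this toy run, blind expansion picks up "cat" from the off-topic document, while the filtered feedback set yields only car-related terms, which is the kind of query drift the abstract describes.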


SIGIR'98
24-28 August 1998
Melbourne, Australia.
sigir98@cs.mu.oz.au.