Improving Two-Stage Ad-Hoc Retrieval for Short Queries
K. L. Kwok
Computer Science Department, Queens College, CUNY, Flushing, NY 11367, USA.
M. Chan
Computer Science Department, Queens College, CUNY, Flushing, NY 11367, USA.
Abstract
Short queries in an ad-hoc retrieval environment are difficult but
unavoidable. We present five methods for improving our current
strategy of two-stage pseudo-relevance feedback retrieval in this
situation: 1) avtf query term weighting, 2) a variable high-frequency
Zipfian threshold, 3) collection enrichment, 4) enhancing
term variety in raw queries, and 5) using retrieved-document local
term statistics. Avtf employs collection statistics to weight terms in
short queries. The variable high-frequency threshold defines statistical
stopwords according to query length and ignores them. Collection enrichment
adds other collections to the one under investigation so as to improve
the chance of ranking more relevant documents in the top n for the
pseudo-feedback process. Enhancing term variety in raw queries tries to
find highly associated terms in a set of documents that is domain-related
to the query; lengthening the query in this way may improve first-stage
retrieval. Retrieved-document local statistics re-weight terms in
the second stage using the set of domain-related documents rather
than the whole collection used during the initial stage. Experiments
were performed in the TREC-5 and TREC-6 environments. Together, these
methods perform well on the difficult TREC-5 topics and also work for
the very short TREC-6 topics.
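
As a rough illustration of the two-stage pseudo-relevance feedback strategy the abstract refers to, the following Python sketch runs a first-stage ranking, assumes the top n documents are relevant, and adds their most frequent new terms to the query before ranking again. It is a generic textbook-style sketch, not the authors' system: the tf-idf scoring, the expansion weight of 0.5, and the names rank, two_stage_retrieval, top_n and add_terms are all assumptions made for illustration. None of the five methods listed above (avtf weighting, the variable threshold, collection enrichment, term variety enhancement, local statistics) is implemented here, since the abstract gives no formulas for them.

```python
import math
from collections import Counter


def tokenize(text):
    # Whitespace tokenizer; a real system would also stem and remove stopwords.
    return text.lower().split()


def rank(query_weights, docs):
    """Return document indices ordered by a simple tf-idf dot product, best first."""
    n = len(docs)
    term_freqs = [Counter(tokenize(d)) for d in docs]
    doc_freq = Counter()
    for tf in term_freqs:
        doc_freq.update(tf.keys())
    scores = []
    for i, tf in enumerate(term_freqs):
        score = sum(
            w * tf[t] * math.log((n + 1) / (doc_freq[t] + 0.5))
            for t, w in query_weights.items()
            if t in tf
        )
        scores.append((score, i))
    # Sort by descending score; ties are broken by document index.
    return [i for _, i in sorted(scores, key=lambda p: (-p[0], p[1]))]


def two_stage_retrieval(query, docs, top_n=2, add_terms=2):
    """Stage 1: rank with the raw query. Stage 2: re-rank with an expanded query."""
    raw = Counter(tokenize(query))
    first_pass = rank(raw, docs)

    # Pseudo-feedback: treat the top_n documents as if they were relevant.
    pool = Counter()
    for i in first_pass[:top_n]:
        pool.update(tokenize(docs[i]))

    # Pick the most frequent feedback terms that are not already in the query.
    expansion = [t for t, _ in pool.most_common() if t not in raw][:add_terms]

    expanded = dict(raw)
    for t in expansion:
        expanded[t] = 0.5  # assumed down-weighting of expansion terms
    return rank(expanded, docs), expansion


docs = [
    "short query retrieval feedback expansion",
    "query expansion feedback improves retrieval",
    "cooking pasta recipes tomato",
    "feedback expansion methods for documents",
]
final_ranking, added = two_stage_retrieval("query retrieval", docs)
print(added, final_ranking)
```

In this toy collection the two-word query matches only documents 0 and 1 in the first stage. The feedback step adds "feedback" and "expansion", which lets document 3, sharing no term with the raw query, rank above the unrelated document 2 in the second stage. This is the effect that makes first-stage quality so important for short queries: if the top n documents are not relevant, the added terms mislead the second stage.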
SIGIR'98
24-28 August 1998
Melbourne, Australia.
sigir98@cs.mu.oz.au.