SIGIR'98 papers: Predicting the Performance of Linearly Combined IR Systems
Christopher C. Vogt
University of California, San Diego,
CSE 0114,
La Jolla, CA 92093, USA
Garrison W. Cottrell
University of California, San Diego,
CSE 0114,
La Jolla, CA 92093, USA
Abstract
We introduce a new technique for analyzing combination models. The technique allows us to draw qualitative conclusions about which IR systems should be combined. We achieve this by using linear regression to predict, with high accuracy (r² = 0.98), the performance of the combined system from quantitative measurements of the individual component systems, taken from TREC-5. When applied to a linear model (a weighted sum of relevance scores), the technique supports several previously suggested hypotheses: one should maximize both the individual systems' performances and the overlap of relevant documents between systems, while minimizing the overlap of nonrelevant documents. It also suggests new conclusions: both systems should distribute scores similarly, but should not rank relevant documents similarly. Furthermore, it suggests that the linear model can exploit only a fraction of the benefit possible from combination. The technique is general and can point out the strengths and weaknesses of any given combination approach.
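The two pieces the abstract describes, a linear model that combines systems by a weighted sum of relevance scores and a linear regression that predicts combined performance from measurements of the component systems, can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: it assumes average precision as the performance measure, treats a document missing from one system's list as scoring 0.0, uses a Jaccard-style overlap, and fits ordinary least squares via the normal equations. The helper names (`linear_combination`, `average_precision`, `overlap`, `fit_ols`, `r_squared`) are hypothetical, and the paper's exact feature set and definitions are not given in this abstract.

```python
# Hypothetical sketch: linear score combination plus a regression predictor.
# Assumptions (not from the paper): missing scores count as 0.0, performance
# is average precision, overlap is Jaccard-style, OLS via normal equations.
from typing import Dict, List, Sequence, Set


def linear_combination(sa: Dict[str, float], sb: Dict[str, float],
                       wa: float, wb: float) -> Dict[str, float]:
    """Weighted sum of two systems' relevance scores for each document."""
    docs = set(sa) | set(sb)
    return {d: wa * sa.get(d, 0.0) + wb * sb.get(d, 0.0) for d in docs}


def rank(scores: Dict[str, float]) -> List[str]:
    # Descending score; ties broken by document id so results are deterministic.
    return sorted(scores, key=lambda d: (-scores[d], d))


def average_precision(ranking: Sequence[str], relevant: Set[str]) -> float:
    hits, total = 0, 0.0
    for i, doc in enumerate(ranking, start=1):
        if doc in relevant:
            hits += 1
            total += hits / i  # precision at this relevant document's rank
    return total / len(relevant) if relevant else 0.0


def overlap(a: Set[str], b: Set[str]) -> float:
    """Jaccard overlap of two document sets (an assumed definition)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def fit_ols(X: Sequence[Sequence[float]], y: Sequence[float]) -> List[float]:
    """Least squares with an intercept; returns [b0, b1, ..., bk].

    Solves (X^T X) beta = X^T y by Gauss-Jordan elimination with partial
    pivoting. Raises ZeroDivisionError if X^T X is singular.
    """
    rows = [[1.0, *x] for x in X]
    k = len(rows[0])
    # Augmented matrix [X^T X | X^T y].
    A = [[sum(r[i] * r[j] for r in rows) for j in range(k)]
         + [sum(r[i] * t for r, t in zip(rows, y))] for i in range(k)]
    for c in range(k):
        p = max(range(c, k), key=lambda r: abs(A[r][c]))
        A[c], A[p] = A[p], A[c]
        piv = A[c][c]
        A[c] = [v / piv for v in A[c]]
        for r in range(k):
            if r != c:
                f = A[r][c]
                A[r] = [vr - f * vc for vr, vc in zip(A[r], A[c])]
    return [A[i][k] for i in range(k)]


def predict(beta: Sequence[float], x: Sequence[float]) -> float:
    return beta[0] + sum(b * v for b, v in zip(beta[1:], x))


def r_squared(beta: Sequence[float], X: Sequence[Sequence[float]],
              y: Sequence[float]) -> float:
    mean = sum(y) / len(y)
    ss_res = sum((t - predict(beta, x)) ** 2 for x, t in zip(X, y))
    ss_tot = sum((t - mean) ** 2 for t in y)
    return 1.0 - ss_res / ss_tot


if __name__ == "__main__":
    # Toy scores for two hypothetical systems; d1 and d3 are relevant.
    sa = {"d1": 0.9, "d2": 0.7, "d3": 0.2, "d4": 0.1}
    sb = {"d1": 0.3, "d2": 0.1, "d3": 0.8, "d4": 0.6}
    rel = {"d1", "d3"}
    combined = linear_combination(sa, sb, 0.5, 0.5)
    print(average_precision(rank(sa), rel),
          average_precision(rank(sb), rel),
          average_precision(rank(combined), rel))
```

In this toy case each system alone ranks one relevant document third (average precision 5/6), while the equal-weight sum places both relevant documents at the top (average precision 1.0). In the regression setting, each row of `X` would hold measurements of a pair of component systems (for example, their individual performances and their relevant and nonrelevant overlaps), and `y` would hold the measured performance of their combination. `r_squared` then reports how well those measurements explain combined performance, as the r² in the abstract does.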
SIGIR'98, 24-28 August 1998, Melbourne, Australia.
sigir98@cs.mu.oz.au.