SIGIR'98 papers: Predicting the Performance of Linearly Combined IR Systems


Christopher C. Vogt
University of California, San Diego, CSE 0114, La Jolla, CA 92093, USA

Garrison W. Cottrell
University of California, San Diego, CSE 0114, La Jolla, CA 92093, USA


Abstract

We introduce a new technique for analyzing combination models. The technique allows us to draw qualitative conclusions about which IR systems should be combined. We achieve this by using linear regression to accurately (r^2 = 0.98) predict the performance of the combined system from quantitative measurements of the individual component systems, taken from TREC5. When applied to a linear model (a weighted sum of relevance scores), the technique supports several previously suggested hypotheses: one should maximize both the individual systems' performances and the overlap of relevant documents between systems, while minimizing the overlap of nonrelevant documents. It also suggests new conclusions: the two systems should distribute scores similarly, but should not rank relevant documents similarly. Furthermore, it suggests that the linear model is able to exploit only a fraction of the benefit possible from combination. The technique is general in nature and capable of pointing out the strengths and weaknesses of any given combination approach.
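As a rough illustration of the linear model named in the abstract (a weighted sum of relevance scores), the following Python sketch combines two systems' scores and computes two simple overlap measures. It is not the authors' implementation: the function names, the treatment of unretrieved documents as score 0.0, the Jaccard-style overlap definitions, and the toy data are all assumptions made for this example.

```python
def combine_scores(scores_a, scores_b, w_a=1.0, w_b=1.0):
    """Weighted sum of two systems' relevance scores, per document.

    Assumption: a document that a system did not retrieve scores 0.0 there.
    """
    docs = set(scores_a) | set(scores_b)
    return {
        doc: w_a * scores_a.get(doc, 0.0) + w_b * scores_b.get(doc, 0.0)
        for doc in docs
    }


def rank(scores):
    """Document ids by descending score; ties broken by id for determinism."""
    return sorted(scores, key=lambda doc: (-scores[doc], doc))


def overlap(retrieved_a, retrieved_b, relevant):
    """Illustrative (relevant, nonrelevant) overlap between two result sets.

    Each value is |shared| / |retrieved by either|, restricted to relevant or
    nonrelevant documents. These definitions are an assumption for this
    sketch; the paper's own measures may be defined differently.
    """
    a, b, rel = set(retrieved_a), set(retrieved_b), set(relevant)
    shared, either = a & b, a | b
    rel_either = either & rel
    non_either = either - rel
    rel_overlap = len(shared & rel) / len(rel_either) if rel_either else 0.0
    non_overlap = len(shared - rel) / len(non_either) if non_either else 0.0
    return rel_overlap, non_overlap


# Hypothetical toy data: two systems scoring a handful of documents.
scores_a = {"d1": 0.9, "d2": 0.4, "d3": 0.1}
scores_b = {"d2": 0.8, "d3": 0.7, "d4": 0.2}
relevant = {"d2", "d3"}

combined = combine_scores(scores_a, scores_b, w_a=0.5, w_b=0.5)
ranking = rank(combined)
rel_overlap, non_overlap = overlap(scores_a, scores_b, relevant)
```

In this toy case the two systems share every relevant document and no nonrelevant one, which is the situation the abstract describes as favorable for combination; the combined ranking places the shared relevant document d2 first.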


SIGIR'98
24-28 August 1998
Melbourne, Australia.
sigir98@cs.mu.oz.au.