SIGIR'98 papers: Boosting and Rocchio Applied to Text Filtering
Robert E. Schapire
AT&T Labs--Research, 180 Park Avenue, Florham Park, NJ
07932, USA.
Yoram Singer
AT&T Labs--Research, 180 Park Avenue, Florham Park, NJ
07932, USA.
Amit Singhal
AT&T Labs--Research, 180 Park Avenue, Florham Park, NJ
07932, USA.
Abstract
We discuss two learning algorithms for text filtering: modified
Rocchio and a boosting algorithm called AdaBoost. We show how both
algorithms can be adapted to maximize utility under any general utility
matrix that associates a cost (or gain) with each pair of machine
prediction and correct label. We first show that AdaBoost significantly outperforms
another highly effective text filtering algorithm. We then compare
AdaBoost and Rocchio over three large text filtering tasks. Overall,
both algorithms are comparable and quite effective. AdaBoost
produces better classifiers than Rocchio when the training collection
contains a very large number of relevant documents. However, on these
tasks, Rocchio runs much faster than AdaBoost.
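
As a rough illustration of the utility-matrix idea mentioned in the abstract, the sketch below scores a filter's decisions against a matrix of gains and costs, then picks the score threshold that maximizes total utility on training data. Everything here is an assumption made for illustration: the matrix values, the +1/-1 label encoding, and the helper names total_utility and best_threshold are hypothetical. The abstract does not describe how AdaBoost or Rocchio are actually adapted, so this should not be read as the authors' procedure.

```python
from typing import Dict, Optional, Sequence, Tuple

# Hypothetical utility matrix keyed by (prediction, true_label),
# where +1 means "relevant" and -1 means "not relevant".
# The values are illustrative only and are not taken from the paper.
UTILITY: Dict[Tuple[int, int], float] = {
    (+1, +1): 3.0,   # retrieved a relevant document (gain)
    (+1, -1): -2.0,  # retrieved a non-relevant document (cost)
    (-1, +1): 0.0,   # missed a relevant document
    (-1, -1): 0.0,   # correctly rejected a non-relevant document
}


def total_utility(
    predictions: Sequence[int],
    labels: Sequence[int],
    utility: Dict[Tuple[int, int], float] = UTILITY,
) -> float:
    """Sum the utility of every (prediction, label) pair."""
    if len(predictions) != len(labels):
        raise ValueError("predictions and labels must have the same length")
    return sum(utility[(p, y)] for p, y in zip(predictions, labels))


def best_threshold(
    scores: Sequence[float],
    labels: Sequence[int],
    utility: Dict[Tuple[int, int], float] = UTILITY,
) -> Tuple[float, float]:
    """Return (threshold, utility) maximizing training utility.

    A document is predicted relevant when its score is >= threshold.
    The extra candidate float("inf") covers "retrieve nothing".
    """
    candidates = sorted(set(scores)) + [float("inf")]
    best: Optional[Tuple[float, float]] = None
    for t in candidates:
        preds = [1 if s >= t else -1 for s in scores]
        u = total_utility(preds, labels, utility)
        if best is None or u > best[1]:
            best = (t, u)
    assert best is not None
    return best


if __name__ == "__main__":
    # Toy scores from some real-valued classifier (made-up numbers).
    scores = [0.9, 0.8, 0.4, 0.1]
    labels = [1, -1, 1, -1]
    print(best_threshold(scores, labels))
```

The same thresholding step could in principle sit on top of any classifier that outputs a real-valued score, which is why the sketch keeps the scoring model out of the picture.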
SIGIR'98
24-28 August 1998
Melbourne, Australia.