New Techniques for Open-Vocabulary Spoken Document Retrieval


Martin Wechsler
Eugen Munteanu
Peter Schäuble
Swiss Federal Institute of Technology (ETH), Zürich, Switzerland


Abstract

This paper presents four novel techniques for open-vocabulary spoken document retrieval: a method to detect slots that possibly contain a query feature; a method to estimate occurrence probabilities; a technique that we call collection-wide probability re-estimation; and a weighting scheme that takes advantage of the fact that long query features are detected more reliably. We evaluated these four techniques on the TREC-6 spoken document retrieval test collection to determine how much each improves retrieval effectiveness over a baseline retrieval method. The results show that retrieval effectiveness can be improved considerably despite the large number of speech recognition errors.


SIGIR'98
24-28 August 1998
Melbourne, Australia
sigir98@cs.mu.oz.au