SIGIR'98 Demonstrations: Cheshire II: Combining Probabilistic and Boolean Retrieval

Cheshire II: Combining Probabilistic and Boolean Retrieval


Ray R. Larson
School of Information Management and Systems University of California, Berkeley


Abstract

The Cheshire II system was originally designed to apply probabilistic retrieval methods to online library catalog searches, in order to help overcome the twin problems that pervade topical searching in Boolean online catalogs: search failure and information overload \cite{larson96b}. It was intended to be a next-generation online catalog and full-text information retrieval system that would apply probabilistic retrieval methods to simple MARC records and to clustered record surrogates. Over time the system has been expanded to support full-text SGML documents (ranging from simple document types, such as those used in the TREC database \cite{larson97a}, to complex full-text documents encoded using the TEI and EAD DTDs) and full-text OCR from scanned page image files linked to SGML bibliographic records. The system is the primary text search engine for the UC Berkeley Environmental Digital Library project, sponsored by NSF, NASA, and ARPA. It also provides access to a number of diverse databases via the WWW through an HTTP-to-Z39.50 gateway. The system has been adopted as a search engine in working library environments as well, including the Physical Sciences Libraries at UC Berkeley, the Data Archives at the University of Essex, and the special collections department of the University of Liverpool Library.
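As a rough illustration of combining Boolean and probabilistic retrieval, the following Python sketch first restricts a toy catalog with a Boolean AND and then ranks the surviving records by a logistic-style estimate of the probability of relevance. This is an assumption for illustration only, not Cheshire II's actual code or ranking formula: the catalog contents, the features (number of matching query terms and their summed IDF), the coefficients `b0`, `b1`, `b2`, and all function names are hypothetical.

```python
import math

# Hypothetical toy catalog: record id -> set of index terms.
CATALOG = {
    "r1": {"probabilistic", "retrieval", "online", "catalogs"},
    "r2": {"boolean", "search", "online", "library", "catalogs"},
    "r3": {"sgml", "full", "text", "retrieval"},
}


def boolean_and(catalog, required):
    """Boolean filter: ids of records containing every required term."""
    return [rid for rid, terms in catalog.items() if required <= terms]


def idf(catalog, term):
    """Inverse document frequency log(N / df); 0.0 for unseen terms."""
    df = sum(1 for terms in catalog.values() if term in terms)
    return math.log(len(catalog) / df) if df else 0.0


def p_relevant(catalog, query, terms, b0=-2.0, b1=0.8, b2=0.6):
    """Logistic-style estimate of P(relevant | query, record).

    The two features and the weights b0..b2 are hypothetical placeholders.
    """
    matched = query & terms
    logit = b0 + b1 * len(matched) + b2 * sum(idf(catalog, t) for t in matched)
    return 1.0 / (1.0 + math.exp(-logit))


def search(catalog, required, query):
    """Restrict with a Boolean AND, then rank survivors by estimated relevance."""
    hits = boolean_and(catalog, required)
    return sorted(hits, key=lambda rid: p_relevant(catalog, query, catalog[rid]),
                  reverse=True)
```

For example, `search(CATALOG, {"catalogs"}, {"probabilistic", "online", "catalogs"})` keeps only `r1` and `r2` and returns `['r1', 'r2']`, since `r1` matches more (and rarer) query terms. Filtering first preserves the Boolean semantics catalog users expect, while the ranking addresses information overload by ordering the matches; in practice the coefficients would typically be fit to relevance judgments rather than set by hand.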


SIGIR'98
24-28 August 1998
Melbourne, Australia.
sigir98@cs.mu.oz.au.