SIGIR'98 Demonstrations: Cheshire II: Combining Probabilistic and Boolean Retrieval
Ray R. Larson
School of Information Management and Systems
University of California, Berkeley
Abstract
The Cheshire II system was originally designed to apply probabilistic
retrieval methods to online library catalog searches in order to
help overcome the twin problems that pervade topical searching
in Boolean online catalogs: search
failure and information overload~\cite{larson96b}.
It was intended to be a
next-generation online catalog and full-text information retrieval
system that would apply probabilistic retrieval methods to simple MARC
records and clustered record surrogates. Over time the system has been
expanded to include support for full-text SGML documents (ranging
from simple document types as used in the TREC database~\cite{larson97a}
to complex
full-text documents encoded using the TEI and EAD DTDs) and support for
full-text OCR from scanned page image files linked to SGML
bibliographic records.
The system is the primary text search
engine for the UC Berkeley Environmental Digital Library project
sponsored by NSF, NASA, and ARPA. It also provides access to a
number of diverse databases via the WWW using an HTTP-to-Z39.50
gateway. It has also been adopted for use as a search engine
in working library environments including the Physical Sciences Libraries
at UC Berkeley, The Data Archives at the University of Essex, and
the special collection department of the University of Liverpool Library.
SIGIR'98
24-28 August 1998
Melbourne, Australia.
sigir98@cs.mu.oz.au.