Bas van Bakel
Vossius Laboratory for Content Engineering,
University of Twente,
P.O. Box 217, NL 7500 AE Enschede, the Netherlands.
Condorcet takes a controlled-term approach to indexing: a document is indexed by mapping its title and abstract to concepts and relations from a given domain, defined in modern versions of classical thesauri, i.e. structured ontologies. Concepts rather than natural language terms are used as index terms. The claim is that structured concepts, being language-independent and unambiguous, lead to higher precision. For example, the simple concepts aspirin and headache point to all documents in which these two concepts are mentioned. With structured concepts, however, we can distinguish documents discussing aspirin as a cause of headache (causes[aspirin,headache]) from documents on aspirin as a cure for headache (cures[aspirin,headache]). A search for the former group excludes the latter, which is considerably more difficult, if not impossible, when documents are indexed with simple, unstructured concepts.
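The difference between simple and structured index terms can be illustrated with a toy inverted index. The sketch below is not Condorcet's implementation: it skips the mapping from title and abstract to concepts entirely and assumes the structured index terms have already been assigned by hand. The names `Relation`, `build_indexes`, `search_concepts`, `search_relation` and the two-document collection `DOCS` are hypothetical, chosen only for illustration.

```python
from collections import defaultdict
from typing import NamedTuple


class Relation(NamedTuple):
    """Hypothetical structured index term: a relation over concepts,
    e.g. causes[aspirin,headache] -> Relation("causes", ("aspirin", "headache"))."""
    name: str
    args: tuple[str, ...]


# Hypothetical toy collection; index terms are assumed, not derived from text.
DOCS: dict[str, list[Relation]] = {
    "d1": [Relation("causes", ("aspirin", "headache"))],
    "d2": [Relation("cures", ("aspirin", "headache"))],
}


def build_indexes(docs: dict[str, list[Relation]]):
    """Build two inverted indexes: one keyed on simple concepts,
    one keyed on whole structured relations."""
    by_concept: dict[str, set[str]] = defaultdict(set)
    by_relation: dict[Relation, set[str]] = defaultdict(set)
    for doc_id, relations in docs.items():
        for rel in relations:
            by_relation[rel].add(doc_id)
            # Flattening a relation to its arguments loses the relation name.
            for concept in rel.args:
                by_concept[concept].add(doc_id)
    return by_concept, by_relation


def search_concepts(by_concept, *concepts: str) -> set[str]:
    """Conjunctive search on simple concepts: documents mentioning all of them."""
    postings = [by_concept.get(c, set()) for c in concepts]
    return set.intersection(*postings) if postings else set()


def search_relation(by_relation, rel: Relation) -> set[str]:
    """Search on a structured concept: only documents indexed with that relation."""
    return set(by_relation.get(rel, set()))


if __name__ == "__main__":
    by_concept, by_relation = build_indexes(DOCS)
    # Simple concepts cannot separate cause from cure: both documents match.
    print(sorted(search_concepts(by_concept, "aspirin", "headache")))
    # The structured query keeps only the "causes" document.
    print(sorted(search_relation(by_relation, Relation("causes", ("aspirin", "headache")))))
```

In this sketch the query on aspirin and headache returns both `d1` and `d2`, while the query on `causes[aspirin,headache]` returns only `d1`: keeping the relation name in the index key is what lets the search exclude documents on aspirin as a cure.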
The indexing process makes intensive use of linguistic knowledge, in particular Chomsky's Government & Binding theory. The poster will focus on the linguistic principles that form the conceptual basis of the indexing process.
SIGIR'98