SIGIR'98 posters: Modern Classical Document Indexing

Modern Classical Document Indexing - a linguistic contribution to knowledge-based IR


Bas van Bakel
Vossius Laboratory for Content Engineering,
University of Twente,
P.O. Box 217, NL 7500 AE Enschede, the Netherlands.


Abstract

This poster presents Condorcet, a domain-specific prototype indexing system for tens of thousands of documents covering two scientific domains: engineering ceramics and epilepsy. The development corpus consists of 300 documents taken from machine-readable one-year volumes of two scientific journals: the 1988 volume of Excerpta Medica from Elsevier Science Publishers, and the 1990 volume of Engineered Materials Abstracts from Materials Information. The Condorcet research project is funded by the Dutch Technology Foundation.

Condorcet takes a controlled-term approach to indexing: a document is indexed by mapping its title and abstract to concepts and relations from a given domain, defined in modern versions of classical thesauri, i.e. structured ontologies. Concepts rather than natural language terms are used as index terms. The claim is that using structured concepts as index terms will lead to higher precision, as they are language-independent and unambiguous. For example, the simple concepts aspirin and headache point to all documents in which these two concepts are mentioned. By using structured concepts, however, we can distinguish documents discussing aspirin as a cause of headache (causes[aspirin,headache]) from documents on aspirin as a cure for headache (cures[aspirin,headache]). Searching for the former group will exclude the latter, which is considerably more difficult - if not impossible - when documents are indexed with simple, unstructured concepts.
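
The difference between the two kinds of index can be illustrated with a minimal sketch. It is not Condorcet's implementation: the StructuredConcept type, the build_indexes function and the two toy documents are hypothetical, and the documents are assumed to have been mapped to structured concepts already.

```python
from collections import defaultdict
from typing import NamedTuple


class StructuredConcept(NamedTuple):
    """Hypothetical stand-in for a structured concept such as causes[aspirin,headache]."""
    relation: str
    source: str
    target: str


# Toy documents (invented for illustration), already mapped to structured concepts.
docs = {
    "doc1": [StructuredConcept("causes", "aspirin", "headache")],
    "doc2": [StructuredConcept("cures", "aspirin", "headache")],
}


def build_indexes(docs):
    """Build a simple-concept index and a structured-concept index."""
    concept_index = defaultdict(set)     # simple concept -> document ids
    structured_index = defaultdict(set)  # structured concept -> document ids
    for doc_id, concepts in docs.items():
        for concept in concepts:
            concept_index[concept.source].add(doc_id)
            concept_index[concept.target].add(doc_id)
            structured_index[concept].add(doc_id)
    return concept_index, structured_index


concept_index, structured_index = build_indexes(docs)

# Simple concepts: every document mentioning both aspirin and headache matches.
unstructured_hits = concept_index["aspirin"] & concept_index["headache"]

# Structured concept: only documents on aspirin as a cause of headache match.
structured_hits = structured_index[StructuredConcept("causes", "aspirin", "headache")]

print(sorted(unstructured_hits))  # ['doc1', 'doc2']
print(sorted(structured_hits))    # ['doc1']
```

In this sketch the query on simple concepts returns both documents, whereas the query on causes[aspirin,headache] returns only the first one, which is the gain in precision claimed above.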

The indexing process makes intensive use of linguistic knowledge, specifically Chomsky's Government & Binding theory. The poster will focus on the linguistic principles that form the conceptual basis of the indexing process.


SIGIR'98
24-28 August 1998
Melbourne, Australia.
sigir98@cs.mu.oz.au.