SIGIR'98 papers: A flexible model for retrieval of SGML documents

A flexible model for retrieval of SGML documents


Sung Hyon Myaeng
Chungnam National University

Dong-Hyun Jang
Chungnam National University

Mun-Seok Kim
Systems Engineering Research Institute

Zong-Cheol Zhoo
Systems Engineering Research Institute


Abstract

In traditional information retrieval (IR) systems, a document as a whole is the target for a query. With increasing interest in structured documents such as SGML documents, there is a growing need to build IR systems that can retrieve parts of documents that satisfy not only content-based but also structure-based requirements. In this paper, we describe an inference-net-based approach to this problem. The model is capable of retrieving elements at any level in a principled way, while satisfying containment constraints specified in a query. Moreover, while the model is general enough to reproduce the ranking strategy adopted by conventional document retrieval systems by making use of document- and collection-level statistics such as term frequency (TF) and inverse document frequency (IDF), its flexibility allows for the incorporation of a variety of pragmatic and semantic information associated with document structures. We implemented the model and ran a series of experiments to show that, in addition to the added functionality, the use of the structural information embedded in SGML documents can improve the effectiveness of document retrieval compared to the case where no such information is used. We also show that giving a pragmatic preference to a certain element type of the SGML documents can enhance retrieval effectiveness.


SIGIR'98
24-28 August 1998
Melbourne, Australia.
sigir98@cs.mu.oz.au.