SIGIR'98 papers:
A flexible model for retrieval of SGML documents
Sung Hyon Myaeng
Chungnam National University
Dong-Hyun Jang
Chungnam National University
Mun-Seok Kim
Systems Engineering Research Institute
Zong-Cheol Zhoo
Systems Engineering Research Institute
Abstract
In traditional information retrieval (IR) systems, a document as a whole
is the target for a query. With increasing interest in structured
documents such as SGML documents, there is a growing need to build an IR
system that can retrieve parts of documents that satisfy not only
content-based but also structure-based requirements. In this paper, we
describe an inference-net-based approach to this problem. The model is
capable of retrieving elements at any level in a principled way,
satisfying certain containment constraints in a query. Moreover, while
the model is general enough to reproduce the ranking strategy adopted by
conventional document retrieval systems by making use of document and
collection level statistics such as TF and IDF, its flexibility allows
for incorporation of a variety of pragmatic and semantic information
associated with document structures. We implemented the model and ran a
series of experiments to show that, in addition to the added
functionality, the use of the structural information embedded in SGML
documents can improve the effectiveness of document retrieval, compared
to the case where no such information is used. We also show that giving
a pragmatic preference to a certain element type of the SGML documents
can enhance retrieval effectiveness.
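As a rough illustration of the last point, the sketch below ranks individual
elements rather than whole documents, using TF and IDF and a per-element-type
preference weight. It is not the inference-net model described in the paper:
the toy collection, the weight values, the smoothed IDF formula, and the names
DOCS, TYPE_WEIGHT, idf, and score_elements are all assumptions made for this
example, and containment constraints are not modeled.

```python
import math
from collections import Counter

# Hypothetical toy collection: each document is a list of
# (element_type, text) pairs standing in for parsed SGML elements.
DOCS = {
    "d1": [("title", "sgml retrieval model"),
           ("para", "inference network for retrieval")],
    "d2": [("title", "database systems"),
           ("para", "sgml documents stored in a database")],
}

# Hypothetical pragmatic preference per element type (illustrative values).
TYPE_WEIGHT = {"title": 2.0, "para": 1.0}


def idf(term, docs):
    """Smoothed inverse document frequency, counted over whole documents."""
    df = sum(1 for elems in docs.values()
             if any(term in text.split() for _, text in elems))
    return math.log((len(docs) + 1) / (df + 1)) + 1.0


def score_elements(query, docs, type_weight):
    """Return (doc_id, element_index, element_type, score), best first."""
    terms = query.split()
    results = []
    for doc_id, elems in docs.items():
        for i, (etype, text) in enumerate(elems):
            tf = Counter(text.split())
            base = sum(tf[t] * idf(t, docs) for t in terms)
            if base > 0:  # keep only elements that match some query term
                weight = type_weight.get(etype, 1.0)
                results.append((doc_id, i, etype, base * weight))
    return sorted(results, key=lambda r: r[3], reverse=True)


ranked = score_elements("sgml retrieval", DOCS, TYPE_WEIGHT)
for doc_id, index, etype, score in ranked:
    print(doc_id, index, etype, round(score, 3))
```

With these assumed weights, the title element of d1 ranks first because it
matches both query terms and its type is preferred. Setting every weight to
1.0 gives a plain element-level TF-IDF ranking for comparison.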
SIGIR'98
24-28 August 1998
Melbourne, Australia.
sigir98@cs.mu.oz.au.