SIGIR'98 papers: Discovering Typical Structures of Documents: A Road Map Approach
Ke Wang
Department of Information Systems and Computer Science,
National University of Singapore,
Singapore, 119260
Huiqing Liu
BioInformatics Center,
National University of Singapore,
Singapore, 119260
Abstract
The structure of a document refers
to the role and hierarchy of subdocument references.
Many on-line documents are
similarly structured, though not identically structured.
We study the problem of discovering "typical" structures of a
collection of such documents, where the user specifies
the minimum frequency of a typical structure.
We consider structural features of subdocument references
such as labeling, nesting, ordering, cyclicity, and wild-card
references, like those found on the Web and in digital libraries.
Typical structures can serve the following purposes:
(a) A table of contents for gaining general information about a source.
(b) A road map for browsing and querying a source.
(c) A basis for clustering documents.
(d) Partial schemas for building structured layers that provide
standard database access methods.
(e) A profile of users' or customers' interests and browsing patterns.
We present a solution to the discovery problem.
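To make the problem statement concrete, the following is a minimal sketch of frequency-based structure discovery. It is not the algorithm of the paper: it assumes each document is a nested (label, children) tree and treats a "structure" as a root-to-node path of labels, ignoring ordering, cyclicity, and wild-card references. The names label_paths, typical_paths, and min_frequency are hypothetical and chosen only for illustration.

```python
from collections import Counter

# Assumption: a document is modeled as a tuple (label, [child documents]).
def label_paths(node, prefix=()):
    """Yield every root-to-node label path in a nested (label, children) tree."""
    label, children = node
    path = prefix + (label,)
    yield path
    for child in children:
        yield from label_paths(child, path)

def typical_paths(documents, min_frequency):
    """Return the label paths that occur in at least min_frequency documents."""
    counts = Counter()
    for doc in documents:
        # Count each path once per document, however often it repeats inside it.
        counts.update(set(label_paths(doc)))
    return {path: n for path, n in counts.items() if n >= min_frequency}

# Three similarly, but not identically, structured toy documents.
docs = [
    ("paper", [("title", []), ("author", [("name", [])]), ("abstract", [])]),
    ("paper", [("title", []), ("author", [("name", []), ("email", [])])]),
    ("paper", [("title", []), ("abstract", [])]),
]
result = typical_paths(docs, min_frequency=2)
```

With min_frequency set to 2, the path paper/author/email is dropped because it appears in only one document, while paper/author/name is kept. Raising the threshold yields a smaller, more general set of structures.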
SIGIR'98
24-28 August 1998
Melbourne, Australia.
sigir98@cs.mu.oz.au.