SIGIR'98 papers: Extracting Classification Knowledge of Internet Documents with Mining Term Associations: A Semantic Approach

Shian-Hua Lin
Department of Engineering Science, National Cheng Kung University, Tainan, Taiwan

Chi-Sheng Shih
Institute of Information Science, Academia Sinica, Taipei, Taiwan

Meng Chang Chen
Institute of Information Science, Academia Sinica, Taipei, Taiwan

Jan-Ming Ho
Institute of Information Science, Academia Sinica, Taipei, Taiwan

Ming-Tat Ko
Institute of Information Science, Academia Sinica, Taipei, Taiwan

Yueh-Ming Huang
Department of Engineering Science, National Cheng Kung University, Tainan, Taiwan

Abstract

In this paper, we present a system that extracts and generalizes terms from Internet documents to represent the classification knowledge of a given class hierarchy. We propose a measure, called support, that evaluates the importance of a term with respect to a class in the hierarchy. Given a threshold, terms with high support are selected as keywords of a class, and terms with low support are filtered out. To improve the recall of this approach, the Mining Association Rules technique is applied to discover associations between terms. An inference model is composed of these association relations and the previously computed supports of the terms in the class. We then present a polynomial-time inference algorithm that promotes a term strongly associated with a known keyword to a keyword, which increases the recall of the keyword selection process. Experimental results on Internet documents collected from the Yam search engine show that the proposed methods refine the classification knowledge and increase the recall of keyword selection.
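The pipeline outlined in the abstract (threshold on support, mine term associations, promote associated terms) can be illustrated with a minimal sketch. The abstract does not give the actual formulas, so everything below is an assumption made for illustration: support is taken to be the fraction of a class's documents that contain a term, association strength is taken to be the usual rule confidence of "term implies keyword", and promotion is done in a single pass. The function names, thresholds, and toy documents are hypothetical and are not taken from the paper.

```python
from collections import defaultdict
from itertools import permutations


def term_support(docs):
    """Assumed support: fraction of the class's documents containing the term."""
    counts = defaultdict(int)
    for doc in docs:
        for term in set(doc):
            counts[term] += 1
    return {term: n / len(docs) for term, n in counts.items()}


def association_confidence(docs):
    """Assumed association: confidence(a -> b) = |docs with a and b| / |docs with a|."""
    single = defaultdict(int)
    pair = defaultdict(int)
    for doc in docs:
        terms = set(doc)
        for term in terms:
            single[term] += 1
        for a, b in permutations(terms, 2):
            pair[(a, b)] += 1
    return {(a, b): n / single[a] for (a, b), n in pair.items()}


def select_keywords(docs, min_support, min_confidence):
    """Keep high-support terms, then promote terms strongly tied to them."""
    support = term_support(docs)
    confidence = association_confidence(docs)

    # Step 1: terms whose support reaches the threshold become keywords.
    base = {term for term, s in support.items() if s >= min_support}

    # Step 2 (single pass, an assumption): promote a low-support term when
    # it almost always co-occurs with an already selected keyword.
    promoted = set()
    for (term, keyword), c in confidence.items():
        if keyword in base and term not in base and c >= min_confidence:
            promoted.add(term)
    return base | promoted


# Toy documents for one hypothetical class, each given as a list of terms.
docs = [
    ["java", "applet"],
    ["java", "applet", "jdk"],
    ["java", "compiler"],
    ["java", "coffee"],
    ["coffee", "travel"],
]
keywords = select_keywords(docs, min_support=0.5, min_confidence=0.8)
```

In this toy class only "java" passes the support threshold on its own. The terms "applet", "jdk", and "compiler" appear exclusively alongside "java" and are promoted, while "coffee" and "travel" are not, which is the recall gain the abstract attributes to the association step.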

SIGIR'98
24-28 August 1998
Melbourne, Australia.
sigir98@cs.mu.oz.au.