SIGIR'98 papers: Multilingual Keyword Extraction for Term Suggestion

Multilingual Keyword Extraction for Term Suggestion


Yuen-Hsien Tseng
Department of Library & Information Science, Fu Jen Catholic University, Taipei, Taiwan, R.O.C.


Abstract

An efficient keyword extraction algorithm applicable to documents in any language is presented. Because documents concentrating on a topic tend to mention a set of words in a specific sequence repeatedly, the approach assumes that keywords are repeated string patterns. The proposed algorithm has several distinct features: it requires no extra resources such as lexicons, corpora, or NLP parsers; its time and space complexity are linear in the average case; the threshold, the only parameter of the algorithm, is easily tuned; keywords of any length can be identified; and, when applied at the character level, single words or word stems can be identified as well as multiple-word phrases. The extracted keywords are suitable for perusal and selection in term suggestion applications. Experimental results for some English and Chinese documents are shown. Applications to bibliographic data of over 320,000 records and to the full text of 13,000 news abstracts for term suggestion are presented.


SIGIR'98
24-28 August 1998
Melbourne, Australia.
sigir98@cs.mu.oz.au.