Learning a taxonomy from a set of text documents
Authors: | Alberto Pérez García-Plaza, Mari-Sanna Paukkeri, Víctor Fresno, Raquel Martínez Unanue, Timo Honkela |
---|---|
Year of publication: | 2012 |
Subject: |
Self-organizing map; Knowledge representation and reasoning; Computer science; Feature extraction; Document clustering; Machine learning; Fuzzy logic; Weighting; Taxonomy (general); Artificial intelligence; Cluster analysis; Software; Natural language processing |
Source: | Applied Soft Computing. 12:1138-1148 |
ISSN: | 1568-4946 |
DOI: | 10.1016/j.asoc.2011.11.009 |
Description: | We present a methodology for learning a taxonomy from a set of text documents, each of which describes one concept. The taxonomy is obtained by clustering the concept definition documents with a hierarchical approach to the Self-Organizing Map. In this study, we compare three feature extraction approaches with varying degrees of language independence: fuzzy logic-based feature weighting and selection, statistical keyphrase extraction, and the traditional tf-idf weighting scheme. The experiments are conducted for English, Finnish, and Spanish. The results show that while the rule-based fuzzy logic systems have an advantage in automatic taxonomy learning, taxonomies can also be constructed with tolerable results using statistical methods without domain- or style-specific knowledge. |
Database: | OpenAIRE |
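
Of the three feature extraction schemes named in the description, tf-idf is the simplest to illustrate. The following is a minimal sketch, not the authors' implementation: it assumes whitespace tokenization, raw term frequency, and idf = log(N / df), which is only one common variant of the scheme. The function name `tfidf` and the sample documents are hypothetical and do not come from the paper.

```python
import math
from collections import Counter


def tfidf(docs):
    """Return one {term: weight} dict per document.

    Assumption: weight = raw term frequency * log(N / df), one common
    tf-idf variant chosen here for illustration only.
    """
    # Naive lowercase whitespace tokenization (an assumption).
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    # Document frequency: number of documents that contain each term.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        tf = Counter(tokens)
        vectors.append({t: tf[t] * math.log(n_docs / df[t]) for t in tf})
    return vectors


# Hypothetical concept-definition documents (not taken from the paper).
docs = [
    "a dog is a domesticated mammal",
    "a cat is a small mammal",
    "a car is a road vehicle",
]
weights = tfidf(docs)
```

Terms that occur in every document, such as "a" and "is" here, receive weight zero, while a term unique to one definition ("dog") outweighs one shared by two definitions ("mammal"). Vectors of this kind are what a clustering method such as the Self-Organizing Map would take as input.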