Text Categorization Based on Subtopic Clusters

Authors: Robert W. P. Luk, Korris Fu-Lai Chung, Francis C. Y. Chik
Year of publication: 2005
Subject:
Source: Natural Language Processing and Information Systems ISBN: 9783540260318
NLDB
DOI: 10.1007/11428817_19
Description: The distribution of documents across topic classes is typically highly skewed. This leads to good micro-averaged performance but weaker macro-averaged performance. Viewing topics as clusters in a high-dimensional space, we propose using clustering to identify subtopic clusters within large topic classes, on the assumption that large topic clusters are generally mixtures of several subtopic clusters. We used Reuters news articles and support vector machines to evaluate whether using subtopic clusters can lead to better macro-averaged performance.
Database: OpenAIRE
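
Illustration: the abstract's point about skewed classes can be made concrete with a small worked example. The sketch below is not taken from the paper. It assumes a simplified single-label setting with two hypothetical topics ("large" and "small") and made-up predictions, and the helper names `f1` and `micro_macro_f1` are introduced here only for illustration. It shows how micro-averaged F1, which pools counts over all documents, can look good while macro-averaged F1, which weights every topic equally, is pulled down by a poorly recognized small topic.

```python
from collections import Counter


def f1(tp, fp, fn):
    """F1 score from raw counts; returns 0.0 when there is nothing to score."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def micro_macro_f1(y_true, y_pred):
    """Return (micro-averaged F1, macro-averaged F1) for single-label data."""
    labels = sorted(set(y_true) | set(y_pred))
    tp, fp, fn = Counter(), Counter(), Counter()
    for truth, pred in zip(y_true, y_pred):
        if truth == pred:
            tp[truth] += 1
        else:
            fp[pred] += 1   # predicted class gains a false positive
            fn[truth] += 1  # true class gains a false negative
    # Micro: pool the counts over all classes, then compute one F1.
    micro = f1(sum(tp.values()), sum(fp.values()), sum(fn.values()))
    # Macro: compute F1 per class, then take the unweighted mean.
    macro = sum(f1(tp[c], fp[c], fn[c]) for c in labels) / len(labels)
    return micro, macro


# Hypothetical skewed collection: 90 documents in one topic, 10 in another.
y_true = ["large"] * 90 + ["small"] * 10
# Hypothetical classifier output: every "large" document is correct,
# but only 2 of the 10 "small" documents are recognized.
y_pred = ["large"] * 90 + ["small"] * 2 + ["large"] * 8

micro, macro = micro_macro_f1(y_true, y_pred)
print(f"micro-F1 = {micro:.3f}, macro-F1 = {macro:.3f}")
```

With these made-up numbers the pooled score stays high because the large topic dominates the counts, while the per-topic mean is dragged down by the small topic. This gap is what the macro-averaged measure mentioned in the abstract captures.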