Text Categorization Based on Subtopic Clusters
Author: | Robert W. P. Luk, Korris Fu-Lai Chung, Francis C. Y. Chik |
---|---|
Year of publication: | 2005 |
Subject: | business.industry; Computer science; InformationSystems_INFORMATIONSTORAGEANDRETRIEVAL; Space (commercial competition); computer.software_genre; Support vector machine; ComputingMethodologies_PATTERNRECOGNITION; Text categorization; Cluster (physics); Information system; Artificial intelligence; business; Cluster analysis; computer; Natural language processing |
Source: | Natural Language Processing and Information Systems (NLDB), ISBN: 9783540260318 |
DOI: | 10.1007/11428817_19 |
Description: | The number of documents per topic class is typically highly skewed. This leads to good micro-average performance but less desirable macro-average performance. Viewing topics as clusters in a high-dimensional space, we propose using clustering to determine subtopic clusters for large topic classes, on the assumption that large topic clusters are in general mixtures of several subtopic clusters. We used Reuters news articles and support vector machines to evaluate whether using subtopic clusters can lead to better macro-average performance. |
Database: | OpenAIRE |
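
The contrast between micro- and macro-averaged performance noted in the abstract above can be illustrated with a small sketch. The class names, document counts, and helper functions below are hypothetical and are not taken from the paper; the sketch only shows, under these assumed numbers, how a skewed class distribution can keep micro-averaged F1 high while macro-averaged F1 drops.

```python
# Hypothetical toy data: 95 documents in a large topic, 5 in a small one.
# These numbers are illustrative assumptions, not results from the paper.
y_true = ["large"] * 95 + ["small"] * 5
# An assumed classifier that labels all large-topic documents correctly
# but recovers only 1 of the 5 small-topic documents.
y_pred = ["large"] * 95 + ["large"] * 4 + ["small"] * 1


def confusion_counts(true_labels, pred_labels, label):
    """Return (tp, fp, fn) for one class, treated one-vs-rest."""
    pairs = list(zip(true_labels, pred_labels))
    tp = sum(1 for t, p in pairs if t == label and p == label)
    fp = sum(1 for t, p in pairs if t != label and p == label)
    fn = sum(1 for t, p in pairs if t == label and p != label)
    return tp, fp, fn


def f1(tp, fp, fn):
    """F1 from raw counts; defined as 0.0 when there is nothing to score."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def micro_macro_f1(true_labels, pred_labels):
    labels = sorted(set(true_labels))
    counts = [confusion_counts(true_labels, pred_labels, c) for c in labels]
    # Macro: average the per-class F1 scores, so every class weighs the same.
    macro_f1 = sum(f1(*c) for c in counts) / len(labels)
    # Micro: pool the counts first, so large classes dominate the score.
    tp = sum(c[0] for c in counts)
    fp = sum(c[1] for c in counts)
    fn = sum(c[2] for c in counts)
    micro_f1 = f1(tp, fp, fn)
    return micro_f1, macro_f1


micro, macro = micro_macro_f1(y_true, y_pred)
print(f"micro={micro:.3f} macro={macro:.3f}")  # micro=0.960 macro=0.656
```

Because the macro average weights the small class as heavily as the large one, the poor recall on the small class pulls it well below the micro average in this toy setting.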