Incremental clustering with GA, SVM, and FCM methods

Author: Yu-Hong Dai, 戴宇宏
Year of publication: 2008
Document type: thesis
Description: 96
With the explosion of information, managing documents has become very difficult, and efficiently finding useful information in large collections is very important. Clustering is a technique for discovering the characteristics of information and the relationships within it, which helps in managing documents. This study proposes a method that combines an SVM classification method and a fuzzy clustering method, both based on genetic algorithms. The GA-based SVM classifier determines whether an incoming document belongs to one of the existing classes, and the GA-based fuzzy clustering method (GA-FCM) clusters the documents that remain unclassified. First, the CKIP system is used to segment Chinese documents and extract keywords. A genetic algorithm then selects appropriate terms for training the SVM models of the existing classes, which are used to classify each incoming document. Next, a genetic algorithm is used again to select the best number of clusters and the best cluster centroids. Finally, precision, recall, and F-measure are used to evaluate effectiveness, and macro-average and micro-average are used to measure accuracy. The empirical results show that the proposed method improves classification effectiveness and that GA-FCM significantly outperforms other clustering methods.
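The evaluation measures named in the abstract can be illustrated with a minimal sketch. It is not taken from the thesis: the class names and counts below are invented for illustration, and the function names (`prf`, `macro_average`, `micro_average`) are hypothetical. It assumes the usual definitions, in which macro-averaging averages the per-class scores and micro-averaging pools the counts over all classes before scoring.

```python
from typing import Dict, Tuple

# Per-class counts: (true positives, false positives, false negatives).
Counts = Tuple[int, int, int]


def prf(tp: int, fp: int, fn: int) -> Tuple[float, float, float]:
    """Return (precision, recall, F-measure) for one set of counts."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    total = precision + recall
    f_measure = 2 * precision * recall / total if total else 0.0
    return precision, recall, f_measure


def macro_average(counts: Dict[str, Counts]) -> Tuple[float, float, float]:
    """Score each class separately, then average the scores."""
    scores = [prf(*c) for c in counts.values()]
    n = len(scores)
    return (
        sum(s[0] for s in scores) / n,
        sum(s[1] for s in scores) / n,
        sum(s[2] for s in scores) / n,
    )


def micro_average(counts: Dict[str, Counts]) -> Tuple[float, float, float]:
    """Pool the counts of all classes, then score once."""
    tp = sum(c[0] for c in counts.values())
    fp = sum(c[1] for c in counts.values())
    fn = sum(c[2] for c in counts.values())
    return prf(tp, fp, fn)


# Invented example counts for two hypothetical document classes.
example_counts = {"sports": (8, 2, 2), "finance": (1, 1, 3)}
macro = macro_average(example_counts)
micro = micro_average(example_counts)
print("macro P/R/F:", macro)
print("micro P/R/F:", micro)
```

With these invented counts the macro-averaged F-measure is lower than the micro-averaged one, because the small, poorly classified class weighs as much as the large one under macro-averaging.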
Database: Networked Digital Library of Theses & Dissertations