Frequent Subgraph-Based Approach for Classifying Vietnamese Text Documents
Author: | Kiem Hoang, Tu Anh Hoang Nguyen |
---|---|
Publication year: | 2009 |
Subject: | |
Source: | Enterprise Information Systems ISBN: 9783642013461 ICEIS |
DOI: | 10.1007/978-3-642-01347-8_25 |
Description: | In this paper we present a simple approach to Vietnamese text classification that requires no word segmentation and is based on frequent subgraph mining techniques. Documents are represented with a graph-based model rather than the traditional vector-based model. The classification model employs structural patterns (subgraphs) and the Dice similarity measure to identify the class of a document. The method is evaluated on a Vietnamese data set to measure classification accuracy. Results show that it can outperform the k-NN algorithm (using vector-based and hybrid document representations) in terms of both accuracy and classification time. |
Database: | OpenAIRE |
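
The description mentions the Dice similarity measure applied to subgraph patterns. As a rough illustration only, the sketch below computes the standard Dice coefficient, 2|A ∩ B| / (|A| + |B|), between two sets of patterns and assigns a document to the class with the highest score. This record does not give the paper's exact formulation, so the function names (`dice_similarity`, `classify`), the representation of subgraphs as plain hashable labels, and the sample data are all assumptions made for the example.

```python
from typing import Dict, Hashable, Set


def dice_similarity(a: Set[Hashable], b: Set[Hashable]) -> float:
    """Standard Dice coefficient between two sets: 2|A ∩ B| / (|A| + |B|)."""
    if not a and not b:
        # Convention chosen for this sketch: two empty sets score 0.
        return 0.0
    return 2 * len(a & b) / (len(a) + len(b))


def classify(doc_patterns: Set[Hashable],
             class_patterns: Dict[str, Set[Hashable]]) -> str:
    """Return the class whose pattern set is most Dice-similar to the document.

    Hypothetical helper: each class is summarised by a set of frequent
    subgraph identifiers, and the document by the subgraphs found in it.
    """
    return max(class_patterns,
               key=lambda label: dice_similarity(doc_patterns,
                                                 class_patterns[label]))


# Hypothetical data: subgraph patterns are stood in for by string labels.
class_patterns = {
    "sports": {"g1", "g2", "g3", "g4"},
    "economy": {"g5", "g6", "g7"},
}
doc_patterns = {"g2", "g3", "g8"}

predicted = classify(doc_patterns, class_patterns)
```

Here the document shares two patterns with the "sports" class and none with "economy", so `predicted` is `"sports"`. A real system would first mine the frequent subgraphs from document graphs, a step this sketch leaves out.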