Frequent Subgraph-Based Approach for Classifying Vietnamese Text Documents

Authors: Kiem Hoang, Tu Anh Hoang Nguyen
Publication year: 2009
Source: Enterprise Information Systems, ISBN: 9783642013461
ICEIS
DOI: 10.1007/978-3-642-01347-8_25
Description: In this paper we present a simple approach to Vietnamese text classification that requires no word segmentation and is based on frequent subgraph mining techniques. Documents are represented with a graph-based model rather than the traditional vector-based model. The classification model uses structural patterns (subgraphs) and the Dice similarity measure to identify the class of a document. The method is evaluated on a Vietnamese data set to measure classification accuracy. Results show that it can outperform the k-NN algorithm (with vector-based and hybrid document representations) in terms of both accuracy and classification time.
Database: OpenAIRE
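
Note: the description mentions classifying documents by comparing subgraph patterns with the Dice measure. The record does not give the paper's exact formulation, so the following is only a minimal sketch under stated assumptions: each document and each class is reduced to a set of hashable patterns (here, hypothetical syllable-pair edges), and a document is assigned to the class whose pattern set has the highest Dice coefficient 2|A ∩ B| / (|A| + |B|). The names dice_similarity and classify, and the sample data, are illustrative and not taken from the paper.

```python
from typing import Dict, Hashable, Set


def dice_similarity(a: Set[Hashable], b: Set[Hashable]) -> float:
    """Dice coefficient 2|A & B| / (|A| + |B|); defined here as 0.0 for two empty sets."""
    if not a and not b:
        return 0.0
    return 2 * len(a & b) / (len(a) + len(b))


def classify(doc_patterns: Set[Hashable],
             class_patterns: Dict[str, Set[Hashable]]) -> str:
    """Return the label whose pattern set is most similar to the document's."""
    return max(class_patterns,
               key=lambda label: dice_similarity(doc_patterns, class_patterns[label]))


# Hypothetical data: each pattern is a single edge between two syllables.
# A real system would mine frequent subgraphs per class instead.
class_patterns = {
    "sports": {("bóng", "đá"), ("đội", "tuyển"), ("cầu", "thủ")},
    "economy": {("thị", "trường"), ("bóng", "đá")},
}
doc_patterns = {("bóng", "đá"), ("đội", "tuyển")}

predicted = classify(doc_patterns, class_patterns)
print(predicted)
```

In this toy setup the document shares two of its two patterns with the three "sports" patterns (Dice = 4/5) and one with the two "economy" patterns (Dice = 2/4), so the sketch selects "sports". Treating subgraphs as opaque set elements sidesteps subgraph isomorphism testing, which an actual frequent subgraph miner would need to handle.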