Frequent Subgraph-Based Approach for Classifying Vietnamese Text Documents

Authors:	Kiem Hoang, Tu Anh Hoang Nguyen
Year of publication:	2009
Subject:	Computer science; Vietnamese; Text segmentation; Text graph; Dice; Pattern recognition; Document representation; Text mining; Graph (abstract data type); Artificial intelligence; Natural language processing
Source:	Enterprise Information Systems; ISBN: 9783642013461; ICEIS
DOI:	10.1007/978-3-642-01347-8_25
Description:	In this paper we present a simple approach to Vietnamese text classification that requires no word segmentation and is based on frequent subgraph mining techniques. Documents are represented with a graph-based model instead of the traditional vector-based model. The classification model uses structural patterns (subgraphs) and the Dice similarity measure to identify the class of a document. The method is evaluated on a Vietnamese data set to measure classification accuracy. Results show that it can outperform the k-NN algorithm (with vector-based or hybrid document representation) in terms of both accuracy and classification time.
Database:	OpenAIRE
External link:	https://explore.openaire.eu/search/publication?articleId=doi_________::80b014da8be6197a8492ddd5db049dbc https://doi.org/10.1007/978-3-642-01347-8_25
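
Illustration:	The description mentions comparing structural patterns with the Dice measure. The sketch below shows one way this could look and is not the authors' implementation. It makes several assumptions that the record does not state: tokens are whitespace-delimited units (no word segmentation), a document graph is reduced to its set of directed edges between adjacent tokens, and single edges stand in for mined frequent subgraphs. The names `text_to_edges`, `dice`, and `classify` are hypothetical.

```python
from typing import Dict, Hashable, Set, Tuple

Edge = Tuple[str, str]


def text_to_edges(text: str) -> Set[Edge]:
    """Assumed graph model: one directed edge per pair of adjacent tokens."""
    tokens = text.lower().split()
    return set(zip(tokens, tokens[1:]))


def dice(a: Set[Hashable], b: Set[Hashable]) -> float:
    """Dice coefficient 2|A ∩ B| / (|A| + |B|); defined as 0.0 for two empty sets."""
    if not a and not b:
        return 0.0
    return 2.0 * len(a & b) / (len(a) + len(b))


def classify(doc: Set[Edge], class_patterns: Dict[str, Set[Edge]]) -> str:
    """Return the label whose pattern set is most Dice-similar to the document."""
    return max(class_patterns, key=lambda label: dice(doc, class_patterns[label]))


if __name__ == "__main__":
    # Toy pattern sets; a real system would mine frequent subgraphs per class.
    patterns = {
        "sports": text_to_edges("bóng đá việt nam"),
        "economy": text_to_edges("thị trường chứng khoán"),
    }
    print(classify(text_to_edges("đội bóng đá việt nam"), patterns))
```

Using single edges keeps the example short; replacing them with larger mined subgraphs (encoded as hashable canonical forms) would leave `dice` and `classify` unchanged.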