Automatic text categorization of Marathi documents using clustering technique.

Authors: Vispute, Sushma R.; Potey, M. A.
Source: 2013 15th International Conference on Advanced Computing Technologies (ICACT); 2013, p1-5, 5p
Abstract: The purpose of the present work is to create an intelligent system for retrieving desired documents in the Marathi language. The system also aims to provide end users with personalized Marathi documents based on interests identified from their browsing history. This paper presents the automatic categorization of Marathi documents, together with a literature survey of related work on automatic categorization of text documents. Several supervised learning techniques exist for classifying text documents, including Decision Trees, Support Vector Machines (SVM), Neural Networks, AdaBoost, and Naïve Bayes. Several clustering techniques are also available for text categorization, including K-means, Suffix Tree Clustering (STC), Semantic Hierarchical Online Clustering (SHOC), and the Label Induction Grouping Algorithm (LINGO). The literature survey finds that the vector space model (VSM) gives better results than the probabilistic model. This paper presents the categorization of Marathi text documents using the LINGO clustering algorithm, which is based on the VSM. The data set consists of 107 Marathi documents in 3 categories: Tourism, Health Programmes, and Maharashtra festivals. The results show that the LINGO clustering algorithm performs well at categorizing Marathi text documents. The overall accuracy of the system on the Marathi documents is 91.10%. [ABSTRACT FROM PUBLISHER]
Database: Complementary Index