Improvement in Automatic Classification of Persian Documents by Means of Support Vector Machine and Representative Vector

Authors: Ezadi Hamed, Noohi Taher, Hossennejad Mihan, Jafari Ashkan
Year of publication: 2011
Subject:
Source: Communications in Computer and Information Science ISBN: 9783642273360
DOI: 10.1007/978-3-642-27337-7_27
Description: A Representative Vector is a vector that contains related words and the degree of their relationships. This paper examines the effect of using such vectors on the automatic classification of Persian documents. In this method, the documents are first preprocessed: extra words are identified and word stems are found. Next, using one of the known methods, features are extracted for each category. The Representative Vector, built from the extracted features, then yields more specific words that better represent each category. The experiments show that Precision and Recall can be increased significantly by omitting extra words, adding a few words in the Representative Vectors, and using a well-known classification model such as the Support Vector Machine (SVM).
Database: OpenAIRE