WORDS AS CLASSIFIERS OF DOCUMENTS ACCORDING TO THEIR HISTORICAL PERIOD AND THE ETHNIC ORIGIN OF THEIR AUTHORS

Authors: Dror Mughaz, Elchai Yehudai, Yaakov HaCohen-Kerner, Hananya Beck
Year of publication: 2008
Source: Cybernetics and Systems. 39:213-228
ISSN: 1087-6553, 0196-9722
DOI: 10.1080/01969720801944299
Description: Text classification presents challenges due to the large number of features, their dependencies, and the large number of training documents. In this research, we investigate whether words are appropriate features for classifying documents by the ethnic group of their authors and/or by the historical period in which they were written. To the best of our knowledge, these kinds of classification have not been explored before. In addition, we investigate Forman's (2003) claim about not using common words for classification tasks. The application domain was articles referring to Jewish law written in Hebrew-Aramaic, which have been little studied. Different experiments using SVM and InfoGain present highly successful results (more than 95%). The results indicate that using common words as features contributes to making the learning task efficient and more accurate.
Database: OpenAIRE
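
The record's abstract mentions InfoGain as part of the experiments without describing it. As a rough illustration only, the sketch below computes the standard information gain of a binary "word occurs in document" feature with respect to a class label, and ranks a vocabulary by it. It is not the authors' implementation: the documents, the labels `period_a` and `period_b`, and the function names `entropy`, `info_gain`, and `rank_words` are all invented for this example, and the SVM step is omitted.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    if total == 0:
        return 0.0
    return -sum((n / total) * math.log2(n / total)
                for n in Counter(labels).values())


def info_gain(docs, labels, word):
    """Information gain of the binary feature 'word occurs in the document'.

    docs is a list of word sets, labels the matching list of class labels.
    """
    with_word = [y for d, y in zip(docs, labels) if word in d]
    without_word = [y for d, y in zip(docs, labels) if word not in d]
    n = len(labels)
    # Class entropy remaining after splitting on presence/absence of the word.
    conditional = ((len(with_word) / n) * entropy(with_word)
                   + (len(without_word) / n) * entropy(without_word))
    return entropy(labels) - conditional


def rank_words(docs, labels):
    """Vocabulary sorted by decreasing information gain (ties alphabetical)."""
    vocab = set().union(*docs)
    return sorted(vocab, key=lambda w: (-info_gain(docs, labels, w), w))


# Toy, invented data: each document is reduced to its set of words.
docs = [
    {"the", "court", "ruled"},
    {"the", "court", "held"},
    {"the", "sage", "taught"},
    {"the", "sage", "ruled"},
]
labels = ["period_a", "period_a", "period_b", "period_b"]

ranking = rank_words(docs, labels)
for word in ranking:
    print(f"{word}: {info_gain(docs, labels, word):.3f}")
```

In this toy data the word "the" scores zero only because it appears in every document and the feature is binary presence. That is an artifact of the example and says nothing about the paper's finding on common words, which would depend on the feature representation the authors actually used.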