WORDS AS CLASSIFIERS OF DOCUMENTS ACCORDING TO THEIR HISTORICAL PERIOD AND THE ETHNIC ORIGIN OF THEIR AUTHORS
Authors: | Dror Mughaz, Elchai Yehudai, Yaakov HaCohen-Kerner, Hananya Beck |
---|---|
Year of publication: | 2008 |
Subject: | Computer science; Speech recognition; Ethnic group; Ethnic origin; Task (project management); Support vector machine; Artificial intelligence; Application domain; Software; Natural language processing; Period (music); Information Systems |
Source: | Cybernetics and Systems. 39:213-228 |
ISSN: | 1087-6553 0196-9722 |
DOI: | 10.1080/01969720801944299 |
Description: | Text classification presents challenges due to the large number of features, their dependencies, and the large number of training documents. In this research, we investigate whether words are appropriate features for classifying documents by the ethnic group of their authors and/or by the historical period in which they were written. To the best of our knowledge, these kinds of classification have not been explored before by others. In addition, we investigate Forman's (2003) claim that common words should not be used for classification tasks. The application domain was articles referring to Jewish law written in Hebrew-Aramaic, which have been little studied. Different experiments using SVM and InfoGain present highly successful results (more than 95%). The results indicate that the use of common words as features contributes to making the learning task more efficient and more accurate. |
Database: | OpenAIRE |