Arabic Light Stemmer Based on ISRI Stemmer
Author: | Khudhair Abed Thamer, Dhafar Hamed Abd, Abir Hussain, Wasiq Khan |
---|---|
Year of publication: | 2021 |
Subject: |
Arabic
Computer science; Process (engineering); Ambiguity; Information science; Task (project management); ComputingMethodologies_DOCUMENTANDTEXTPROCESSING; Artificial intelligence; Natural language processing |
Source: | Intelligent Computing Theories and Application, ISBN: 9783030845315, ICIC (3) |
DOI: | 10.1007/978-3-030-84532-2_4 |
Description: | Stemming is considered one of the most essential steps in natural language processing and information retrieval. In Arabic, however, stemming remains a major challenge because the language has a particular morphology that sets it apart from other languages. Most existing algorithms are limited to a given number of words, create ambiguity between original letters and affixes, and often rely on dictionaries of patterns or words. We therefore present, for the first time, the design and implementation of an Arabic light stemmer based on the Information Science Research Institute (ISRI) algorithm. The algorithm is evaluated empirically on a newly created Arabic dataset built from the content of different Arabic websites written in modern Arabic. The experimental results indicate that the proposed method outperforms the existing methods against which it was benchmarked. |
Database: | OpenAIRE |