Description: |
Stemming has been shown to be effective in many natural language processing (NLP) applications, such as document classification, machine translation, and information retrieval (IR). This paper compares the performance of nine Arabic stemmers on microblog IR: Information Science Research Institute (ISRI), Tashaphyne, Khoja, AL-stem, Light10, Motaz, Assem, Farasa, and ARLStem. Each stemmer was studied independently on the EveTAR dataset in a specific information retrieval task: retrieving the tweets relevant to a query. The performance of the nine stemmers was evaluated using BM25, precision at 30 (P@30), and Mean Average Precision (MAP). The results show that the root-based stemmers (i.e., ISRI and Khoja) outperformed the others. |