Exploring the Performance of Farasa and CAMeL Taggers for Arabic Dialect Tweets
Author: | Areej Alshutayri, Aseel Alfaidi, Hajer Alwadei, Shahd Alahda |
---|---|
Publication year: | 2023 |
Subject: | |
Source: | The International Arab Journal of Information Technology. 20 |
ISSN: | 2309-4524 1683-3198 |
DOI: | 10.34028/iajit/20/3/7 |
Description: | In Natural Language Processing (NLP), Part-of-Speech (POS) tagging is a fundamental step required by many applications, such as information extraction, machine translation, and grammar checking. Successful POS taggers have been developed for many languages, including Arabic. The spread of social media has increased the diversity of dialects, as people use them in their online communications. As a result, it has become more difficult for researchers to classify some words that humans understand but computers do not. In addition, most Arabic POS research focuses on Modern Standard Arabic (MSA), while Dialectal Arabic (DA) receives less attention. This paper evaluates the performance of two Arabic taggers on dialectal Arabic tweets and determines which is more appropriate, which in turn will help to improve existing taggers for dialectal Arabic tweets. We used the Farasa and CAMeL taggers, which are commonly used to analyze Arabic texts and are considered the best taggers for Arabic. The results indicate that the CAMeL tagger performed better than the Farasa tagger, with accuracies of 92% and 83%, respectively. In other words, a hybrid POS tagger trained on MSA and DA returns better results than one trained on MSA alone. |
Database: | OpenAIRE |
External link: |
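
The description compares the two taggers by accuracy but this record does not say how that accuracy was computed. A minimal sketch follows, assuming token-level accuracy against manually assigned gold tags. The function name `tagging_accuracy`, the tag labels, and the toy data are hypothetical illustrations; they are not taken from the paper and do not call the Farasa or CAMeL APIs.

```python
from typing import Sequence


def tagging_accuracy(gold: Sequence[str], predicted: Sequence[str]) -> float:
    """Return the fraction of tokens whose predicted POS tag matches the gold tag."""
    if len(gold) != len(predicted):
        raise ValueError("gold and predicted tags must be aligned token by token")
    if not gold:
        raise ValueError("cannot score an empty tag sequence")
    correct = sum(g == p for g, p in zip(gold, predicted))
    return correct / len(gold)


# Hypothetical toy data (assumption): gold tags for one five-token tweet
# and the output of two placeholder taggers.
gold_tags = ["NOUN", "VERB", "PREP", "NOUN", "ADJ"]
tagger_a_tags = ["NOUN", "VERB", "PREP", "NOUN", "NOUN"]
tagger_b_tags = ["NOUN", "NOUN", "PREP", "VERB", "NOUN"]

scores = {
    "tagger_a": tagging_accuracy(gold_tags, tagger_a_tags),
    "tagger_b": tagging_accuracy(gold_tags, tagger_b_tags),
}

# The tagger with the highest token-level accuracy on this sample.
best_tagger = max(scores, key=scores.get)

for name, score in scores.items():
    print(f"{name}: {score:.0%}")
print("best:", best_tagger)
```

In practice the two tag sequences would come from running each tagger on the same tokenized tweets, and the taggers' differing tag sets would need to be mapped to a common one before scoring.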