The Automatic Recognition and Translation of Tunisian Dialect Named Entities into Modern Standard Arabic

Authors: Roua Torjmen, Kais Haddar
Publication year: 2021
Subject:
Source: Communications in Computer and Information Science ISBN: 9783030706289
NooJ
DOI: 10.1007/978-3-030-70629-6_18
Description: Developing an automatic named-entity recognition system coupled with a translation system has become an important task in Natural Language Processing applications. In this context, we build a named-entity recognition system for the Tunisian dialect that also translates the recognized entities into Modern Standard Arabic. Although the Tunisian dialect is a variant of Arabic, it differs from Modern Standard Arabic and is difficult for non-Tunisian Arabic speakers to understand. To develop our system, we studied many Tunisian dialect corpora to identify and examine the structures of the different named-entity types. The proposed method is based on a bilingual dictionary extracted from the study corpus and an elaborate set of local grammars. The local grammars were transformed into finite-state transducers using recent technologies of the NooJ linguistic platform. To test and evaluate the system, we applied it to a Tunisian dialect test corpus of around 20,000 words. The results obtained are promising.
Database: OpenAIRE