The Automatic Recognition and Translation of Tunisian Dialect Named Entities into Modern Standard Arabic

Authors:	Roua Torjmen, Kais Haddar
Year of publication:	2021
Subject:	business.industry Computer science Bilingual dictionary Context (language use) computer.software_genre Translation (geometry) Dialect Test language.human_language Set (abstract data type) Rule-based machine translation Named-entity recognition Modern Standard Arabic language Artificial intelligence business computer Natural language processing
Source:	Communications in Computer and Information Science ISBN: 9783030706289 NooJ
DOI:	10.1007/978-3-030-70629-6_18
Description:	Developing an automatic named-entity recognition system coupled with a translation system has become an important task in Natural Language Processing applications. In this context, we build a named-entity recognition system for the Tunisian dialect that also translates the recognized entities into Modern Standard Arabic. Although the Tunisian dialect is a variant of Arabic, it differs considerably from Modern Standard Arabic and is difficult for non-Tunisian Arabic speakers to understand. To develop our system, we studied many Tunisian dialect corpora to identify and examine the structures of different named-entity types. The proposed method is based on a bilingual dictionary extracted from the study corpus and an elaborated set of local grammars. The local grammars were transformed into finite-state transducers using recent technologies of the NooJ linguistic platform. To test and evaluate the system, we applied it to a Tunisian dialect test corpus of around 20,000 words. The results obtained are promising.
Database:	OpenAIRE
External link:	https://explore.openaire.eu/search/publication?articleId=doi_________::20138f7c8111fb819212718c4ab127fb https://doi.org/10.1007/978-3-030-70629-6_18