The Automatic Recognition and Translation of Tunisian Dialect Named Entities into Modern Standard Arabic
Authors: | Roua Torjmen, Kais Haddar |
---|---|
Year of publication: | 2021 |
Subject: | Computer science, Natural language processing, Artificial intelligence, Named-entity recognition, Rule-based machine translation, Bilingual dictionary, Dialect, Modern Standard Arabic, Translation, Context (language use) |
Source: | Communications in Computer and Information Science (NooJ), ISBN 9783030706289 |
DOI: | 10.1007/978-3-030-70629-6_18 |
Description: | Developing automatic named-entity recognition systems coupled with translation systems has become an important task in Natural Language Processing applications. In this context, we build a named-entity recognition system for Tunisian dialect that also translates the recognized entities into Modern Standard Arabic. Although Tunisian dialect is a variant of Arabic, it differs considerably from Modern Standard Arabic and remains difficult for non-Tunisian Arabic speakers to understand. To develop the system, we studied many Tunisian dialect corpora to identify and analyze the structures of different named-entity types. The proposed method relies on a bilingual dictionary extracted from the study corpus and an elaborate set of local grammars. These local grammars were transformed into finite-state transducers using recent features of the NooJ linguistic platform. To test and evaluate the system, we applied it to a Tunisian dialect test corpus of around 20,000 words. The obtained results are promising. |
Database: | OpenAIRE |
External link: |
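The description summarizes the approach as a bilingual dictionary combined with local grammars that are compiled into finite-state transducers in NooJ. As a rough illustration of that idea only, the Python sketch below looks up each token's category in a small bilingual lexicon, runs the category sequence through a tiny hand-written automaton, and emits the MSA forms of the matched tokens. It does not use NooJ or its grammar format. The lexicon entries (illustrative transliterations), the categories `TITLE`/`NAME`/`CITY`, the transition rules, and the function `recognize_and_translate` are all hypothetical assumptions, not taken from the paper.

```python
from dataclasses import dataclass

# Hypothetical bilingual lexicon: dialect form -> (category, MSA form).
# Entries are illustrative transliterations, not the paper's dictionary.
LEXICON = {
    "si": ("TITLE", "al-sayyid"),
    "mohamed": ("NAME", "muhammad"),
    "ali": ("NAME", "ali"),
    "tounes": ("CITY", "tunis"),
    "sfax": ("CITY", "safaqis"),
}

# Toy "local grammar" as a finite-state automaton over lexical categories:
# (state, category) -> next state. Hypothetical rules for illustration.
TRANSITIONS = {
    (0, "TITLE"): 1,
    (1, "NAME"): 2,
    (2, "NAME"): 2,  # allow compound names: TITLE NAME NAME ...
    (0, "CITY"): 3,
}
ACCEPTING = {2: "PERSON", 3: "LOCATION"}


@dataclass
class Entity:
    start: int  # index of first token
    end: int    # index one past the last token
    label: str
    msa: str    # translated surface form


def longest_match(tokens, i):
    """Run the automaton from token i; return (end, label) of the longest accepted span."""
    state, best, j = 0, None, i
    while j < len(tokens):
        entry = LEXICON.get(tokens[j].lower())
        if entry is None:
            break
        nxt = TRANSITIONS.get((state, entry[0]))
        if nxt is None:
            break
        state, j = nxt, j + 1
        if state in ACCEPTING:
            best = (j, ACCEPTING[state])
    return best


def recognize_and_translate(text):
    """Recognize entities left to right and translate each matched token via the lexicon."""
    tokens = text.split()
    entities, i = [], 0
    while i < len(tokens):
        match = longest_match(tokens, i)
        if match is None:
            i += 1  # no entity starts here; skip the token
            continue
        end, label = match
        msa = " ".join(LEXICON[t.lower()][1] for t in tokens[i:end])
        entities.append(Entity(i, end, label, msa))
        i = end
    return entities
```

For example, `recognize_and_translate("ja si mohamed ali men sfax")` yields a PERSON entity covering tokens 1 to 3 with the MSA form `al-sayyid muhammad ali` and a LOCATION entity for `sfax`; tokens absent from the lexicon are skipped. Longest match is used so that a title followed by several names is captured as a single entity, loosely mirroring how a transducer path consumes a whole named-entity structure.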