Automatic speech recognition system for Tunisian dialect
Author: | Abir Masmoudi, Lamia Hadrich Belguith, Fethi Bougares, Mariem Ellouze, Yannick Estève |
---|---|
Contributors: | Multimedia, InfoRmation systems and Advanced Computing Laboratory (MIRACL), Faculté des Sciences Economiques et de Gestion de Sfax (FSEG Sfax), Université de Sfax, Laboratoire d'Informatique de l'Université du Mans (LIUM), Le Mans Université (UM) |
Year of publication: | 2017 |
Subject: | Linguistics and Language; Arabic; Computer science; Speech recognition; Tunisian dialect; Word error rate; Engineering and technology; Library and Information Sciences; Natural sciences; [INFO.INFO-CL] Computer Science [cs]/Computation and Language [cs.CL]; Language and Linguistics; Education; Physical sciences; Electrical engineering, electronic engineering, information engineering; Acoustics; Automatic speech recognition; Under-resourced language; Rule-based; Grapheme-to-phoneme conversion; Linguistics; Focus (linguistics); Modern Standard Arabic; Artificial intelligence & image processing; Artificial intelligence; Computational linguistics; Natural language processing |
Source: | BASE (Bielefeld Academic Search Engine); Language Resources and Evaluation, Springer Verlag, 2018, 52 (1), pp. 249-267. ⟨10.1007/s10579-017-9402-y⟩ |
ISSN: | 1574-0218 1574-020X |
Description: | Although Modern Standard Arabic is taught in schools and used in written communication and TV/radio broadcasts, all informal communication is typically carried out in dialectal Arabic. In this work, we focus on the design of the speech tools and resources required to develop an Automatic Speech Recognition (ASR) system for the Tunisian dialect. Developing such a system faces several challenges: the lack of annotated resources and tools, the lack of standardization at all linguistic levels (phonological, morphological, syntactic and lexical), and the absence of the pronunciation dictionary needed for ASR development. In this paper, we present a historical overview of the Tunisian dialect and its linguistic characteristics. We also describe and evaluate our rule-based phonetic tool. Next, we go deeper into the details of how the Tunisian dialect corpus was created. Finally, this corpus is approved and used to build the first ASR system for the Tunisian dialect, which achieves a Word Error Rate of 22.6%. |
Database: | OpenAIRE |