Development of Indonesian audiovisual speech synthesis system for assistance children with delayed speech

Authors: Joko Sarwono, Dhany Arifianto, Sangsaka Wira, Elok Anggrayni, Nyilo Purnami
Year of publication: 2020
Subject:
Source: The Journal of the Acoustical Society of America. 148:2470-2470
ISSN: 0001-4966
DOI: 10.1121/1.5146835
Description: Congenital hearing impairment is frequently found in children and is followed by delayed speech. Furthermore, the number of speech therapists currently available is limited. In this research, we outline the development of an Indonesian audio-visual speech synthesis system to support learning for deaf children with delayed speech. First, we developed two kinds of Indonesian corpora: a speech corpus and an audio-visual corpus. The speech corpus contains speech recordings from professional speech therapists. The recorded Indonesian speech database totals more than 18 hours of audio. The audio-visual corpus contains visual phonemes (visemes), which visualize the lip shapes of Indonesian phonemes. Segmentation and labeling were conducted to create transcriptions. We varied the number and the type of sentences used in the training part of speech synthesis. Audio-visual synthesis used a viseme concatenation method. The objective evaluation result using the Mel-cepstral distortion method was 2.8. The subjective evaluation result using the Mean Opinion Score was 3.71. The evaluation results showed that the new design of Indonesian audio-visual speech synthesis for learning to produce any single meaningful word can be used as an alternative in hospitals for the therapy of patients with delayed speech.
Database: OpenAIRE
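
Note on the objective metric: the description reports a Mel-cepstral distortion score but does not say which variant was used or in what unit. The following is a minimal sketch of one commonly used definition (per-frame distortion in dB, averaged over time-aligned frames, with the 0th energy coefficient excluded). The function name `mel_cepstral_distortion` and the input layout (lists of per-frame coefficient lists, already aligned) are assumptions made for illustration, not the authors' implementation.

```python
import math


def mel_cepstral_distortion(ref_frames, syn_frames):
    """Average Mel-cepstral distortion (dB) over time-aligned frames.

    Assumption: each frame is a sequence of mel-cepstral coefficients
    whose first entry (index 0) is the energy term, which is skipped.
    """
    if not ref_frames or len(ref_frames) != len(syn_frames):
        raise ValueError("frame sequences must be non-empty and aligned")

    # Constant factor of the common dB-scaled definition: (10 / ln 10) * sqrt(2)
    scale = 10.0 / math.log(10.0) * math.sqrt(2.0)

    total = 0.0
    for ref, syn in zip(ref_frames, syn_frames):
        if len(ref) != len(syn):
            raise ValueError("frames must have the same number of coefficients")
        # Squared Euclidean distance over coefficients 1..D (0th excluded)
        squared = sum((r - s) ** 2 for r, s in zip(ref[1:], syn[1:]))
        total += scale * math.sqrt(squared)

    return total / len(ref_frames)
```

Under this definition, identical natural and synthesized frames give a distortion of 0, and lower values indicate synthesized speech that is spectrally closer to the reference.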