Prozodijski model za sintezu turskog teksta u govor na temelju pravila

Autor: Ibrahim Baran Uslu, Hakki Gokhan Ilk, Asim Egemen Yilmaz
Jazyk: angličtina
Rok vydání: 2013
Předmět:
Zdroj: Tehnički vjesnik
Volume 20
Issue 2
ISSN: 1848-6339
1330-3651
Popis: Ovaj članak predstavlja naš novi prozodijski model u sustavu za sintezu turskog teksta u govor (TTS). Nakon razvijanja TTS sustava vođenog parametrijskim osobinama koje se sastoje od promjena trajanja, visine i jačine glasa, pokušavamo postaviti neka prozodijska pravila kako bi se povećala prirodnost našeg sintetizatora. Budući da u turskom jeziku glagoli koji se sprežu mogu biti samostalne rečenice uz sufikse koji im se dodaju, sastavljamo perceptualni prozodijski model definiranjem pravila o obrascima naglasaka kod sprezanja glagola. Sistematski su se proučavali potvrdni, negativni i upitni (i potvrdni i negativni) oblici mnogih glagola. Nisu se proučavali samo glagoli već, na isti način, i neke fraze kako bi se postigla ispravna prozodija. Prema rezultatima testova slušanja, definirana pravila zasnovana na promjenama trajanja, visine i jačine glasa, dovode do perceptualno bolje govorne sinteze, naime u prosjeku do 1,78/5,0 poboljšanja u CMSO testu (Comparative Mean Opinion Score). To poboljšanje predstavlja uspjeh našeg novog prozodijskog modela.
This paper presents our novel prosody model in a Turkish text-to-speech synthesis (TTS) system. After developing a TTS system driven by parametric features consisting of duration, pitch and energy modifications, we try to figure out some prosody rules in order to increase the naturalness of our synthesizer. Since the inflected verbs in Turkish can be stand-alone sentences with the suffixes they take, we build a perceptual prosody model by defining rules on the stress patterns of verb inflections. Affirmative, negative and interrogative (both positive and negative) forms of many verbs were examined in a systematic way. Not only verbs, but in the same way, some phrases were examined for obtaining a proper prosody. According to the results of listening tests, the defined rules based on duration, pitch and energy modification weights, result in perceptually better speech synthesis, namely about 1,78/5,0 improvement in average in the CMOS (Comparative Mean Opinion Score) test. This improvement shows the success of our novel prosody model.
Databáze: OpenAIRE