A review of Serbian parametric speech synthesis based on deep neural networks
Author: | Sinisa Suzic, Tijana Delic, Milan Sečujski |
---|---|
Year of publication: | 2017 |
Subject: |
Computer Networks and Communications; Computer science; Speech recognition; Speech synthesis; 02 engineering and technology; computer.software_genre; 01 natural sciences; lcsh:Telecommunication; speech synthesis; lcsh:TK5101-6720; 0103 physical sciences; 0202 electrical engineering, electronic engineering, information engineering; Media Technology; HMM; Parametric statistics; 010302 applied physics; Radiation; business.industry; language.human_language; Signal Processing; language; Deep neural networks; 020201 artificial intelligence & image processing; Artificial intelligence; business; Serbian; computer; Software; Natural language processing; DNN |
Source: | Telfor Journal, Vol 9, Iss 1, Pp 32-37 (2017) |
ISSN: | 2334-9905, 1821-3251 |
Description: | This paper describes research on the development of a deep neural network based speech synthesizer for the Serbian language, trained on recorded utterances of a single female voice talent. Two separate networks are used: one predicts acoustic features and the other predicts phonetic segment durations. A set of experiments first establishes the optimal values of the networks' hyper-parameters, and then examines how the amount of training data influences the quality of the synthesized speech. Quality is evaluated through objective measures as well as listening tests. The experiments confirm that 4-layer deep neural networks with 512 units per hidden layer, trained on 3 hours of data, produce speech of very good quality. The results also suggest that further increasing the amount of training data may improve quality further. |
Database: | OpenAIRE |
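The network shape named in the description (4 layers, 512 units per hidden layer) can be illustrated with a minimal feedforward sketch. This is not the authors' implementation: reading "4-layer" as four hidden layers, the tanh activation, the linear output, the weight initialization, and all function names (`init_mlp`, `forward`, `count_parameters`) are assumptions made here for illustration only, and the record does not give input or output dimensions.

```python
import math
import random


def init_mlp(in_dim, out_dim, hidden_layers=4, hidden_units=512, seed=0):
    """Return a list of (weights, biases) pairs for a feedforward network.

    Defaults follow the 4 x 512 configuration from the description;
    everything else (initialization, dimensions) is an assumption.
    """
    rng = random.Random(seed)
    sizes = [in_dim] + [hidden_units] * hidden_layers + [out_dim]
    layers = []
    for fan_in, fan_out in zip(sizes[:-1], sizes[1:]):
        scale = 1.0 / math.sqrt(fan_in)  # simple scaled Gaussian init (assumed)
        weights = [[rng.gauss(0.0, scale) for _ in range(fan_in)]
                   for _ in range(fan_out)]
        biases = [0.0] * fan_out
        layers.append((weights, biases))
    return layers


def forward(layers, x):
    """Apply tanh hidden layers followed by a linear output layer (assumed)."""
    for index, (weights, biases) in enumerate(layers):
        x = [sum(w * v for w, v in zip(row, x)) + b
             for row, b in zip(weights, biases)]
        if index < len(layers) - 1:  # no nonlinearity on the output layer
            x = [math.tanh(v) for v in x]
    return x


def count_parameters(layers):
    """Total number of weights and biases in the network."""
    return sum(len(w) * len(w[0]) + len(b) for w, b in layers)
```

Under these assumptions, the two networks from the description would be two separate calls to `init_mlp`: a duration network with a single output per phonetic segment, and an acoustic network whose output size matches the chosen acoustic feature vector. Both sizes would have to come from the paper itself, since the record does not state them.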