On measuring the intelligibility of synthetic speech in noise - Do we need a realistic noise environment?

Author:	Martti Vainio, Olli Santala, Antti Suni, Paavo Alku, Tuomo Raitio, Marko Takanen
Language:	English
Year of publication:	2012
Subject:	Computer science; synthetic voices; Speech recognition; realistic noise environment; Loudspeakers; synthetic speech; Speech synthesis; Intelligibility (communication); synthetic speech intelligibility; Speech; multichannel reproduction; Hidden Markov models; Lombard speech; speech multichannel reproduction; speech intelligibility; intelligibility; Signal processing; Signal to noise ratio; Noise measurement; Noise (signal processing); headphone setup; diotic speech; Speech processing; headphones; Educational institutions; noise; speech in noise; Mel-frequency cepstrum; Loudspeaker
Source:	Raitio, T, Takanen, M, Santala, O, Suni, A, Vainio, M & Alku, P 2012, On measuring the intelligibility of synthetic speech in noise - Do we need a realistic noise environment? in Acoustics, Speech and Signal Processing (ICASSP), 2012 IEEE International Conference on. Institute of Electrical and Electronics Engineers (IEEE), pp. 4025-4028. https://doi.org/10.1109/ICASSP.2012.6288801
DOI:	10.1109/ICASSP.2012.6288801
Description:	Assessing the intelligibility of synthetic speech is important in creating synthetic voices for real-life applications, especially those involving interfering noise. This raises the question of how to measure the intelligibility of synthetic speech so that such conditions are simulated correctly. Conventionally, this has been done using a simple listening test setup in which diotic speech and noise are played to both ears through headphones. This is very different from a real noise environment, where speech and noise are spatially distributed. This paper addresses the question of whether a realistic noise environment should be used to test the intelligibility of synthetic speech. Three test conditions are evaluated: one with multichannel reproduction of noise and speech, and two headphone setups. Tests are performed with natural and synthetic speech, including speech especially intended for noisy conditions. The results indicate a general trend in all setups but also some interesting differences.
Database:	OpenAIRE
External link:	https://explore.openaire.eu/search/publication?articleId=doi_dedup___::c51a8de64d0f9ec228e73dbde9a73f19 https://hdl.handle.net/20.500.11820/c0d84e74-4ca3-4b62-871e-f02345392777