A Time-Domain Convolutional Recurrent Network for Packet Loss Concealment

Authors:	Yun Wang, Gil Keren, Didi Zhang, Kaustubh Kalgaonkar, Ju Lin, Christian Fuegen
Year:	2021
Subjects:	Speech enhancement; Voice over IP; Packet loss; Computer science; Mean opinion score; Speech recognition; Word error rate; Intelligibility (communication); PESQ; Packet loss concealment
Source:	ICASSP
DOI:	10.1109/icassp39728.2021.9413595
Abstract:	Packet loss may affect a wide range of applications that use voice over IP (VoIP), e.g. video conferencing. In this paper, we investigate a time-domain convolutional recurrent network (CRN) for online packet loss concealment. The CRN comprises a convolutional encoder-decoder structure and long short-term memory (LSTM) layers, which have been shown to be suitable for real-time speech enhancement applications. Moreover, we propose lookahead and masked training to further improve the performance of the CRN framework. Experimental results show that the proposed system outperforms a baseline system using only LSTM layers in terms of two objective metrics – perceptual evaluation of speech quality (PESQ) and short-term objective intelligibility (STOI); it also reduces the word error rate (WER) more than the baseline when used as a frontend for speech recognition. The advantage of the proposed system is also verified in a subjective evaluation by the mean opinion score (MOS).
Database:	OpenAIRE
External links:	https://explore.openaire.eu/search/publication?articleId=doi_________::73c7b38a38c5cc735b1c857de9f250cd https://doi.org/10.1109/icassp39728.2021.9413595
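The abstract describes training a CRN to reconstruct the speech in lost packets, but this record does not describe the paper's data pipeline. As a hedged illustration only, the sketch below shows one common way to build training pairs for packet loss concealment. It cuts a waveform into fixed-length packets, drops each packet independently with probability `loss_prob`, zero-fills the gaps, and keeps a mask marking the lost samples. The helper `masked_l1` computes a loss restricted to lost samples. It is offered as one possible reading of "masked training"; the paper may define masked training differently. All names (`simulate_packet_loss`, `masked_l1`, `packet_len`, `loss_prob`) are hypothetical and not taken from the paper.

```python
import random
from typing import List, Sequence, Tuple


def simulate_packet_loss(
    signal: Sequence[float],
    packet_len: int,
    loss_prob: float,
    seed: int = 0,
) -> Tuple[List[float], List[int]]:
    """Zero-fill randomly dropped packets; return (lossy_signal, mask).

    mask[i] is 1 if sample i belongs to a lost packet, else 0.
    Assumption: losses are i.i.d. Bernoulli per packet. This is a
    simplification, since real VoIP losses are often bursty.
    """
    rng = random.Random(seed)  # seeded so the loss pattern is reproducible
    lossy = list(signal)
    mask = [0] * len(signal)
    for start in range(0, len(signal), packet_len):
        if rng.random() < loss_prob:  # drop this whole packet
            end = min(start + packet_len, len(signal))  # last packet may be short
            for i in range(start, end):
                lossy[i] = 0.0
                mask[i] = 1
    return lossy, mask


def masked_l1(
    pred: Sequence[float],
    target: Sequence[float],
    mask: Sequence[int],
) -> float:
    """Hypothetical loss: mean absolute error over lost samples only.

    Returns 0.0 when no samples were lost, so the loss is always defined.
    """
    n = sum(mask)
    if n == 0:
        return 0.0
    return sum(abs(p - t) for p, t, m in zip(pred, target, mask) if m) / n
```

For example, with 16 kHz audio and 20 ms packets (illustrative values, not taken from the paper), `packet_len` would be 16000 × 0.02 = 320 samples. A model's output on the lossy input would then be scored with `masked_l1(output, clean, mask)`. Restricting the loss to the mask focuses training on the concealed regions instead of on samples that arrived intact.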