pMCT: Patched Multi-Condition Training for Robust Speech Recognition

Author: Parada, Pablo Peso, Dobrowolska, Agnieszka, Saravanan, Karthikeyan, Ozay, Mete
Publication Year: 2022
Subject:
Document Type: Working Paper
Description: We propose a novel Patched Multi-Condition Training (pMCT) method for robust Automatic Speech Recognition (ASR). pMCT employs Multi-condition Audio Modification and Patching (MAMP), which mixes patches of the same utterance extracted from clean and distorted speech. Training on these patch-modified signals improves the robustness of models in noisy and reverberant scenarios. The proposed pMCT is evaluated on the LibriSpeech dataset, showing improvements over vanilla Multi-Condition Training (MCT). For analyses of robust ASR, pMCT is also evaluated on the VOiCES dataset, a noisy and reverberant dataset created from LibriSpeech utterances. In these analyses, pMCT achieves a 23.1% relative WER reduction compared to MCT.
Comment: Accepted at Interspeech 2022
Database: arXiv
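
The description sketches the core patching idea: replacing time-aligned segments of a clean utterance with segments from a distorted (noisy, reverberant) copy of the same utterance. Below is a minimal, self-contained Python sketch of such clean/distorted patch mixing under stated assumptions; the function name mamp_mix, the patch length, and the swap probability are illustrative placeholders and are not taken from the paper.

```python
import numpy as np


def mamp_mix(clean: np.ndarray, distorted: np.ndarray,
             patch_len: int = 1600, swap_prob: float = 0.5,
             rng: np.random.Generator = None) -> np.ndarray:
    """Mix fixed-length patches of a clean waveform with patches from its
    time-aligned distorted counterpart (hypothetical MAMP-style augmentation).

    clean, distorted: 1-D waveforms of the same utterance, same length.
    patch_len: number of samples per patch (illustrative value).
    swap_prob: probability of taking a given patch from the distorted signal.
    """
    if rng is None:
        rng = np.random.default_rng()
    assert clean.shape == distorted.shape, "signals must be time-aligned"

    out = clean.copy()
    for start in range(0, len(clean), patch_len):
        end = min(start + patch_len, len(clean))
        # With probability swap_prob, use the distorted version of this patch.
        if rng.random() < swap_prob:
            out[start:end] = distorted[start:end]
    return out


# Usage example with synthetic signals (white noise standing in for speech):
if __name__ == "__main__":
    rng = np.random.default_rng(0)
    clean = rng.standard_normal(16000)          # 1 s at 16 kHz
    distorted = clean + 0.3 * rng.standard_normal(16000)  # additive noise
    mixed = mamp_mix(clean, distorted, patch_len=1600, swap_prob=0.5, rng=rng)
    print(mixed.shape)
```

In this sketch, mixing is done on raw waveforms so the augmented signal remains a valid utterance for any downstream ASR front end; actual patch sizes, selection strategy, and distortion pipeline in pMCT may differ.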