A Novel LSTM-Based Speech Preprocessor for Speaker Diarization in Realistic Mismatch Conditions

Author: Yu Tsao, Neville Ryant, Tian Gao, Lei Sun, Chin-Hui Lee, Yu-Ding Lu, Jun Du
Year of publication: 2018
Subject:
Source: ICASSP
Description: In this study, we investigate the effects of deep learning-based speech enhancement as a preprocessor for speaker diarization in challenging realistic environments involving background noise, reverberation, and overlapping speech. To improve generalization, we propose an advanced long short-term memory (LSTM) architecture for preprocessing, with a novel design of the hidden layers via densely connected progressive learning and of the output layer via multiple-target learning. We build the deep model using synthesized training data pairs generated from WSJ0 read-style speech and more than 100 noise types. Surprisingly, the proposed preprocessor generalizes well to speaker diarization on realistic noisy speech in highly mismatched conditions, in terms of speaking style, interference, and the interaction between them. Tested on three challenging tasks, namely AMI, ADOS, and SeedLings, a state-of-the-art diarization system with the novel LSTM-based speech preprocessor yields consistent and significant reductions in diarization error rate (DER) over systems using unprocessed noisy speech and traditional enhancement methods.
Database: OpenAIRE