Speaker Selective Beamformer with Keyword Mask Estimation
Authors: | Dung Tran, Motoi Omachi, Yusuke Kida, Yuya Fujita, Toru Taniguchi |
---|---|
Year of publication: | 2018 |
Subject: |
FOS: Computer and information sciences; FOS: Electrical engineering, electronic engineering, information engineering
Beamforming; Speech recognition; Speech processing; Filter (signal processing); Signal; Estimator; Novelty; Focus (optics); Computer science; Computer Science - Machine Learning; Machine Learning (cs.LG); Computer Science - Sound; Sound (cs.SD); Electrical Engineering and Systems Science - Audio and Speech Processing; Audio and Speech Processing (eess.AS); 02 engineering and technology; 0202 electrical engineering, electronic engineering, information engineering; 020206 networking & telecommunications; 03 medical and health sciences; 0305 other medical science; 030507 speech-language pathology & audiology |
Source: | SLT |
DOI: | 10.1109/slt.2018.8639651 |
Description: | This paper addresses automatic speech recognition (ASR) of a target speaker in the presence of background speech. The novelty of our approach is that we focus on the wake-up keyword that is usually used to activate ASR systems such as smart speakers. The proposed method first uses a DNN-based mask estimator to separate the mixture signal into the keyword signal uttered by the target speaker and the remaining background speech. The separated signals are then used to calculate a beamforming filter that enhances the subsequent utterances from the target speaker. Experimental evaluations show that the trained DNN-based mask estimator can selectively separate the keyword and the background speech from the mixture signal. The effectiveness of the proposed method is also verified with Japanese ASR experiments, which confirm that the proposed method significantly improves character error rates on both simulated and real recorded test sets. Comment: Accepted by SLT2018 |
Database: | OpenAIRE |
External link: |
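
The two-step pipeline in the description (estimate keyword and background masks, then derive a beamforming filter from the separated signals) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the record does not say which beamformer the paper uses, so an MVDR filter in the reference-channel formulation is assumed here, and the masks are assumed to come from an already trained estimator that is not shown. All function names (`spatial_covariance`, `mvdr_weights`, `apply_beamformer`) are hypothetical.

```python
import numpy as np


def spatial_covariance(stft, mask):
    """Mask-weighted spatial covariance matrix for each frequency bin.

    stft: complex array, shape (channels, frames, freqs)
    mask: real array in [0, 1], shape (frames, freqs)
    returns: complex array, shape (freqs, channels, channels)
    """
    weighted = stft * mask[None, :, :]
    # Sum over frames of mask * x x^H, independently per frequency.
    cov = np.einsum("ctf,dtf->fcd", weighted, stft.conj())
    norm = np.maximum(mask.sum(axis=0), 1e-10)
    return cov / norm[:, None, None]


def mvdr_weights(cov_target, cov_noise, ref_channel=0):
    """MVDR filter (assumed beamformer type), reference-channel form.

    w = (cov_noise^-1 cov_target) u / trace(cov_noise^-1 cov_target)
    returns: complex array, shape (freqs, channels)
    """
    n_ch = cov_noise.shape[-1]
    # Small diagonal loading keeps the noise covariance invertible.
    load = 1e-6 * np.trace(cov_noise, axis1=-2, axis2=-1).real / n_ch
    cov_noise = cov_noise + load[:, None, None] * np.eye(n_ch)
    numerator = np.linalg.solve(cov_noise, cov_target)
    trace = np.trace(numerator, axis1=-2, axis2=-1)
    return numerator[:, :, ref_channel] / (trace[:, None] + 1e-10)


def apply_beamformer(weights, stft):
    """Apply y[t, f] = w[f]^H x[t, f] to a multichannel STFT."""
    return np.einsum("fc,ctf->tf", weights.conj(), stft)
```

In this sketch, the keyword mask and the background mask estimated on the wake-up keyword segment would each be passed to `spatial_covariance`, giving `cov_target` and `cov_noise`. The weights from `mvdr_weights` would then be applied with `apply_beamformer` to the STFT of the utterance that follows the keyword.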