Improvement of Noise-Robust Single-Channel Voice Activity Detection with Spatial Pre-processing

Author: Væhrens, Max; Fuglsig, Andreas Jonas; Jacobsen, Anders Post; Rasmussen, Nicolai Almskou; Nissen, Victor Mølbach; Hejslet, Joachim Roland; Tan, Zheng-Hua
Year of publication: 2021
Subject:
Document type: Working Paper
Description: Voice activity detection (VAD) remains a challenge in noisy environments. With access to multiple microphones, prior studies have attempted to improve the noise robustness of VAD by creating multi-channel VAD (MVAD) methods. However, MVAD is relatively new compared to single-channel VAD (SVAD), which has been thoroughly developed in the past. It might therefore be advantageous to improve SVAD methods with pre-processing to obtain superior VAD, which is under-explored. This paper improves SVAD through two pre-processing methods, a beamformer and a spatial target speaker detector. The spatial detector sets signal frames to zero when no potential speaker is present within a target direction. The detector may be implemented as a filter, meaning the input signal for the SVAD is filtered according to the detector's output; or it may be implemented as a spatial VAD to be combined with the SVAD output. The evaluation is made on a noisy reverberant speech database, with clean speech from the Aurora 2 database and with white and babble noise. The results show that SVAD algorithms are significantly improved by the presented pre-processing methods, especially the spatial detector, across all signal-to-noise ratios. The SVAD algorithms with pre-processing significantly outperform a baseline MVAD in challenging noise conditions.
Comment: Submitted to Interspeech 2021
Database: arXiv
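
Illustrative note: the description above mentions two ways of using the spatial target speaker detector, either as a filter in front of the SVAD or as a spatial VAD combined with the SVAD output. The following is a minimal Python sketch of that distinction, not the authors' implementation. All function names are hypothetical, the energy-threshold SVAD is a toy stand-in for a real single-channel VAD, the per-frame detector output `target_present` is assumed to be given, and the logical AND used to combine decisions is an assumption, since the record does not state the combination rule.

```python
import numpy as np


def frame_signal(x, frame_len):
    """Split a 1-D signal into non-overlapping frames (trailing samples dropped)."""
    n_frames = len(x) // frame_len
    return x[: n_frames * frame_len].reshape(n_frames, frame_len)


def energy_svad(frames, threshold):
    """Toy stand-in for an SVAD: a frame is speech if its mean energy exceeds threshold."""
    return np.mean(frames ** 2, axis=1) > threshold


def detector_as_filter(frames, target_present):
    """Filter mode: zero every frame in which no target-direction speaker was detected."""
    return frames * target_present[:, None]


def detector_as_spatial_vad(svad_decisions, target_present):
    """Spatial-VAD mode: combine per-frame decisions (logical AND is an assumption)."""
    return np.logical_and(svad_decisions, target_present)


# Four frames: silence, target speech, an interfering talker, silence.
frame_len = 4
x = np.concatenate([np.zeros(4), np.ones(4), np.ones(4), np.zeros(4)])
frames = frame_signal(x, frame_len)

# Assumed detector output: only the second frame has a speaker in the target direction.
target_present = np.array([False, True, False, False])

plain = energy_svad(frames, threshold=0.5)
filtered = energy_svad(detector_as_filter(frames, target_present), threshold=0.5)
combined = detector_as_spatial_vad(plain, target_present)
```

In this toy case the unassisted SVAD flags both the target and the interfering frame, while both detector modes keep only the target frame. With a real SVAD the two modes need not agree, because zeroed frames can change the statistics the SVAD relies on.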