Description: |
Voice Activity Detection (VAD) remains a challenging task because its performance degrades under adverse noise and reverberation conditions. The problem becomes even more difficult when the microphones used to detect speech are located far from the speaker. In this paper, an unsupervised VAD scheme is presented, based on the Empirical Mode Decomposition (EMD) analysis framework and a multiple-input likelihood ratio test (LRT). The highly efficient EMD method relies on the local time-scale characteristics of the data to analyse and decompose non-stationary signals into a set of so-called intrinsic mode functions (IMFs). These functions are fed to the multiple-input LRT scheme, which decides on speech presence or absence. To minimize misdetections and enhance the performance of the hypothesis test, a computationally efficient forgetting scheme and an adaptive threshold are also employed. Simulations conducted in several artificial environments indicate that the proposed scheme can be expected to perform significantly better than similar VAD systems. (4 pages) |
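The decision stage outlined in the abstract (a multiple-input LRT with a forgetting scheme and an adaptive threshold) might look roughly like the sketch below. It is an illustration under stated assumptions, not the paper's algorithm: the inputs are per-frame energies of each channel (standing in for IMFs, whose extraction by EMD is not shown), the noise variances are assumed known, each channel follows a zero-mean Gaussian model with a maximum-likelihood SNR estimate, and the names `frame_log_lr`, `vad`, `forget`, `margin` and `adapt` are hypothetical.

```python
import math


def frame_log_lr(energies, noise_vars):
    """Mean log-likelihood ratio over all input channels for one frame.

    Assumes a zero-mean Gaussian model per channel. With the maximum-likelihood
    a priori SNR estimate (gamma - 1), the per-channel log-LR reduces to
    gamma - 1 - log(gamma) when gamma > 1, and to 0 otherwise.
    """
    total = 0.0
    for energy, noise_var in zip(energies, noise_vars):
        gamma = energy / noise_var  # a posteriori SNR
        if gamma > 1.0:
            total += gamma - 1.0 - math.log(gamma)
    return total / len(energies)


def vad(frames, noise_vars, forget=0.7, margin=0.5, adapt=0.9):
    """Return one speech/non-speech decision (bool) per frame.

    forget : forgetting factor for recursive smoothing of the log-LR (assumed value)
    margin : offset added to the tracked non-speech level (assumed value)
    adapt  : smoothing factor for the non-speech level tracker (assumed value)
    """
    decisions = []
    smoothed = 0.0  # recursively smoothed test statistic
    floor = 0.0     # running level of the statistic during non-speech
    for energies in frames:
        llr = frame_log_lr(energies, noise_vars)
        smoothed = forget * smoothed + (1.0 - forget) * llr
        threshold = floor + margin  # adaptive threshold
        speech = smoothed > threshold
        if not speech:
            # Update the non-speech level only on frames judged as noise.
            floor = adapt * floor + (1.0 - adapt) * smoothed
        decisions.append(speech)
    return decisions


if __name__ == "__main__":
    noise_vars = [1.0, 1.0, 1.0]
    # Three noise-only frames, three louder frames, then noise again.
    frames = [[1.0, 1.0, 1.0]] * 3 + [[10.0, 10.0, 10.0]] * 3 + [[1.0, 1.0, 1.0]] * 10
    print(vad(frames, noise_vars))
```

Because the statistic is smoothed recursively, the decision stays on "speech" for a few frames after the energy drops. This acts as a simple hangover, and the length of that tail depends on the assumed `forget` and `margin` values.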