Noise Robust Voice Activity Detection Based on Multi-Layer Feed-Forward Neural Network
Author: | Özkan Arslan, Erkan Zeki Engin |
---|---|
Contributors: | Ege Üniversitesi |
Year of publication: | 2019 |
Subject: |
Speech communication; Computer science; Errors; Zero crossing rate; Speech recognition; Multilayer neural networks; Engineering; Voice activity detection; Time and spectral features; Multi-layer feed-forward neural network; Noise environments; Constant false alarm rate; Image resolution; Speech; Spectral flatness; Entropy (information theory); Electrical and Electronic Engineering; Multi layer; Signal to noise ratio; Feedforward neural networks; Artificial neural network; Centroid; Multilayer feedforward neural networks; Extensive simulations; Computer Science::Sound; Feature extraction; Feedforward neural network; Short-time energy; Overall accuracies; Spectral feature |
Source: | Electrica, Volume: 19, Issue: 2, pp. 91-100 |
ISSN: | 2619-9831 |
DOI: | 10.26650/electrica.2019.18042 |
Description: | This paper proposes a voice activity detection (VAD) method for noisy conditions that feeds time-domain and spectral-domain features to a multi-layer feed-forward neural network (MLF-NN). The time-domain features are short-time energy and zero-crossing rate; the spectral features are the entropy, centroid, roll-off, and flux of the speech signal. The MLF-NN was trained on clean speech and tested on noisy speech. The method was evaluated with six noise types (white, car, babble, airport, street, and train) at four signal-to-noise ratio (SNR) levels. It was tested on the core TIMIT database, and its performance was compared with the SOHN, G.729B, and Long-Term Spectral Flatness Measure (LSFM) VAD methods in terms of correct speech rate, false alarm rate, and overall accuracy rate. Extensive simulation results show that the proposed method achieves the best average correct speech rate, false alarm rate, and overall accuracy rate in most low- and high-SNR conditions across the noise environments. |
Database: | OpenAIRE |
External link: |
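
The description names six per-frame features but this record does not give their formulas. The sketch below uses common textbook definitions, which may differ from the ones in the paper. All function names, the 85% roll-off fraction, and the normalisation used for flux are assumptions made for illustration. A naive DFT keeps the example free of dependencies; a real system would likely use an FFT library.

```python
import cmath
import math


def short_time_energy(frame):
    """Mean squared amplitude of one frame."""
    return sum(x * x for x in frame) / len(frame)


def zero_crossing_rate(frame):
    """Fraction of adjacent sample pairs whose signs differ."""
    crossings = sum(1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0))
    return crossings / (len(frame) - 1)


def magnitude_spectrum(frame):
    """Magnitudes of DFT bins 0..N/2, computed with a naive O(N^2) DFT."""
    n = len(frame)
    return [
        abs(sum(x * cmath.exp(-2j * math.pi * k * t / n) for t, x in enumerate(frame)))
        for k in range(n // 2 + 1)
    ]


def spectral_entropy(mag):
    """Shannon entropy (bits) of the normalised power spectrum."""
    power = [m * m for m in mag]
    total = sum(power)
    if total == 0:
        return 0.0
    return -sum((p / total) * math.log2(p / total) for p in power if p > 0)


def spectral_centroid(mag, sample_rate, n):
    """Magnitude-weighted mean frequency in Hz; n is the frame length."""
    total = sum(mag)
    if total == 0:
        return 0.0
    return sum(k * sample_rate / n * m for k, m in enumerate(mag)) / total


def spectral_rolloff(mag, sample_rate, n, fraction=0.85):
    """Lowest bin frequency (Hz) at which the cumulative magnitude
    reaches `fraction` of the total (0.85 is an assumed default)."""
    total = sum(mag)
    running = 0.0
    for k, m in enumerate(mag):
        running += m
        if running >= fraction * total:
            return k * sample_rate / n
    return (len(mag) - 1) * sample_rate / n


def spectral_flux(mag, prev_mag):
    """Sum of squared differences between two sum-normalised spectra."""
    def normalise(v):
        s = sum(v)
        return [x / s for x in v] if s > 0 else list(v)

    a, b = normalise(mag), normalise(prev_mag)
    return sum((x - y) ** 2 for x, y in zip(a, b))
```

Each function takes one frame (a list of samples) or its magnitude spectrum. Concatenating the six values per frame gives a feature vector of the kind the description says is passed to the MLF-NN; the frame length, overlap, and windowing are not stated in this record and are left to the caller.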