Noise Robust Voice Activity Detection Based on Multi-Layer Feed-Forward Neural Network

Author: Özkan Arslan, Erkan Zeki Engin
Contributors: Ege Üniversitesi
Publication year: 2019
Subject:
Speech communication
Computer science
Errors
Zero crossing rate
Speech recognition
Multilayer neural networks
Engineering
Voice activity detection
time and spectral features
multi-layer feed-forward neural network
Electrical and Electronics
Noise environments
Constant false alarm rate
Engineering
Image resolution
Speech
Spectral flatness
Entropy (information theory)
Electrical and Electronic Engineering
Multi layer
Signal to noise ratio
Feedforward neural networks
Artificial neural network
Centroid
Time and spectral features
Multilayer feedforward neural networks
Extensive simulations
Computer Science::Sound
Feature extraction
Feedforward neural network
multi layer feed forward neural network
Short-time energy
Multi-layer feed-forward neural network
Overall accuracies
Spectral feature
Source: Volume: 19, Issue: 2, pp. 91-100
Electrica
ISSN: 2619-9831
DOI: 10.26650/electrica.2019.18042
Description:
This paper proposes a voice activity detection (VAD) method for noisy conditions that feeds time-domain and spectral-domain features to a multi-layer feed-forward neural network (MLF-NN). Two time-domain features (short-time energy and zero-crossing rate) and four spectral features (entropy, centroid, roll-off, and flux) were extracted from the speech signals. The MLF-NN was trained on clean speech and tested on noisy speech. The method was evaluated for six noise types (white, car, babble, airport, street, and train) at four signal-to-noise ratio (SNR) levels. It was tested on the core TIMIT database, and its performance was compared with the Sohn, G.729B, and Long-Term Spectral Flatness Measure (LSFM) VAD methods in terms of correct speech rate, false alarm rate, and overall accuracy rate. Extensive simulation results show that the proposed method achieves the best average correct speech rate, false alarm rate, and overall accuracy rate in most low- and high-SNR conditions across the different noise environments.
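The six features named in the abstract can be illustrated with a minimal per-frame sketch. This is not the authors' implementation: the function name `frame_features`, the 16 kHz sample rate, the 25 ms Hamming-windowed frames with a 10 ms hop, and the 85% roll-off threshold are all assumptions chosen for illustration, and the record does not state the exact definitions used in the paper.

```python
import numpy as np


def frame_features(signal, sample_rate=16000, frame_len=400, hop=160,
                   rolloff_pct=0.85):
    """Return an (n_frames, 6) array of per-frame features.

    Columns: short-time energy, zero-crossing rate, spectral entropy,
    spectral centroid (Hz), spectral roll-off (Hz), spectral flux.
    All parameter defaults are illustrative assumptions.
    """
    signal = np.asarray(signal, dtype=float)
    if len(signal) < frame_len:
        raise ValueError("signal is shorter than one frame")

    n_frames = 1 + (len(signal) - frame_len) // hop
    window = np.hamming(frame_len)
    freqs = np.fft.rfftfreq(frame_len, d=1.0 / sample_rate)
    eps = 1e-12  # guards against division by zero and log(0)

    feats = np.zeros((n_frames, 6))
    prev_p = None  # normalized spectrum of the previous frame
    for i in range(n_frames):
        frame = signal[i * hop:i * hop + frame_len]

        # Time-domain features.
        energy = np.sum(frame ** 2) / frame_len
        signs = np.sign(frame)
        signs[signs == 0] = 1  # treat exact zeros as positive
        zcr = np.mean(signs[1:] != signs[:-1])

        # Power spectrum, normalized to sum to one.
        power = np.abs(np.fft.rfft(frame * window)) ** 2
        p = power / (power.sum() + eps)

        # Spectral features.
        entropy = -np.sum(p * np.log2(p + eps))
        centroid = np.sum(freqs * p)
        cumulative = np.cumsum(power)
        idx = np.searchsorted(cumulative, rolloff_pct * cumulative[-1])
        rolloff = freqs[min(idx, len(freqs) - 1)]
        flux = 0.0 if prev_p is None else np.sum((p - prev_p) ** 2)
        prev_p = p

        feats[i] = [energy, zcr, entropy, centroid, rolloff, flux]
    return feats
```

Each row of the returned array could serve as the input vector for a frame-level speech/non-speech classifier such as a feed-forward network; the network architecture and training procedure are not given in this record and are left out here.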
Database: OpenAIRE