A comparative analysis of pooling strategies for convolutional neural network based Hindi ASR

Authors: Vishal Passricha, Rajesh Kumar Aggarwal
Year of publication: 2019
Source: Journal of Ambient Intelligence and Humanized Computing, 11:675-691
ISSN: 1868-5145, 1868-5137
DOI: 10.1007/s12652-019-01325-y
Description: State-of-the-art speech recognition is witnessing its golden era as the convolutional neural network (CNN) becomes the leader in this domain. CNN-based acoustic models have shown significant improvements in speech recognition tasks. These improvements are due to the special components of CNNs, i.e., local filters, weight sharing, and pooling. However, a lack of core understanding renders this powerful model a black-box machine. Although CNNs perform well in speech recognition, further investigation will help achieve better recognition rates. Pooling is a very important component of CNNs: it reduces the dimensionality of the feature map and offers a compact feature representation. Various pooling methods, such as max pooling, average pooling, stochastic pooling, mixed pooling, $L_p$ pooling, multi-scale orderless pooling, and spectral pooling, have their own advantages and disadvantages. In this paper, we explore state-of-the-art pooling methods for speech recognition tasks in depth and investigate which pooling method performs well under which conditions. This work evaluates different pooling methods for different architectures on a Hindi speech dataset. The experimental results show that max pooling performs well on clean speech and stochastic pooling works well in noisy environments.
Database: OpenAIRE
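The abstract compares several pooling operators without defining them. The following is a minimal pure-Python sketch, not the authors' implementation, of five of them applied to one 1-D row of a CNN feature map (for example, a single time frame across frequency bands). The window size, stride, the mixing weight `lam`, the exponent `p`, and all function names are illustrative assumptions. The stochastic-pooling functions assume non-negative (post-ReLU) activations and follow the commonly used formulation: sample an activation during training, take a probability-weighted average at test time. Multi-scale orderless and spectral pooling are not shown.

```python
import random
from typing import Callable, List, Sequence


def windows(row: Sequence[float], size: int, stride: int) -> List[List[float]]:
    """Split a 1-D feature-map row into pooling windows (hypothetical helper)."""
    return [list(row[i:i + size]) for i in range(0, len(row) - size + 1, stride)]


def max_pool(w: Sequence[float]) -> float:
    """Keep the strongest activation in the window."""
    return max(w)


def avg_pool(w: Sequence[float]) -> float:
    """Average all activations in the window."""
    return sum(w) / len(w)


def lp_pool(w: Sequence[float], p: float = 2.0) -> float:
    """L_p pooling: (mean |a|^p)^(1/p); p = 1 averages magnitudes, large p approaches max."""
    return (sum(abs(a) ** p for a in w) / len(w)) ** (1.0 / p)


def mixed_pool(w: Sequence[float], lam: float) -> float:
    """Mixed pooling: lam * max + (1 - lam) * average.

    Assumption: lam is a fixed value in [0, 1] here; some formulations
    instead draw it at random during training.
    """
    return lam * max_pool(w) + (1.0 - lam) * avg_pool(w)


def stochastic_pool_train(w: Sequence[float], rng: random.Random) -> float:
    """Training-time stochastic pooling: pick a_i with probability a_i / sum(a)."""
    if any(a < 0 for a in w):
        raise ValueError("stochastic pooling expects non-negative activations")
    if sum(w) == 0:
        return 0.0  # all-zero window: nothing to sample from
    return rng.choices(list(w), weights=list(w), k=1)[0]


def stochastic_pool_test(w: Sequence[float]) -> float:
    """Test-time stochastic pooling: probability-weighted average sum(p_i * a_i)."""
    total = sum(w)
    if total == 0:
        return 0.0
    return sum(a * a for a in w) / total


def pool(row: Sequence[float], size: int, stride: int,
         fn: Callable[[Sequence[float]], float]) -> List[float]:
    """Apply one pooling operator to every window of a 1-D row."""
    return [fn(w) for w in windows(row, size, stride)]


if __name__ == "__main__":
    # Hypothetical ReLU outputs for one time frame across six frequency bands.
    row = [0.0, 3.0, 1.0, 2.0, 4.0, 0.0]
    rng = random.Random(0)
    print("max       ", pool(row, 2, 2, max_pool))               # [3.0, 2.0, 4.0]
    print("average   ", pool(row, 2, 2, avg_pool))               # [1.5, 1.5, 2.0]
    print("L2        ", pool(row, 2, 2, lp_pool))
    print("mixed 0.5 ", pool(row, 2, 2, lambda w: mixed_pool(w, 0.5)))  # [2.25, 1.75, 3.0]
    print("stoch/train", pool(row, 2, 2, lambda w: stochastic_pool_train(w, rng)))
    print("stoch/test ", pool(row, 2, 2, stochastic_pool_test))
```

Of these operators, only training-time stochastic pooling produces a random output, which makes it act as a regularizer. This sketch does not establish whether that property explains the noise robustness reported in the abstract.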