A comparative analysis of pooling strategies for convolutional neural network based Hindi ASR

Authors:	Vishal Passricha, Rajesh Kumar Aggarwal
Year of publication:	2019
Subject:	Hindi; General Computer Science; Computer science; Speech recognition; Pooling; Computational intelligence; 02 engineering and technology; Convolutional neural network; language.human_language; 030507 speech-language pathology & audiology; 03 medical and health sciences; Component (UML); 0202 electrical engineering, electronic engineering, information engineering; language; Feature (machine learning); 020201 artificial intelligence & image processing; 0305 other medical science; Representation (mathematics)
Source:	Journal of Ambient Intelligence and Humanized Computing. 11:675-691
ISSN:	1868-5145 1868-5137
DOI:	10.1007/s12652-019-01325-y
Description:	State-of-the-art speech recognition is witnessing its golden era as the convolutional neural network (CNN) becomes the leader in this domain. CNN-based acoustic models have shown significant improvement in speech recognition tasks. This improvement is due to the special components of the CNN, i.e., local filters, weight sharing, and pooling. However, a lack of core understanding renders this powerful model a black-box machine. Although CNNs perform well in speech recognition, further investigation will help in achieving a better recognition rate. Pooling is a very important component of the CNN that reduces the dimensionality of the feature map and offers a compact feature representation. Various pooling methods, such as max pooling, average pooling, stochastic pooling, mixed pooling, $L_p$ pooling, multi-scale orderless pooling, and spectral pooling, have their own advantages and disadvantages. In this paper, we explore state-of-the-art pooling for speech recognition tasks in depth. The paper also investigates which pooling method performs well under which conditions. This work explores different pooling methods for different architectures on a Hindi speech dataset. The experimental results show that max pooling performs well when tested on clean speech and stochastic pooling works well in noisy environments.
Database:	OpenAIRE
External link:	https://explore.openaire.eu/search/publication?articleId=doi_________::15483156f510f00ffbb2ea8e3deb75fe https://doi.org/10.1007/s12652-019-01325-y
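
The description names max, average, and stochastic pooling as ways to shrink a feature map. As a rough illustration only, and not the authors' implementation, the sketch below applies the three methods to non-overlapping windows of a one-dimensional feature map. The function name `pool_1d`, the window size, and the sample values are hypothetical; the stochastic variant assumes non-negative activations (for example, after a ReLU) and samples one activation per window with probability proportional to its value.

```python
import random


def pool_1d(values, size, mode="max", rng=None):
    """Pool non-overlapping windows of `size` along a 1-D feature map.

    Hypothetical helper for illustration; not taken from the paper.
    """
    rng = rng or random.Random(0)  # fixed seed so the sketch is repeatable
    pooled = []
    for start in range(0, len(values) - size + 1, size):
        window = values[start:start + size]
        if mode == "max":
            # Keep only the strongest activation in the window.
            pooled.append(max(window))
        elif mode == "average":
            # Replace the window by its mean activation.
            pooled.append(sum(window) / size)
        elif mode == "stochastic":
            # Sample one activation with probability proportional to its
            # value (assumes non-negative activations).
            if sum(window) == 0:
                pooled.append(0.0)
            else:
                pooled.append(rng.choices(window, weights=window, k=1)[0])
        else:
            raise ValueError(f"unknown mode: {mode}")
    return pooled


# Hypothetical feature map: 8 activations pooled with window size 2.
feature_map = [0.25, 0.75, 0.5, 0.5, 0.0, 0.25, 1.0, 0.5]
max_out = pool_1d(feature_map, 2, "max")
avg_out = pool_1d(feature_map, 2, "average")
sto_out = pool_1d(feature_map, 2, "stochastic")
print(max_out, avg_out, sto_out)
```

Each call halves the length of the feature map. Max and average pooling are deterministic, while stochastic pooling can return any activation with non-zero weight from each window.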