End-to-End Acoustic Modeling Using Convolutional Neural Networks

Authors: Vishal Passricha, Rajesh Kumar Aggarwal
Year of publication: 2019
Subject:
DOI: 10.1016/b978-0-12-818130-0.00002-7
Description: State-of-the-art automatic speech recognition (ASR) systems map speech to its corresponding text. Conventional ASR systems model the speech signal into phones in two steps: feature extraction and classifier training. These traditional systems have largely been replaced by deep neural network (DNN)-based systems. Today, end-to-end ASR models are gaining popularity because of their simplified model-building process and their ability to map speech directly to text without predefined alignments. These models rely on data-driven learning and compete with complex DNN-based ASR systems built on linguistic resources. There are three major types of end-to-end architectures for ASR: attention-based methods, connectionist temporal classification, and the convolutional neural network (CNN)-based direct raw speech model. This chapter discusses end-to-end acoustic modeling using CNNs in detail. The CNN establishes the relationship between the raw speech signal and phones in a data-driven manner; relevant features and the classifier are learned jointly from raw speech. The first convolutional layer automatically learns a feature representation, and this more discriminative intermediate representation is further processed by the remaining convolutional layers. Such a system performs better than traditional cepstral feature-based systems but uses a large number of parameters. The system is evaluated on the TIMIT corpus and is reported to outperform the MFCC feature-based GMM/HMM (Gaussian mixture model/hidden Markov model) baseline.
Database: OpenAIRE
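
As an informal illustration of the direct raw speech model summarized in the description, the following PyTorch sketch maps raw waveform segments straight to phone-class posteriors, so feature extraction and classification are learned jointly. The class name RawSpeechCNN, the layer sizes, kernel widths, and the 39-phone TIMIT-style label set are illustrative assumptions, not the chapter's exact architecture.

```python
# Minimal sketch (assumed hyperparameters, not the authors' exact model):
# a CNN that consumes raw speech samples, learns a feature representation in
# its first convolutional layer, and classifies phones with the remaining layers.
import torch
import torch.nn as nn

class RawSpeechCNN(nn.Module):
    def __init__(self, num_phones=39):
        super().__init__()
        # First convolution operates directly on the raw waveform and acts as
        # a learned feature extractor (in place of hand-crafted MFCCs).
        self.feature_layer = nn.Sequential(
            nn.Conv1d(1, 80, kernel_size=251, stride=10),  # wide temporal filters
            nn.ReLU(),
            nn.MaxPool1d(3),
        )
        # Remaining convolutions refine the intermediate representation.
        self.classifier_convs = nn.Sequential(
            nn.Conv1d(80, 60, kernel_size=5), nn.ReLU(), nn.MaxPool1d(3),
            nn.Conv1d(60, 60, kernel_size=5), nn.ReLU(),
            nn.AdaptiveAvgPool1d(1),            # collapse the time axis
        )
        self.output = nn.Linear(60, num_phones)  # logits over phone classes

    def forward(self, waveform):                # waveform: (batch, samples)
        x = waveform.unsqueeze(1)               # -> (batch, 1, samples)
        x = self.feature_layer(x)
        x = self.classifier_convs(x).squeeze(-1)
        return self.output(x)

# Usage example: classify a batch of 250 ms segments sampled at 16 kHz.
model = RawSpeechCNN()
segments = torch.randn(8, 4000)                 # 8 segments x 4000 samples
logits = model(segments)                        # shape: (8, 39)
```

In a full system, such frame- or segment-level phone posteriors would typically be combined with a sequence-level decoder; that part is omitted here for brevity.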