A Method of Speech Coding for Speech Recognition Using a Convolutional Neural Network

Author:	Mariusz Kubanek, Janusz Bobulski, Joanna Kulawik
Language:	English
Year of publication:	2019
Subject:	speech recognition; convolutional neural network; deep learning; Mathematics; QA1-939
Source:	Symmetry, Vol 11, Iss 9, p 1185 (2019)
Document type:	article
ISSN:	2073-8994 11091185
DOI:	10.3390/sym11091185
Description:	This work presents a new approach to speech recognition based on a specific coding of the time and frequency characteristics of speech. Convolutional neural networks are used because they are known to be highly robust to cross-spectral distortions and to differences in vocal tract length. Previous approaches used two convolution layers, one over time and one over frequency. The novel idea is to interweave three separate convolution layers: the traditional time convolution plus two different frequency convolutions, a mel-frequency cepstral coefficient (MFCC) convolution and a spectrum convolution. This arrangement takes into account more of the detail contained in the tested signal. The approach creates patterns for sounds in the form of RGB (Red, Green, Blue) images. Experiments were carried out on isolated words and on continuous speech for the proposed neural network structure. A method for dividing continuous speech into syllables is also proposed. The method can be used for symmetrical stereo sound.
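The idea of coding sounds as RGB images can be illustrated with a minimal sketch. The record does not say how the time, MFCC, and spectrum representations are mapped to colour channels, so the channel order, the 64 x 64 image size, the nearest-neighbour resampling, and the names `to_channel` and `encode_rgb` are all assumptions made for illustration, not the authors' method. Random arrays stand in for real features.

```python
import numpy as np

def to_channel(feature_map, height, width):
    """Resample a 2-D feature map to (height, width) and scale it to 0..255."""
    fm = np.asarray(feature_map, dtype=np.float64)
    # Nearest-neighbour resampling via index lookup (keeps the sketch NumPy-only).
    rows = np.linspace(0, fm.shape[0] - 1, height).round().astype(int)
    cols = np.linspace(0, fm.shape[1] - 1, width).round().astype(int)
    resized = fm[np.ix_(rows, cols)]
    # Min-max normalisation; a constant map becomes all zeros.
    span = resized.max() - resized.min()
    if span == 0:
        return np.zeros((height, width), dtype=np.uint8)
    return np.round(255 * (resized - resized.min()) / span).astype(np.uint8)

def encode_rgb(time_map, mfcc_map, spectrum_map, height=64, width=64):
    """Stack three feature maps into one H x W x 3 uint8 image.

    Assumed channel order: R = time, G = MFCC, B = spectrum.
    """
    maps = (time_map, mfcc_map, spectrum_map)
    channels = [to_channel(m, height, width) for m in maps]
    return np.stack(channels, axis=-1)

# Hypothetical inputs: random arrays in place of real speech features.
rng = np.random.default_rng(0)
image = encode_rgb(
    rng.normal(size=(40, 100)),   # stand-in for a time-domain representation
    rng.normal(size=(13, 100)),   # stand-in for MFCC frames
    rng.normal(size=(129, 100)),  # stand-in for a magnitude spectrum
)
print(image.shape, image.dtype)
```

An image built this way could be passed to any image-oriented convolutional network; the network structure used in the article is not described in this record.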
Database:	Directory of Open Access Journals
External link:	https://doaj.org/article/55bd7fce74d1460983e3f27d2ba3f627