Robust front-end for audio, visual and audio–visual speech classification
Authors: | Juan Carlos Gómez, Lucas D. Terissi, Gonzalo D. Sad |
---|---|
Publication year: | 2018 |
Subject: |
Linguistics and Language; Language and Linguistics; Wavelet decomposition; Audio–visual speech recognition; Audio visual; Speech recognition; Speech classification; Random Forests; Front and back ends; Computer science; Human-Computer Interaction; Computer Vision and Pattern Recognition; Artificial intelligence & image processing; Software; Engineering and technology; Electrical and Electronic Engineering; Electrical engineering, electronic engineering, information engineering; Speech-language pathology & audiology; Medical and health sciences; Other medical science |
Source: | International Journal of Speech Technology. 21:293-307 |
ISSN: | 1572-8110 1381-2416 |
DOI: | 10.1007/s10772-018-9504-y |
Description: | This paper proposes a robust front-end for speech classification that can be used interchangeably with acoustic, visual, or audio–visual information. Wavelet multiresolution analysis is employed to represent the temporal input data associated with speech information. These wavelet-based features are then used as inputs to a Random Forest classifier, which performs the speech classification. The performance of the proposed scheme is evaluated in three scenarios: only acoustic information, only visual information (lip-reading), and fused audio–visual information. The evaluations are carried out over three audio–visual databases, two of them public and the third compiled by the authors. Experimental results show that the proposed system achieves good performance over the three databases and for each kind of input information considered. In addition, the proposed method performs better than other methods reported in the literature on the same two public databases. All the experiments were run with the same configuration parameters, so the results also indicate that the method performs satisfactorily without tuning either the wavelet decomposition parameters or the Random Forest classifier parameters for each particular database and input modality. Affiliation: Terissi, Lucas Daniel. Consejo Nacional de Investigaciones Científicas y Técnicas. Centro Científico Tecnológico Conicet - Rosario. Centro Internacional Franco Argentino de Ciencias de la Información y de Sistemas. Universidad Nacional de Rosario. Centro Internacional Franco Argentino de Ciencias de la Información y de Sistemas; Argentina. Affiliation: Sad, Gonzalo Daniel. Consejo Nacional de Investigaciones Científicas y Técnicas. Centro Científico Tecnológico Conicet - Rosario. Centro Internacional Franco Argentino de Ciencias de la Información y de Sistemas. Universidad Nacional de Rosario. Centro Internacional Franco Argentino de Ciencias de la Información y de Sistemas; Argentina. Affiliation: Gómez, Juan Carlos. Consejo Nacional de Investigaciones Científicas y Técnicas. Centro Científico Tecnológico Conicet - Rosario. Centro Internacional Franco Argentino de Ciencias de la Información y de Sistemas. Universidad Nacional de Rosario. Centro Internacional Franco Argentino de Ciencias de la Información y de Sistemas; Argentina |
Database: | OpenAIRE |
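
The abstract describes representing temporal speech data by wavelet multiresolution analysis before classification. As a rough illustration only, the sketch below computes such features for a single time series in plain Python. The Haar wavelet, the number of levels, the choice of features (coarsest approximation coefficients plus the energy of each detail band), and the example trajectory are all assumptions made here; the record does not state which wavelet or features the authors used.

```python
import math


def haar_step(signal):
    """One level of the Haar DWT: return (approximation, detail) coefficients."""
    signal = list(signal)
    if len(signal) % 2:
        signal.append(signal[-1])  # pad odd-length input by repeating the last sample
    s = math.sqrt(2.0)
    approx = [(signal[i] + signal[i + 1]) / s for i in range(0, len(signal), 2)]
    detail = [(signal[i] - signal[i + 1]) / s for i in range(0, len(signal), 2)]
    return approx, detail


def multiresolution_features(signal, levels=3):
    """Decompose a 1-D time series over several levels.

    Returns the coarsest approximation coefficients followed by the
    energy of each detail band (finest band first). This feature
    layout is an illustrative assumption, not the paper's definition.
    """
    approx = list(signal)
    detail_energies = []
    for _ in range(levels):
        if len(approx) < 2:
            break  # nothing left to decompose
        approx, detail = haar_step(approx)
        detail_energies.append(sum(d * d for d in detail))
    return approx + detail_energies


# Hypothetical trajectory of one acoustic or visual parameter over 8 frames.
trajectory = [1.0, 1.0, 2.0, 2.0, 4.0, 4.0, 2.0, 2.0]
features = multiresolution_features(trajectory, levels=2)
print(features)
```

In a full pipeline, one such fixed-length vector per utterance would be stacked into a matrix and passed to a Random Forest implementation, for example scikit-learn's `RandomForestClassifier`.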