An adaptive approach for lip-reading using image and depth data

Authors: Achraf Ben-Hamadou, Ahmed Rekik, Walid Mahdi
Year of publication: 2015
Subject:
Source: Multimedia Tools and Applications. 75:8609-8636
ISSN: 1573-7721
1380-7501
Description: Lip-reading (LR) systems play an important role in automatic speech recognition when acoustic information is corrupted or unavailable. This article proposes an adaptive LR system for speech segment recognition using image and depth data. In addition to 2D images, the proposed system handles depth data, which are highly informative about the 3D deformations of the lips during utterance and exhibit a certain robustness against variations in mouth skin color and texture. The proposed system is based on two main steps. In the first step, mouth thumbnails are extracted based on 3D face pose tracking. Then, appearance and motion descriptors are computed and combined into a final feature vector describing the uttered speech. The accuracy of the 3D face tracking module is evaluated on the BIWI Kinect Head Pose database. The obtained results show that our method is competitive with other state-of-the-art methods combining image and depth data (i.e., 2.26 mm mean position error and 3.86° mean orientation error). Additionally, the overall LR system is evaluated on three public LR datasets (i.e., MIRACL-VC1, OuluVS, and CUAVE). The obtained results demonstrate that depth data are complementary to 2D image data and reduce the speaker dependency problem in LR. The OuluVS and CUAVE datasets, which contain 2D images only, are used to evaluate the proposed system when depth data are unavailable and to compare it to recent state-of-the-art LR systems. The obtained results show very competitive recognition rates (up to 96 % for MIRACL-VC1, 93.2 % for OuluVS, and 90 % for CUAVE).
Database: OpenAIRE
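
The abstract describes a two-step pipeline: mouth thumbnails are cropped from each RGB-D frame using a tracked 3D face pose, then appearance and motion descriptors from the image and depth channels are concatenated into a per-segment feature vector. The Python sketch below only illustrates that general structure under simplifying assumptions: the per-frame mouth position is assumed to come from an external 3D pose tracker (not shown), plain intensity and frame-difference histograms stand in for the paper's actual appearance and motion descriptors, and all function names are hypothetical rather than taken from the authors' implementation.

```python
# Hypothetical sketch of the two-step RGB-D lip-reading feature pipeline
# (not the authors' implementation): crop mouth thumbnails, then compute and
# concatenate appearance and motion descriptors per frame.
import numpy as np


def crop_mouth(frame, mouth_center, size=32):
    """Crop a square thumbnail around the (x, y) mouth position given by a pose tracker."""
    x, y = mouth_center
    half = size // 2
    return frame[y - half:y + half, x - half:x + half]


def appearance_descriptor(thumbnail, bins=16):
    """Intensity histogram as a stand-in appearance descriptor (frames assumed in [0, 1])."""
    hist, _ = np.histogram(thumbnail, bins=bins, range=(0.0, 1.0), density=True)
    return hist


def motion_descriptor(prev_thumbnail, thumbnail, bins=16):
    """Histogram of absolute frame differences as a crude motion descriptor."""
    diff = np.abs(thumbnail.astype(np.float64) - prev_thumbnail.astype(np.float64))
    hist, _ = np.histogram(diff, bins=bins, range=(0.0, 1.0), density=True)
    return hist


def speech_segment_features(rgb_frames, depth_frames, mouth_centers, bins=16):
    """Per-frame feature matrix combining RGB and depth appearance plus motion cues.

    A temporal normalization/pooling step (not shown) would turn this into the
    fixed-length segment descriptor used for classification.
    """
    feats = []
    prev_rgb = prev_depth = None
    for rgb, depth, center in zip(rgb_frames, depth_frames, mouth_centers):
        rgb_thumb = crop_mouth(rgb, center)
        depth_thumb = crop_mouth(depth, center)
        if prev_rgb is None:
            # No motion information for the first frame; pad with zeros.
            motion_rgb = motion_depth = np.zeros(bins)
        else:
            motion_rgb = motion_descriptor(prev_rgb, rgb_thumb, bins)
            motion_depth = motion_descriptor(prev_depth, depth_thumb, bins)
        feats.append(np.concatenate([
            appearance_descriptor(rgb_thumb, bins),
            appearance_descriptor(depth_thumb, bins),
            motion_rgb,
            motion_depth,
        ]))
        prev_rgb, prev_depth = rgb_thumb, depth_thumb
    return np.vstack(feats)


if __name__ == "__main__":
    # Toy usage with random frames standing in for registered RGB and depth streams.
    rng = np.random.default_rng(0)
    rgb = [rng.random((120, 160)) for _ in range(5)]
    depth = [rng.random((120, 160)) for _ in range(5)]
    centers = [(80, 60)] * 5  # (x, y) mouth position per frame from the pose tracker
    print(speech_segment_features(rgb, depth, centers).shape)  # (5, 64)
```

In this sketch the depth channel is treated exactly like the image channel, which is one simple way to realize the complementarity between 2D and depth data that the abstract reports; the actual descriptors and fusion strategy used in the paper may differ.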