An adaptive approach for lip-reading using image and depth data

Authors: Achraf Ben-Hamadou, Ahmed Rekik, Walid Mahdi
Year of publication: 2015
Subject:
Source: Multimedia Tools and Applications. 75:8609-8636
ISSN: 1573-7721
1380-7501
Description: Lip-reading (LR) systems play an important role in automatic speech recognition when acoustic information is corrupted or unavailable. This article proposes an adaptive LR system for speech segment recognition using image and depth data. In addition to 2D images, the proposed system handles depth data, which are highly informative about the 3D deformations of the lips during utterance and exhibit a certain robustness against variations in mouth skin color and texture. The proposed system is based on two main steps. In the first step, mouth thumbnails are extracted based on 3D face pose tracking. Then, appearance and motion descriptors are computed and combined into a final feature vector describing the uttered speech. The accuracy of the 3D face tracking module is evaluated on the BIWI Kinect Head Pose database. The obtained results show that our method is competitive with other state-of-the-art methods combining image and depth data (i.e., 2.26 mm mean position error and 3.86° mean orientation error). Additionally, the overall LR system is evaluated on three public LR datasets (i.e., MIRACL-VC1, OuluVS, and CUAVE). The obtained results demonstrate that depth data are complementary to 2D image data and reduce the speaker dependency problem in LR. The OuluVS and CUAVE datasets, which contain 2D images only, are used to evaluate the proposed system when depth data are unavailable and to compare it to recent state-of-the-art LR systems. The obtained results show very competitive recognition rates (up to 96 % for MIRACL-VC1, 93.2 % for OuluVS, and 90 % for CUAVE).
Database: OpenAIRE
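
The abstract describes a two-step pipeline: mouth thumbnails are cropped from each RGB-D frame using a tracked 3D face pose, then appearance and motion descriptors from the image and depth channels are concatenated into a per-segment feature vector. The Python sketch below only illustrates that general structure under simplifying assumptions: the per-frame mouth position is assumed to come from an external 3D pose tracker (not shown), plain intensity and frame-difference histograms stand in for the paper's actual appearance and motion descriptors, and all function names are hypothetical rather than taken from the authors' implementation.

```python
# Hypothetical sketch of the two-step RGB-D lip-reading feature pipeline
# (not the authors' implementation): crop mouth thumbnails, then compute and
# concatenate appearance and motion descriptors per frame.
import numpy as np


def crop_mouth(frame, mouth_center, size=32):
    """Crop a square thumbnail around the (x, y) mouth position given by a pose tracker."""
    x, y = mouth_center
    half = size // 2
    return frame[y - half:y + half, x - half:x + half]


def appearance_descriptor(thumbnail, bins=16):
    """Intensity histogram as a stand-in appearance descriptor (frames assumed in [0, 1])."""
    hist, _ = np.histogram(thumbnail, bins=bins, range=(0.0, 1.0), density=True)
    return hist


def motion_descriptor(prev_thumbnail, thumbnail, bins=16):
    """Histogram of absolute frame differences as a crude motion descriptor."""
    diff = np.abs(thumbnail.astype(np.float64) - prev_thumbnail.astype(np.float64))
    hist, _ = np.histogram(diff, bins=bins, range=(0.0, 1.0), density=True)
    return hist


def speech_segment_features(rgb_frames, depth_frames, mouth_centers, bins=16):
    """Per-frame feature matrix combining RGB and depth appearance plus motion cues.

    A temporal normalization/pooling step (not shown) would turn this into the
    fixed-length segment descriptor used for classification.
    """
    feats = []
    prev_rgb = prev_depth = None
    for rgb, depth, center in zip(rgb_frames, depth_frames, mouth_centers):
        rgb_thumb = crop_mouth(rgb, center)
        depth_thumb = crop_mouth(depth, center)
        if prev_rgb is None:
            # No motion information for the first frame; pad with zeros.
            motion_rgb = motion_depth = np.zeros(bins)
        else:
            motion_rgb = motion_descriptor(prev_rgb, rgb_thumb, bins)
            motion_depth = motion_descriptor(prev_depth, depth_thumb, bins)
        feats.append(np.concatenate([
            appearance_descriptor(rgb_thumb, bins),
            appearance_descriptor(depth_thumb, bins),
            motion_rgb,
            motion_depth,
        ]))
        prev_rgb, prev_depth = rgb_thumb, depth_thumb
    return np.vstack(feats)


if __name__ == "__main__":
    # Toy usage with random frames standing in for registered RGB and depth streams.
    rng = np.random.default_rng(0)
    rgb = [rng.random((120, 160)) for _ in range(5)]
    depth = [rng.random((120, 160)) for _ in range(5)]
    centers = [(80, 60)] * 5  # (x, y) mouth position per frame from the pose tracker
    print(speech_segment_features(rgb, depth, centers).shape)  # (5, 64)
```

In this sketch the depth channel is treated exactly like the image channel, which is one simple way to realize the complementarity between 2D and depth data that the abstract reports; the actual descriptors and fusion strategy used in the paper may differ.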