Speaker-Independent Lipreading With Limited Data

Author: Xingxuan Zhang, Chen-Zhao Yang, Yun Zhu, Shilin Wang
Year of publication: 2020
Subject:
Source: ICIP
DOI: 10.1109/icip40778.2020.9190780
Description: Recent research has demonstrated that, given a huge annotated training dataset, some sophisticated automatic lipreading methods can even outperform professional human lip readers. However, when the training set is limited, i.e., contains only a small number of speakers, most existing lipreading approaches cannot provide accurate recognition results for unseen speakers due to inter-speaker variability. To improve lipreading performance in the speaker-independent scenario, a new deep neural network (DNN) is proposed in this paper. The proposed network is composed of two parts: the Transformer-based Visual Speech Recognition Network (TVSR-Net) and the Speaker Confusion Block (SC-Block). The TVSR-Net extracts lip features and recognizes the speech, while the SC-Block achieves speaker normalization by eliminating the influence of individual talking styles and habits. A Multi-Task Learning (MTL) scheme is designed for network optimization (a hedged sketch of this design follows the record below). Experimental results on the GRID dataset demonstrate the effectiveness of the proposed network for speaker-independent recognition with limited training data.
Database: OpenAIRE
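
The abstract describes the design only at a high level; as a reading aid, the following is a minimal PyTorch-style sketch of such a two-branch network. It assumes the speaker confusion objective is realized with a gradient-reversal layer (a common adversarial device; the paper does not confirm this), and every module name, layer size, class count, and loss weight below is an illustrative placeholder, not the authors' published implementation.

```python
# Hypothetical sketch of the two-part design in the abstract: a Transformer
# recognition branch (TVSR-Net) plus a speaker-confusion head (SC-Block)
# trained jointly. All architectural details here are assumptions.
import torch
import torch.nn as nn

class GradientReversal(torch.autograd.Function):
    """Identity in the forward pass; negated, scaled gradient in the
    backward pass. Assumed mechanism for the 'confusion' objective."""
    @staticmethod
    def forward(ctx, x, lam):
        ctx.lam = lam
        return x.view_as(x)

    @staticmethod
    def backward(ctx, grad_output):
        return -ctx.lam * grad_output, None

class TVSRNet(nn.Module):
    """Transformer-based visual speech recognition branch (sketch)."""
    def __init__(self, feat_dim=512, num_classes=51):  # 51: GRID's word vocabulary (assumed)
        super().__init__()
        # 3D conv front-end over grayscale lip-ROI clips shaped (B, 1, T, H, W)
        self.frontend = nn.Sequential(
            nn.Conv3d(1, 64, kernel_size=(5, 7, 7), stride=(1, 2, 2), padding=(2, 3, 3)),
            nn.BatchNorm3d(64), nn.ReLU(),
            nn.AdaptiveAvgPool3d((None, 1, 1)),  # pool spatial dims, keep time
        )
        self.proj = nn.Linear(64, feat_dim)
        layer = nn.TransformerEncoderLayer(d_model=feat_dim, nhead=8, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=4)
        self.classifier = nn.Linear(feat_dim, num_classes)

    def forward(self, clips):
        x = self.frontend(clips)                       # (B, 64, T, 1, 1)
        x = x.squeeze(-1).squeeze(-1).transpose(1, 2)  # (B, T, 64)
        feats = self.encoder(self.proj(x))             # (B, T, feat_dim)
        return self.classifier(feats), feats           # per-frame logits + features

class SCBlock(nn.Module):
    """Speaker-confusion head: predicts speaker identity from lip features;
    the reversed gradient pushes the encoder to discard speaker cues."""
    def __init__(self, feat_dim=512, num_speakers=33, lam=1.0):  # 33: GRID's usable speakers (assumed)
        super().__init__()
        self.lam = lam
        self.head = nn.Sequential(nn.Linear(feat_dim, 128), nn.ReLU(),
                                  nn.Linear(128, num_speakers))

    def forward(self, feats):
        rev = GradientReversal.apply(feats.mean(dim=1), self.lam)  # pool over time
        return self.head(rev)

def mtl_loss(speech_logits, speech_targets, spk_logits, spk_targets, alpha=0.1):
    """Multi-task objective: recognition loss + weighted confusion loss.
    The paper likely uses a sequence loss (e.g. CTC); a toy word-level
    cross-entropy stands in here."""
    ce = nn.functional.cross_entropy
    return ce(speech_logits.mean(dim=1), speech_targets) + alpha * ce(spk_logits, spk_targets)

# Toy forward/backward pass: batch of 2 clips, 75 frames, 64x128 lip ROIs.
clips = torch.randn(2, 1, 75, 64, 128)
net, sc = TVSRNet(), SCBlock()
speech_logits, feats = net(clips)
spk_logits = sc(feats)
loss = mtl_loss(speech_logits, torch.tensor([3, 17]), spk_logits, torch.tensor([0, 5]))
loss.backward()
```

The intuition behind this sketch: whatever the speaker classifier finds useful becomes costly for the shared encoder once the gradient is reversed, so the lip features drift toward speaker-invariant content, which is one plausible reading of the "speaker normalization" the abstract attributes to the SC-Block.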