Speaker-Independent Lipreading With Limited Data

Author: Xingxuan Zhang, Chen-Zhao Yang, Yun Zhu, Shilin Wang
Year of publication: 2020
Subject:
Source: ICIP
DOI: 10.1109/icip40778.2020.9190780
Description: Recent research has demonstrated that, given a huge annotated training dataset, some sophisticated automatic lipreading methods can even outperform professional human lip readers. However, when the training set is limited, i.e., contains only a small number of speakers, most existing lipreading approaches cannot provide accurate recognition results for unseen speakers due to inter-speaker variability. To improve lipreading performance in the speaker-independent scenario, a new deep neural network (DNN) is proposed in this paper. The proposed network is composed of two parts: the Transformer-based Visual Speech Recognition Network (TVSR-Net) and the Speaker Confusion Block (SC-Block). The TVSR-Net extracts lip features and recognizes the speech, while the SC-Block achieves speaker normalization by eliminating the influence of individual talking styles and habits. A Multi-Task Learning (MTL) scheme is designed for network optimization (a hedged sketch of this design follows the record below). Experimental results on the GRID dataset demonstrate the effectiveness of the proposed network for speaker-independent recognition with limited training data.
Database: OpenAIRE
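
The abstract describes the design only at a high level; as a reading aid, the following is a minimal PyTorch-style sketch of such a two-branch network. It assumes the speaker confusion objective is realized with a gradient-reversal layer (a common adversarial device; the paper does not confirm this), and every module name, layer size, class count, and loss weight below is an illustrative placeholder, not the authors' published implementation.

```python
# Hypothetical sketch of the two-part design in the abstract: a Transformer
# recognition branch (TVSR-Net) plus a speaker-confusion head (SC-Block)
# trained jointly. All architectural details here are assumptions.
import torch
import torch.nn as nn

class GradientReversal(torch.autograd.Function):
    """Identity in the forward pass; negated, scaled gradient in the
    backward pass. Assumed mechanism for the 'confusion' objective."""
    @staticmethod
    def forward(ctx, x, lam):
        ctx.lam = lam
        return x.view_as(x)

    @staticmethod
    def backward(ctx, grad_output):
        return -ctx.lam * grad_output, None

class TVSRNet(nn.Module):
    """Transformer-based visual speech recognition branch (sketch)."""
    def __init__(self, feat_dim=512, num_classes=51):  # 51: GRID's word vocabulary (assumed)
        super().__init__()
        # 3D conv front-end over grayscale lip-ROI clips shaped (B, 1, T, H, W)
        self.frontend = nn.Sequential(
            nn.Conv3d(1, 64, kernel_size=(5, 7, 7), stride=(1, 2, 2), padding=(2, 3, 3)),
            nn.BatchNorm3d(64), nn.ReLU(),
            nn.AdaptiveAvgPool3d((None, 1, 1)),  # pool spatial dims, keep time
        )
        self.proj = nn.Linear(64, feat_dim)
        layer = nn.TransformerEncoderLayer(d_model=feat_dim, nhead=8, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=4)
        self.classifier = nn.Linear(feat_dim, num_classes)

    def forward(self, clips):
        x = self.frontend(clips)                       # (B, 64, T, 1, 1)
        x = x.squeeze(-1).squeeze(-1).transpose(1, 2)  # (B, T, 64)
        feats = self.encoder(self.proj(x))             # (B, T, feat_dim)
        return self.classifier(feats), feats           # per-frame logits + features

class SCBlock(nn.Module):
    """Speaker-confusion head: predicts speaker identity from lip features;
    the reversed gradient pushes the encoder to discard speaker cues."""
    def __init__(self, feat_dim=512, num_speakers=33, lam=1.0):  # 33: GRID's usable speakers (assumed)
        super().__init__()
        self.lam = lam
        self.head = nn.Sequential(nn.Linear(feat_dim, 128), nn.ReLU(),
                                  nn.Linear(128, num_speakers))

    def forward(self, feats):
        rev = GradientReversal.apply(feats.mean(dim=1), self.lam)  # pool over time
        return self.head(rev)

def mtl_loss(speech_logits, speech_targets, spk_logits, spk_targets, alpha=0.1):
    """Multi-task objective: recognition loss + weighted confusion loss.
    The paper likely uses a sequence loss (e.g. CTC); a toy word-level
    cross-entropy stands in here."""
    ce = nn.functional.cross_entropy
    return ce(speech_logits.mean(dim=1), speech_targets) + alpha * ce(spk_logits, spk_targets)

# Toy forward/backward pass: batch of 2 clips, 75 frames, 64x128 lip ROIs.
clips = torch.randn(2, 1, 75, 64, 128)
net, sc = TVSRNet(), SCBlock()
speech_logits, feats = net(clips)
spk_logits = sc(feats)
loss = mtl_loss(speech_logits, torch.tensor([3, 17]), spk_logits, torch.tensor([0, 5]))
loss.backward()
```

The intuition behind this sketch: whatever the speaker classifier finds useful becomes costly for the shared encoder once the gradient is reversed, so the lip features drift toward speaker-invariant content, which is one plausible reading of the "speaker normalization" the abstract attributes to the SC-Block.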