Showing 1 - 10 of 22,800 for search: '"visual speech"'
Author:
Lin, Zhaofeng, Harte, Naomi
Audio-Visual Speech Recognition (AVSR) combines auditory and visual speech cues to enhance the accuracy and robustness of speech recognition systems. Recent advancements in AVSR have improved performance in noisy environments compared to audio-only …
External link:
http://arxiv.org/abs/2412.17129
Author:
Goncalves, Lucas, Mathur, Prashant, Niu, Xing, Houston, Brady, Lavania, Chandrashekhar, Vishnubhotla, Srikanth, Sun, Lijia, Ferritto, Anthony
Audio-Visual Speech-to-Speech Translation typically prioritizes improving translation quality and naturalness. However, an equally critical aspect in audio-visual content is lip-synchrony: ensuring that the movements of the lips match the spoken content …
External link:
http://arxiv.org/abs/2412.16530
Author:
Qian, Xinyuan, Gao, Jiaran, Zhang, Yaodan, Zhang, Qiquan, Liu, Hexin, Garcia, Leibny Paola, Li, Haizhou
Speech enhancement plays an essential role in various applications, and the integration of visual information has been demonstrated to bring substantial advantages. However, the majority of current research concentrates on the examination of facial …
External link:
http://arxiv.org/abs/2411.07751
Author:
Cappellazzo, Umberto, Kim, Minsu, Chen, Honglie, Ma, Pingchuan, Petridis, Stavros, Falavigna, Daniele, Brutti, Alessio, Pantic, Maja
Multimodal large language models (MLLMs) have recently become a focal point of research due to their formidable multimodal understanding capabilities. For example, in the audio and speech domains, an LLM can be equipped with (automatic) speech recognition …
External link:
http://arxiv.org/abs/2409.12319
This paper delineates the visual speech recognition (VSR) system introduced by the NPU-ASLP (Team 237) in the second Chinese Continuous Visual Speech Recognition Challenge (CNVSRC 2024), engaging in all four tracks, including the fixed and open track …
External link:
http://arxiv.org/abs/2408.02369
Visual Speech Recognition (VSR) aims to recognize corresponding text by analyzing visual information from lip movements. Due to the high variability and weak information of lip movements, VSR tasks require effectively utilizing any information from …
External link:
http://arxiv.org/abs/2410.16438
This paper proposes a new unsupervised audiovisual speech enhancement (AVSE) approach that combines a diffusion-based audio-visual speech generative model with a non-negative matrix factorization (NMF) noise model. First, the diffusion model is pre-trained …
External link:
http://arxiv.org/abs/2410.05301
This paper addresses the prevalent issue of incorrect speech output in audio-visual speech enhancement (AVSE) systems, which is often caused by poor video quality and mismatched training and test data. We introduce a post-processing classifier (PPC) …
External link:
http://arxiv.org/abs/2409.14554
Author:
Laux, Hendrik, Schmeink, Anke
This paper presents LiteVSR2, an enhanced version of our previously introduced efficient approach to Visual Speech Recognition (VSR). Building upon our knowledge distillation framework from a pre-trained Automatic Speech Recognition (ASR) model, we …
External link:
http://arxiv.org/abs/2409.07210
Author:
Jain, Arnav, Sanjotra, Jasmer Singh, Choudhary, Harshvardhan, Agrawal, Krish, Shah, Rupal, Jha, Rohan, Sajid, M., Hussain, Amir, Tanveer, M.
Published in:
INTERSPEECH 2024
In this paper, we propose the long short-term memory speech enhancement network (LSTMSE-Net), an audio-visual speech enhancement (AVSE) method. This innovative method leverages the complementary nature of visual and audio information to boost the quality …
External link:
http://arxiv.org/abs/2409.02266