Showing 1 - 10 of 11 for search: '"Yeo, Jeong Hun"'
Author:
Yeo, Jeong Hun, Kim, Chae Won, Kim, Hyunjun, Rha, Hyeongseop, Han, Seunghee, Cheng, Wen-Huang, Ro, Yong Man
Lip reading aims to predict spoken language by analyzing lip movements. Despite advancements in lip reading technologies, performance degrades when models are applied to unseen speakers due to their sensitivity to variations in visual information …
External link:
http://arxiv.org/abs/2409.00986
Author:
Park, Se Jin, Kim, Chae Won, Rha, Hyeongseop, Kim, Minsu, Hong, Joanna, Yeo, Jeong Hun, Ro, Yong Man
In this paper, we introduce a novel Face-to-Face spoken dialogue model. It processes audio-visual speech from user input and generates audio-visual speech as the response, marking the initial step towards creating an avatar chatbot system without …
External link:
http://arxiv.org/abs/2406.07867
In visual speech processing, context modeling capability is one of the most important requirements due to the ambiguous nature of lip movements. For example, homophenes, words that share identical lip movements but produce different sounds, can be …
External link:
http://arxiv.org/abs/2402.15151
This paper explores sentence-level multilingual Visual Speech Recognition (VSR) that can recognize different languages with a single trained model. As the massive multilingual modeling of visual data requires huge computational costs, we propose a …
External link:
http://arxiv.org/abs/2401.09802
This paper proposes a powerful Visual Speech Recognition (VSR) method for multiple languages, especially for low-resource languages that have a limited amount of labeled data. Different from previous methods that tried to improve the VSR performance …
External link:
http://arxiv.org/abs/2309.08535
In this paper, we propose methods to build a powerful and efficient Image-to-Speech captioning (Im2Sp) model. To this end, we start by importing the rich knowledge related to image comprehension and language modeling from a large-scale pre-trained …
External link:
http://arxiv.org/abs/2309.08531
This paper proposes a novel lip reading framework, especially for low-resource languages, which has not been well addressed in the previous literature. Since low-resource languages do not have enough video-text paired data to train the model to have …
External link:
http://arxiv.org/abs/2308.09311
Visual Speech Recognition (VSR) is the task of predicting spoken words from silent lip movements. VSR is regarded as a challenging task because of the insufficient information in lip movements. In this paper, we propose an Audio Knowledge empowered …
External link:
http://arxiv.org/abs/2308.07593
Visual Speech Recognition (VSR) is a task to predict a sentence or word from lip movements. Some works have recently been presented which use audio signals to supplement visual information. However, existing methods utilize only limited information …
External link:
http://arxiv.org/abs/2305.04542
Recognizing speech from silent lip movement, which is called lip reading, is a challenging task due to 1) the inherent information insufficiency of lip movement to fully represent the speech, and 2) the existence of homophenes that have similar lip …
External link:
http://arxiv.org/abs/2204.01725