Showing 1 - 10 of 64 for search: '"Yacoob, Yaser"'
Author:
Shi, Min, Liu, Fuxiao, Wang, Shihao, Liao, Shijia, Radhakrishnan, Subhashree, Huang, De-An, Yin, Hongxu, Sapra, Karan, Yacoob, Yaser, Shi, Humphrey, Catanzaro, Bryan, Tao, Andrew, Kautz, Jan, Yu, Zhiding, Liu, Guilin
The ability to accurately interpret complex visual information is a crucial focus of multimodal large language models (MLLMs). Recent work indicates that enhanced visual perception significantly reduces hallucinations and improves performance on resolution-sensitive tasks …
External link:
http://arxiv.org/abs/2408.15998
Author:
Oorloff, Trevine, Koppisetti, Surya, Bonettini, Nicolò, Solanki, Divyaraj, Colman, Ben, Yacoob, Yaser, Shahriyari, Ali, Bharaj, Gaurav
With the rapid growth in deepfake video content, we require improved and generalizable methods to detect them. Most existing detection methods either use uni-modal cues or rely on supervised training to capture the dissonance between the audio and visual modalities … (a hypothetical sketch of this dissonance idea follows the link below)
External link:
http://arxiv.org/abs/2406.02951
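The entry above frames deepfake detection around the dissonance between a video's audio and visual streams. As a minimal, hypothetical sketch of that idea (not the authors' model), the following scores a clip by the cosine mismatch between time-averaged audio and visual embeddings; the encoders, feature dimension, and any decision threshold are all assumptions:

```python
import numpy as np

def dissonance_score(audio_emb: np.ndarray, visual_emb: np.ndarray) -> float:
    """Score a clip by 1 - cosine similarity between time-averaged
    audio and visual embeddings; higher = more audio-visual mismatch."""
    a = audio_emb.mean(axis=0)   # (D,) average over audio frames
    v = visual_emb.mean(axis=0)  # (D,) average over video frames
    cos = float(a @ v / (np.linalg.norm(a) * np.linalg.norm(v) + 1e-8))
    return 1.0 - cos

# Toy usage: random arrays stand in for real encoder outputs.
rng = np.random.default_rng(0)
audio = rng.normal(size=(50, 128))   # 50 audio frames, 128-dim features
video = rng.normal(size=(30, 128))   # 30 video frames, 128-dim features
print(f"dissonance: {dissonance_score(audio, video):.3f}")
```

A real detector would learn the two embeddings jointly so that genuine clips score low; the random features here only exercise the plumbing.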
Author:
Liu, Fuxiao, Wang, Xiaoyang, Yao, Wenlin, Chen, Jianshu, Song, Kaiqiang, Cho, Sangwoo, Yacoob, Yaser, Yu, Dong
With the rapid development of large language models (LLMs) and their integration into large multimodal models (LMMs), there has been impressive progress in zero-shot completion of user-oriented vision-language tasks. However, a gap remains in the domain of chart image understanding …
External link:
http://arxiv.org/abs/2311.10774
Author:
Guan, Tianrui, Liu, Fuxiao, Wu, Xiyang, Xian, Ruiqi, Li, Zongxia, Liu, Xiaoyu, Wang, Xijun, Chen, Lichang, Huang, Furong, Yacoob, Yaser, Manocha, Dinesh, Zhou, Tianyi
We introduce HallusionBench, a comprehensive benchmark designed for the evaluation of image-context reasoning. This benchmark presents significant challenges to advanced large visual-language models (LVLMs), such as GPT-4V(Vision) and Gemini Pro Vision …
External link:
http://arxiv.org/abs/2310.14566
Despite the promising progress in multi-modal tasks, current large multi-modal models (LMMs) are prone to hallucinating descriptions inconsistent with the associated image and human instructions. This paper addresses this issue by introducing a large and diverse visual instruction tuning dataset, LRV-Instruction …
External link:
http://arxiv.org/abs/2306.14565
We introduce a new benchmark, COVID-VTS, for fact-checking multi-modal information involving short-duration videos with COVID-19-focused information from both the real world and machine generation. We propose TwtrDetective, an effective model incorporating cross-media consistency checking …
External link:
http://arxiv.org/abs/2302.07919
Author:
Oorloff, Trevine, Yacoob, Yaser
While recent research has progressively overcome the low-resolution constraint of one-shot face video re-enactment with the help of StyleGAN's high-fidelity portrait generation, these approaches rely on at least one of the following: explicit 2D/3D priors … (a toy latent-space sketch follows the link below)
External link:
http://arxiv.org/abs/2302.07848
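Both re-enactment entries by these authors work in StyleGAN's latent space rather than with explicit 2D/3D priors. Purely as a hypothetical illustration of latent-based re-enactment (not the papers' actual decomposition), this sketch keeps a source identity latent and adds each driving frame's offset from a neutral frame:

```python
import numpy as np

def reenact_latents(source_w, driving_w, neutral_w, alpha=1.0):
    """Hypothetical latent-space re-enactment: keep the source identity
    latent and add the driving frames' deviation from a neutral pose.

    source_w:  (D,)    identity latent of the source face
    driving_w: (T, D)  per-frame latents of the driving video
    neutral_w: (D,)    latent of the driving actor's neutral frame
    Returns    (T, D)  latents to feed a (not included) StyleGAN generator.
    """
    motion = driving_w - neutral_w          # per-frame motion offsets
    return source_w[None, :] + alpha * motion

rng = np.random.default_rng(0)
D, T = 512, 4
out = reenact_latents(rng.normal(size=D), rng.normal(size=(T, D)),
                      rng.normal(size=D))
print(out.shape)  # (4, 512)
```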
Author:
Oorloff, Trevine, Yacoob, Yaser
While the recent advances in research on video re-enactment have yielded promising results, the approaches fall short in capturing the fine, detailed, and expressive facial features (e.g., lip-pressing, mouth puckering, mouth gaping, and wrinkles) which are crucial …
External link:
http://arxiv.org/abs/2203.14512
Lighting estimation from face images is an important task with applications in areas such as image editing, intrinsic image decomposition, and image forgery detection. We propose to train a deep Convolutional Neural Network (CNN) to regress lighting parameters from a single face image … (a minimal sketch of such a regressor follows the link below)
External link:
http://arxiv.org/abs/1709.01993
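The entry above describes training a CNN to regress lighting parameters from a single face image. Below is a minimal PyTorch sketch of such a regressor, assuming the common convention of 9 spherical-harmonics lighting coefficients as the target; the architecture, input size, and loss are illustrative assumptions, not the paper's network:

```python
import torch
import torch.nn as nn

class LightingRegressor(nn.Module):
    """Toy CNN mapping a 64x64 RGB face crop to 9 spherical-harmonics
    lighting coefficients (an assumed parameterization)."""
    def __init__(self, n_coeffs: int = 9):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 16, 3, stride=2, padding=1), nn.ReLU(),   # 64 -> 32
            nn.Conv2d(16, 32, 3, stride=2, padding=1), nn.ReLU(),  # 32 -> 16
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),  # 16 -> 8
            nn.AdaptiveAvgPool2d(1),
        )
        self.head = nn.Linear(64, n_coeffs)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.head(self.features(x).flatten(1))

# One supervised regression step on dummy data.
model = LightingRegressor()
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
images = torch.randn(8, 3, 64, 64)   # batch of face crops
targets = torch.randn(8, 9)          # stand-in ground-truth SH coefficients
loss = nn.functional.mse_loss(model(images), targets)
loss.backward()
opt.step()
print(f"MSE loss: {loss.item():.4f}")
```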
Author:
Yacoob, Yaser
This paper considers the intra-image color-space of an object or a scene when it is subject to a dominant single source of variation. The source of variation can be intrinsic or extrinsic (i.e., imaging conditions) to the object. We observe that … (a toy color-space sketch follows the link below)
External link:
http://arxiv.org/abs/1512.06075
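As a small, assumed illustration of the entry above (inspecting an object's intra-image color-space under one dominant source of variation), the sketch below fits the principal direction of a pixel cloud in RGB space with an SVD; a single source of variation predicts the cloud is nearly one-dimensional:

```python
import numpy as np

def principal_color_axis(pixels: np.ndarray):
    """pixels: (N, 3) RGB values of one object/scene region.
    Returns the dominant color-variation direction and the fraction
    of variance it explains (close to 1.0 => near-linear color-space)."""
    centered = pixels - pixels.mean(axis=0)
    # SVD of the pixel cloud; the top right-singular vector is the axis.
    _, s, vt = np.linalg.svd(centered, full_matrices=False)
    explained = (s[0] ** 2) / np.sum(s ** 2)
    return vt[0], float(explained)

# Synthetic example: colors along one axis (e.g., a shading gradient) + noise.
rng = np.random.default_rng(0)
t = rng.uniform(size=(1000, 1))
pixels = t * np.array([0.8, 0.5, 0.3]) + 0.02 * rng.normal(size=(1000, 3))
axis, ratio = principal_color_axis(pixels)
print(axis, f"explained variance: {ratio:.3f}")
```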