Showing 1 - 10 of 64 for search: '"Yacoob, Yaser"'
Author:
Shi, Min, Liu, Fuxiao, Wang, Shihao, Liao, Shijia, Radhakrishnan, Subhashree, Huang, De-An, Yin, Hongxu, Sapra, Karan, Yacoob, Yaser, Shi, Humphrey, Catanzaro, Bryan, Tao, Andrew, Kautz, Jan, Yu, Zhiding, Liu, Guilin
The ability to accurately interpret complex visual information is a crucial focus of multimodal large language models (MLLMs). Recent work indicates that enhanced visual perception significantly reduces hallucinations and improves performance on resolution-sensitive tasks …
External link:
http://arxiv.org/abs/2408.15998
Author:
Oorloff, Trevine, Koppisetti, Surya, Bonettini, Nicolò, Solanki, Divyaraj, Colman, Ben, Yacoob, Yaser, Shahriyari, Ali, Bharaj, Gaurav
With the rapid growth in deepfake video content, we require improved and generalizable methods to detect them. Most existing detection methods either use uni-modal cues or rely on supervised training to capture the dissonance between the audio and visual modalities … (a hypothetical sketch of this dissonance idea follows the link below)
External link:
http://arxiv.org/abs/2406.02951
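The entry above frames deepfake detection around the dissonance between a video's audio and visual streams. As a minimal, hypothetical sketch of that idea (not the authors' model), the following scores a clip by the cosine mismatch between time-averaged audio and visual embeddings; the encoders, feature dimension, and any decision threshold are all assumptions:

```python
import numpy as np

def dissonance_score(audio_emb: np.ndarray, visual_emb: np.ndarray) -> float:
    """Score a clip by 1 - cosine similarity between time-averaged
    audio and visual embeddings; higher = more audio-visual mismatch."""
    a = audio_emb.mean(axis=0)   # (D,) average over audio frames
    v = visual_emb.mean(axis=0)  # (D,) average over video frames
    cos = float(a @ v / (np.linalg.norm(a) * np.linalg.norm(v) + 1e-8))
    return 1.0 - cos

# Toy usage: random arrays stand in for real encoder outputs.
rng = np.random.default_rng(0)
audio = rng.normal(size=(50, 128))   # 50 audio frames, 128-dim features
video = rng.normal(size=(30, 128))   # 30 video frames, 128-dim features
print(f"dissonance: {dissonance_score(audio, video):.3f}")
```

A real detector would learn the two embeddings jointly so that genuine clips score low; the random features here only exercise the plumbing.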
Author:
Liu, Fuxiao, Wang, Xiaoyang, Yao, Wenlin, Chen, Jianshu, Song, Kaiqiang, Cho, Sangwoo, Yacoob, Yaser, Yu, Dong
With the rapid development of large language models (LLMs) and their integration into large multimodal models (LMMs), there has been impressive progress in zero-shot completion of user-oriented vision-language tasks. However, a gap remains in the domain of chart image understanding …
External link:
http://arxiv.org/abs/2311.10774
Author:
Guan, Tianrui, Liu, Fuxiao, Wu, Xiyang, Xian, Ruiqi, Li, Zongxia, Liu, Xiaoyu, Wang, Xijun, Chen, Lichang, Huang, Furong, Yacoob, Yaser, Manocha, Dinesh, Zhou, Tianyi
We introduce HallusionBench, a comprehensive benchmark designed for the evaluation of image-context reasoning. This benchmark presents significant challenges to advanced large visual-language models (LVLMs), such as GPT-4V(Vision) and Gemini Pro Vision …
External link:
http://arxiv.org/abs/2310.14566
Despite the promising progress in multi-modal tasks, current large multi-modal models (LMMs) are prone to hallucinating descriptions inconsistent with the associated image and human instructions. This paper addresses this issue by introducing a large and diverse visual instruction tuning dataset, LRV-Instruction …
External link:
http://arxiv.org/abs/2306.14565
We introduce a new benchmark, COVID-VTS, for fact-checking multi-modal information involving short-duration videos with COVID-19-focused information from both the real world and machine generation. We propose TwtrDetective, an effective model incorporating cross-media consistency checking …
External link:
http://arxiv.org/abs/2302.07919
Author:
Oorloff, Trevine, Yacoob, Yaser
While recent research has progressively overcome the low-resolution constraint of one-shot face video re-enactment with the help of StyleGAN's high-fidelity portrait generation, these approaches rely on at least one of the following: explicit 2D/3D priors … (a toy latent-space sketch follows the link below)
External link:
http://arxiv.org/abs/2302.07848
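Both re-enactment entries by these authors work in StyleGAN's latent space rather than with explicit 2D/3D priors. Purely as a hypothetical illustration of latent-based re-enactment (not the papers' actual decomposition), this sketch keeps a source identity latent and adds each driving frame's offset from a neutral frame:

```python
import numpy as np

def reenact_latents(source_w, driving_w, neutral_w, alpha=1.0):
    """Hypothetical latent-space re-enactment: keep the source identity
    latent and add the driving frames' deviation from a neutral pose.

    source_w:  (D,)    identity latent of the source face
    driving_w: (T, D)  per-frame latents of the driving video
    neutral_w: (D,)    latent of the driving actor's neutral frame
    Returns    (T, D)  latents to feed a (not included) StyleGAN generator.
    """
    motion = driving_w - neutral_w          # per-frame motion offsets
    return source_w[None, :] + alpha * motion

rng = np.random.default_rng(0)
D, T = 512, 4
out = reenact_latents(rng.normal(size=D), rng.normal(size=(T, D)),
                      rng.normal(size=D))
print(out.shape)  # (4, 512)
```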
Author:
Oorloff, Trevine, Yacoob, Yaser
While the recent advances in research on video re-enactment have yielded promising results, the approaches fall short in capturing the fine, detailed, and expressive facial features (e.g., lip-pressing, mouth puckering, mouth gaping, and wrinkles) which are crucial …
External link:
http://arxiv.org/abs/2203.14512
Lighting estimation from face images is an important task with applications in areas such as image editing, intrinsic image decomposition, and image forgery detection. We propose to train a deep Convolutional Neural Network (CNN) to regress lighting parameters from a single face image … (a minimal sketch of such a regressor follows the link below)
External link:
http://arxiv.org/abs/1709.01993
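The entry above describes training a CNN to regress lighting parameters from a single face image. Below is a minimal PyTorch sketch of such a regressor, assuming the common convention of 9 spherical-harmonics lighting coefficients as the target; the architecture, input size, and loss are illustrative assumptions, not the paper's network:

```python
import torch
import torch.nn as nn

class LightingRegressor(nn.Module):
    """Toy CNN mapping a 64x64 RGB face crop to 9 spherical-harmonics
    lighting coefficients (an assumed parameterization)."""
    def __init__(self, n_coeffs: int = 9):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 16, 3, stride=2, padding=1), nn.ReLU(),   # 64 -> 32
            nn.Conv2d(16, 32, 3, stride=2, padding=1), nn.ReLU(),  # 32 -> 16
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),  # 16 -> 8
            nn.AdaptiveAvgPool2d(1),
        )
        self.head = nn.Linear(64, n_coeffs)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.head(self.features(x).flatten(1))

# One supervised regression step on dummy data.
model = LightingRegressor()
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
images = torch.randn(8, 3, 64, 64)   # batch of face crops
targets = torch.randn(8, 9)          # stand-in ground-truth SH coefficients
loss = nn.functional.mse_loss(model(images), targets)
loss.backward()
opt.step()
print(f"MSE loss: {loss.item():.4f}")
```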
Author:
Yacoob, Yaser
This paper considers the intra-image color-space of an object or a scene when it is subject to a dominant single source of variation. The source of variation can be intrinsic or extrinsic (i.e., imaging conditions) to the object. We observe that … (a toy color-space sketch follows the link below)
External link:
http://arxiv.org/abs/1512.06075
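As a small, assumed illustration of the entry above (inspecting an object's intra-image color-space under one dominant source of variation), the sketch below fits the principal direction of a pixel cloud in RGB space with an SVD; a single source of variation predicts the cloud is nearly one-dimensional:

```python
import numpy as np

def principal_color_axis(pixels: np.ndarray):
    """pixels: (N, 3) RGB values of one object/scene region.
    Returns the dominant color-variation direction and the fraction
    of variance it explains (close to 1.0 => near-linear color-space)."""
    centered = pixels - pixels.mean(axis=0)
    # SVD of the pixel cloud; the top right-singular vector is the axis.
    _, s, vt = np.linalg.svd(centered, full_matrices=False)
    explained = (s[0] ** 2) / np.sum(s ** 2)
    return vt[0], float(explained)

# Synthetic example: colors along one axis (e.g., a shading gradient) + noise.
rng = np.random.default_rng(0)
t = rng.uniform(size=(1000, 1))
pixels = t * np.array([0.8, 0.5, 0.3]) + 0.02 * rng.normal(size=(1000, 3))
axis, ratio = principal_color_axis(pixels)
print(axis, f"explained variance: {ratio:.3f}")
```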