Showing 1 - 10 of 62 for search: '"Luo Donghao"'
Author:
Ji, Xiaozhong, Hu, Xiaobin, Xu, Zhihong, Zhu, Junwei, Lin, Chuming, He, Qingdong, Zhang, Jiangning, Luo, Donghao, Chen, Yi, Lin, Qin, Lu, Qinglin, Wang, Chengjie
The study of talking face generation mainly explores the intricacies of synchronizing facial movements and crafting visually appealing, temporally-coherent animations. However, due to the limited exploration of global audio perception, current…
External link:
http://arxiv.org/abs/2411.16331
Author:
Xu, Pengcheng, Jiang, Boyuan, Hu, Xiaobin, Luo, Donghao, He, Qingdong, Zhang, Jiangning, Wang, Chengjie, Wu, Yunsheng, Ling, Charles, Wang, Boyu
Leveraging the large generative prior of the flow transformer for tuning-free image editing requires authentic inversion to project the image into the model's domain and a flexible invariance control mechanism to preserve non-target contents. However…
External link:
http://arxiv.org/abs/2411.15843
Author:
Jiang, Boyuan, Hu, Xiaobin, Luo, Donghao, He, Qingdong, Xu, Chengming, Peng, Jinlong, Zhang, Jiangning, Wang, Chengjie, Wu, Yunsheng, Fu, Yanwei
Although image-based virtual try-on has made considerable progress, emerging approaches still encounter challenges in producing high-fidelity and robust fitting images across diverse scenarios. These methods often struggle with issues such as texture…
External link:
http://arxiv.org/abs/2411.10499
Author:
Liang, Yujie, Hu, Xiaobin, Jiang, Boyuan, Luo, Donghao, Wu, Kai, Han, Wenhui, Jin, Taisong, Wang, Chengjie
Although diffusion-based image virtual try-on has made considerable progress, emerging approaches still struggle to effectively address the issue of hand occlusion (i.e., clothing regions occluded by the hand part), leading to a notable degradation…
External link:
http://arxiv.org/abs/2408.12340
Author:
Li, Bang, Luo, Donghao, Liang, Yujie, Yang, Jing, Ding, Zengmao, Peng, Xu, Jiang, Boyuan, Han, Shengwei, Sui, Dan, Qin, Peichao, Wu, Pian, Wang, Chaoyang, Qi, Yun, Jin, Taisong, Wang, Chengjie, Huang, Xiaoming, Shu, Zhan, Ji, Rongrong, Liu, Yongge, Wu, Yunsheng
Oracle bone inscriptions (OBI) constitute the earliest developed writing system in China, bearing invaluable written exemplifications of early Shang history and paleography. However, the task of deciphering OBI, in the current climate of the scholarship, can…
External link:
http://arxiv.org/abs/2407.03900
Author:
Ji, Xiaozhong, Lin, Chuming, Ding, Zhonggan, Tai, Ying, Zhu, Junwei, Hu, Xiaobin, Luo, Donghao, Ge, Yanhao, Wang, Chengjie
Person-generic audio-driven face generation is a challenging task in computer vision. Previous methods have achieved remarkable progress in audio-visual synchronization, but there is still a significant gap between current results and practical…
External link:
http://arxiv.org/abs/2406.18284
Author:
Yan, Zhiyuan, Yao, Taiping, Chen, Shen, Zhao, Yandan, Fu, Xinghe, Zhu, Junwei, Luo, Donghao, Wang, Chengjie, Ding, Shouhong, Wu, Yunsheng, Yuan, Li
We propose a new comprehensive benchmark to revolutionize the current deepfake detection field to the next generation. Predominantly, existing works identify top-notch detection algorithms and models by adhering to the common practice: training…
External link:
http://arxiv.org/abs/2406.13495
Author:
Kong, Lingjie, Wu, Kai, Hu, Xiaobin, Han, Wenhui, Peng, Jinlong, Xu, Chengming, Luo, Donghao, Li, Mengtian, Zhang, Jiangning, Wang, Chengjie, Fu, Yanwei
Recent advances in diffusion-based text-to-image models have simplified creating high-fidelity images, but preserving the identity (ID) of specific elements, like a personal dog, is still challenging. Object customization, using reference images and…
External link:
http://arxiv.org/abs/2406.11643
Author:
Wu, Kai, Jiang, Boyuan, Jiang, Zhengkai, He, Qingdong, Luo, Donghao, Wang, Shengzhi, Liu, Qingwen, Wang, Chengjie
Multimodal large language models (MLLMs) provide a powerful mechanism for understanding visual information by building on large language models. However, MLLMs are notorious for suffering from hallucinations, especially when generating lengthy…
External link:
http://arxiv.org/abs/2405.20081
Author:
Xu, Chengming, Hu, Kai, Wang, Qilin, Luo, Donghao, Zhang, Jiangning, Hu, Xiaobin, Fu, Yanwei, Wang, Chengjie
Stylized Text-to-Image Generation (STIG) aims to generate images from text prompts and style reference images. In this paper, we present ArtWeaver, a novel framework that leverages pretrained Stable Diffusion (SD) to address challenges such as…
External link:
http://arxiv.org/abs/2405.15287