Showing 1 - 10 of 2,493 results for search: '"Shan,Ying"'
Authors:
Cao, Mingdeng, Mou, Chong, Yuan, Ziyang, Wang, Xintao, Zhang, Zhaoyang, Shan, Ying, Zheng, Yinqiang
Consistent human-centric image and video synthesis aims to generate images or videos with new poses while preserving appearance consistency with a given reference image, which is crucial for low-cost visual content creation. Recent advances based on
External link:
http://arxiv.org/abs/2412.14531
Procedural Content Generation (PCG) is powerful in creating high-quality 3D contents, yet controlling it to produce desired shapes is difficult and often requires extensive parameter tuning. Inverse Procedural Content Generation aims to automatically
External link:
http://arxiv.org/abs/2412.15200
Automatic black-and-white image sequence colorization while preserving character and object identity (ID) is a complex task with significant market demand, such as in cartoon or comic series colorization. Despite advancements in visual colorization u
External link:
http://arxiv.org/abs/2412.11815
Image editing has advanced significantly with the development of diffusion models using both inversion-based and instruction-based methods. However, current inversion-based approaches struggle with big modifications (e.g., adding or removing objects)
External link:
http://arxiv.org/abs/2412.10316
Texture synthesis is a fundamental problem in computer graphics that would benefit various applications. Existing methods are effective in handling 2D image textures. In contrast, many real-world textures contain meso-structure in the 3D geometry spa
External link:
http://arxiv.org/abs/2412.10004
Existing sparse-view reconstruction models heavily rely on accurate known camera poses. However, deriving camera extrinsics and intrinsics from sparse-view images presents significant challenges. In this work, we present FreeSplatter, a highly scalab
External link:
http://arxiv.org/abs/2412.09573
Research on large language models has advanced significantly across text, speech, images, and videos. However, multi-modal music understanding and generation remain underexplored due to the lack of well-annotated datasets. To address this, we introdu
External link:
http://arxiv.org/abs/2412.06660
Videos are inherently temporal sequences by their very nature. In this work, we explore the potential of modeling videos in a chronological and scalable manner with autoregressive (AR) language models, inspired by their success in natural language pr
External link:
http://arxiv.org/abs/2412.04446
The advent of Multimodal Large Language Models, leveraging the power of Large Language Models, has recently demonstrated superior multimodal understanding and reasoning abilities, heralding a new era for artificial general intelligence. However, achi
External link:
http://arxiv.org/abs/2412.04447
Recent developments in Large Language Models pre-trained on extensive corpora have shown significant success in various natural language processing tasks with minimal fine-tuning. This success offers new promise for robotics, which has long been cons
External link:
http://arxiv.org/abs/2412.04445