Search results

Report

Consistent Human Image and Video Generation with Spatially Conditioned Diffusion

Authors: Cao, Mingdeng; Mou, Chong; Yuan, Ziyang; Wang, Xintao; Zhang, Zhaoyang; Shan, Ying; Zheng, Yinqiang

Consistent human-centric image and video synthesis aims to generate images or videos with new poses while preserving appearance consistency with a given reference image, which is crucial for low-cost visual content creation. Recent advances based on

External link: http://arxiv.org/abs/2412.14531

Report

DI-PCG: Diffusion-based Efficient Inverse Procedural Content Generation for High-quality 3D Asset Creation

Authors: Zhao, Wang; Cao, Yan-Pei; Xu, Jiale; Dong, Yuejiang; Shan, Ying

Procedural Content Generation (PCG) is powerful in creating high-quality 3D contents, yet controlling it to produce desired shapes is difficult and often requires extensive parameter tuning. Inverse Procedural Content Generation aims to automatically

External link: http://arxiv.org/abs/2412.15200

Report

ColorFlow: Retrieval-Augmented Image Sequence Colorization

Authors: Zhuang, Junhao; Ju, Xuan; Zhang, Zhaoyang; Liu, Yong; Zhang, Shiyi; Yuan, Chun; Shan, Ying

Automatic black-and-white image sequence colorization while preserving character and object identity (ID) is a complex task with significant market demand, such as in cartoon or comic series colorization. Despite advancements in visual colorization u

External link: http://arxiv.org/abs/2412.11815

Report

BrushEdit: All-In-One Image Inpainting and Editing

Authors: Li, Yaowei; Bian, Yuxuan; Ju, Xuan; Zhang, Zhaoyang; Shan, Ying; Zou, Yuexian; Xu, Qiang

Image editing has advanced significantly with the development of diffusion models using both inversion-based and instruction-based methods. However, current inversion-based approaches struggle with big modifications (e.g., adding or removing objects)

External link: http://arxiv.org/abs/2412.10316

Report

NeRF-Texture: Synthesizing Neural Radiance Field Textures

Authors: Huang, Yi-Hua; Cao, Yan-Pei; Lai, Yu-Kun; Shan, Ying; Gao, Lin

Texture synthesis is a fundamental problem in computer graphics that would benefit various applications. Existing methods are effective in handling 2D image textures. In contrast, many real-world textures contain meso-structure in the 3D geometry spa

External link: http://arxiv.org/abs/2412.10004

Report

FreeSplatter: Pose-free Gaussian Splatting for Sparse-view 3D Reconstruction

Authors: Xu, Jiale; Gao, Shenghua; Shan, Ying

Existing sparse-view reconstruction models heavily rely on accurate known camera poses. However, deriving camera extrinsics and intrinsics from sparse-view images presents significant challenges. In this work, we present FreeSplatter, a highly scalab

External link: http://arxiv.org/abs/2412.09573

Report

MuMu-LLaMA: Multi-modal Music Understanding and Generation via Large Language Models

Authors: Liu, Shansong; Hussain, Atin Sakkeer; Wu, Qilong; Sun, Chenshuo; Shan, Ying

Research on large language models has advanced significantly across text, speech, images, and videos. However, multi-modal music understanding and generation remain underexplored due to the lack of well-annotated datasets. To address this, we introdu

External link: http://arxiv.org/abs/2412.06660

Report

DiCoDe: Diffusion-Compressed Deep Tokens for Autoregressive Video Generation with Language Models

Authors: Li, Yizhuo; Ge, Yuying; Ge, Yixiao; Luo, Ping; Shan, Ying

Videos are inherently temporal sequences by their very nature. In this work, we explore the potential of modeling videos in a chronological and scalable manner with autoregressive (AR) language models, inspired by their success in natural language pr

External link: http://arxiv.org/abs/2412.04446

Report

EgoPlan-Bench2: A Benchmark for Multimodal Large Language Model Planning in Real-World Scenarios

Authors: Qiu, Lu; Ge, Yuying; Chen, Yi; Ge, Yixiao; Shan, Ying; Liu, Xihui

The advent of Multimodal Large Language Models, leveraging the power of Large Language Models, has recently demonstrated superior multimodal understanding and reasoning abilities, heralding a new era for artificial general intelligence. However, achi

External link: http://arxiv.org/abs/2412.04447

Report

Moto: Latent Motion Token as the Bridging Language for Robot Manipulation

Authors: Chen, Yi; Ge, Yuying; Li, Yizhuo; Ge, Yixiao; Ding, Mingyu; Shan, Ying; Liu, Xihui

Recent developments in Large Language Models pre-trained on extensive corpora have shown significant success in various natural language processing tasks with minimal fine-tuning. This success offers new promise for robotics, which has long been cons

External link: http://arxiv.org/abs/2412.04445
