Showing 1 - 10
of 272
for search: '"Liu Xihui"'
Author:
Wang, Yuqing, Xiong, Tianwei, Zhou, Daquan, Lin, Zhijie, Zhao, Yang, Kang, Bingyi, Feng, Jiashi, Liu, Xihui
It is desirable but challenging to generate content-rich long videos on the scale of minutes. Autoregressive large language models (LLMs) have achieved great success in generating coherent and long sequences of tokens in the domain of natural language…
External link:
http://arxiv.org/abs/2410.02757
Accelerating Auto-regressive Text-to-Image Generation with Training-free Speculative Jacobi Decoding
Current large auto-regressive models can generate high-quality, high-resolution images, but these models require hundreds or even thousands of steps of next-token prediction during inference, resulting in substantial time consumption. In existing…
External link:
http://arxiv.org/abs/2410.01699
Author:
Wang, Yunnan, Li, Ziqiang, Zhang, Zequn, Zhang, Wenyao, Xie, Baao, Liu, Xihui, Zeng, Wenjun, Jin, Xin
There has been exciting progress in generating images from natural language or layout conditions. However, these methods struggle to faithfully reproduce complex scenes due to the insufficient modeling of multiple objects and their relationships. To…
External link:
http://arxiv.org/abs/2410.00447
Recent advancements in Large Multimodal Models (LMMs) have greatly enhanced their proficiency in 2D visual understanding tasks, enabling them to effectively process and understand images and videos. However, the development of LMMs with 3D awareness…
External link:
http://arxiv.org/abs/2409.18125
Leveraging pretrained 2D diffusion models and score distillation sampling (SDS), recent methods have shown promising results for text-to-3D avatar generation. However, generating high-quality 3D avatars capable of expressive animation remains challenging…
External link:
http://arxiv.org/abs/2409.17145
Text-to-video (T2V) generation models have advanced significantly, yet their ability to compose different objects, attributes, actions, and motions into a video remains underexplored. Previous text-to-video benchmarks also neglect this important ability…
External link:
http://arxiv.org/abs/2407.14505
Object-oriented embodied navigation aims to locate specific objects, defined by category or depicted in images. Existing methods often struggle to generalize to open-vocabulary goals without extensive training data. While recent advances in Vision-Language…
External link:
http://arxiv.org/abs/2407.09016
In this paper, we introduce PredBench, a benchmark tailored for the holistic evaluation of spatio-temporal prediction networks. Despite significant progress in this field, there remains a lack of a standardized framework for a detailed and comparative…
External link:
http://arxiv.org/abs/2407.08418
Despite the success achieved by existing image generation and editing methods, current models still struggle with complex problems, including intricate text prompts, and the absence of verification and self-correction mechanisms makes the generated images…
External link:
http://arxiv.org/abs/2407.05600
Author:
Qi, Zhangyang, Yang, Yunhan, Zhang, Mengchen, Xing, Long, Wu, Xiaoyang, Wu, Tong, Lin, Dahua, Liu, Xihui, Wang, Jiaqi, Zhao, Hengshuang
Recent advances in 3D AIGC have shown promise in directly creating 3D objects from text and images, offering significant cost savings in animation and product design. However, detailed editing and customization of 3D assets remains a long-standing challenge…
External link:
http://arxiv.org/abs/2407.06191