Showing 1 - 10 of 124
for search: '"Wang, Yu-Xiong"'
Author:
Chen, Jun-Kun, Bulò, Samuel Rota, Müller, Norman, Porzi, Lorenzo, Kontschieder, Peter, Wang, Yu-Xiong
This paper proposes ConsistDreamer - a novel framework that lifts 2D diffusion models with 3D awareness and 3D consistency, thus enabling high-fidelity instruction-guided scene editing. To overcome the fundamental limitation of missing 3D consistency…
External link:
http://arxiv.org/abs/2406.09404
This paper proposes Instruct 4D-to-4D that achieves 4D awareness and spatial-temporal consistency for 2D diffusion models to generate high-quality instruction-guided dynamic scene editing results. Traditional applications of 2D diffusion models in dy…
External link:
http://arxiv.org/abs/2406.09402
With recent video object segmentation (VOS) benchmarks evolving to challenging scenarios, we revisit a simple but overlooked strategy: restricting the size of memory banks. This diverges from the prevalent practice of expanding memory banks to accomm…
External link:
http://arxiv.org/abs/2406.08476
Being able to carry out complicated vision language reasoning tasks in 3D space represents a significant milestone in developing household robots and human-centered embodied AI. In this work, we demonstrate that a critical and distinct challenge in 3…
External link:
http://arxiv.org/abs/2406.07544
Author:
Yang, Qianlan, Wang, Yu-Xiong
Training autonomous agents with sparse rewards is a long-standing problem in online reinforcement learning (RL), due to low data efficiency. Prior work overcomes this challenge by extracting useful knowledge from offline data, often accomplished thro…
External link:
http://arxiv.org/abs/2406.04323
Author:
Cao, Shengcao, Gu, Jiuxiang, Kuen, Jason, Tan, Hao, Zhang, Ruiyi, Zhao, Handong, Nenkova, Ani, Gui, Liang-Yan, Sun, Tong, Wang, Yu-Xiong
Open-world entity segmentation, as an emerging computer vision task, aims at segmenting entities in images without being restricted by pre-defined classes, offering impressive generalization capabilities on unseen images and concepts. Despite its pro…
External link:
http://arxiv.org/abs/2404.12386
Text-conditioned human motion generation has experienced significant advancements with diffusion models trained on extensive motion capture data and corresponding textual annotations. However, extending such success to 3D dynamic human-object interac…
External link:
http://arxiv.org/abs/2403.19652
The limited scale of current 3D shape datasets hinders the advancements in 3D shape understanding, and motivates multi-modal learning approaches which transfer learned knowledge from data-abundant 2D image and language modalities to 3D shapes. Howeve…
External link:
http://arxiv.org/abs/2402.18490
The human visual perception system demonstrates exceptional capabilities in learning without explicit supervision and understanding the part-to-whole composition of objects. Drawing inspiration from these two abilities, we propose Hierarchical Adapti…
External link:
http://arxiv.org/abs/2402.03311
Author:
Shlapentokh-Rothman, Michal, Blume, Ansel, Xiao, Yao, Wu, Yuqun, T V, Sethuraman, Tao, Heyi, Lee, Jae Yong, Torres, Wilfredo, Wang, Yu-Xiong, Hoiem, Derek
We investigate whether region-based representations are effective for recognition. Regions were once a mainstay in recognition approaches, but pixel and patch-based features are now used almost exclusively. We show that recent class-agnostic segmente…
External link:
http://arxiv.org/abs/2402.02352