Showing 1 - 10 of 67 results for the search: '"Jiang Huaizu"'
Since 2023, Vector Quantization (VQ)-based discrete generation methods have rapidly dominated human motion generation, primarily surpassing diffusion-based continuous generation methods in standard performance metrics. However, VQ-based methods have …
External link: http://arxiv.org/abs/2411.16575
Real-time high-accuracy optical flow estimation is crucial for various real-world applications. While recent learning-based optical flow methods have achieved high accuracy, they often come with significant computational costs. In this paper, we propose …
External link: http://arxiv.org/abs/2408.10161
Visual relationship understanding has been studied separately in human-object interaction (HOI) detection, scene graph generation (SGG), and referring relationships (RR) tasks. Given the complexity and interconnectedness of these tasks, it is crucial to …
External link: http://arxiv.org/abs/2408.08305
We present Stable Video 4D (SV4D), a latent video diffusion model for multi-frame and multi-view consistent dynamic 3D content generation. Unlike previous methods that rely on separately trained generative models for video generation and novel view synthesis …
External link: http://arxiv.org/abs/2407.17470
We introduce a novel Stylized Motion Diffusion model, dubbed SMooDi, to generate stylized motion driven by content texts and style motion sequences. Unlike existing methods that either generate motion of various content or transfer style from one sequence …
External link: http://arxiv.org/abs/2407.12783
We introduce HouseCrafter, a novel approach that can lift a floorplan into a complete large 3D indoor scene (e.g., a house). Our key insight is to adapt a 2D diffusion model, which is trained on web-scale images, to generate consistent multi-view color …
External link: http://arxiv.org/abs/2406.20077
Obstacle detection and tracking represent a critical component in robot autonomous navigation. In this paper, we propose ODTFormer, a Transformer-based model to address both obstacle detection and tracking problems. For the detection task, our approach …
External link: http://arxiv.org/abs/2403.14626
Visual navigation has received significant attention recently. Most of the prior works focus on predicting navigation actions based on semantic features extracted from visual encoders. However, these approaches often rely on large datasets and exhibit …
External link: http://arxiv.org/abs/2403.12039
Real-time high-accuracy optical flow estimation is a crucial component in various applications, including localization and mapping in robotics, object tracking, and activity recognition in computer vision. While recent learning-based optical flow methods …
External link: http://arxiv.org/abs/2403.10425
We address the problem of generating realistic 3D human-object interactions (HOIs) driven by textual prompts. To this end, we take a modular design and decompose the complex task into simpler sub-tasks. We first develop a dual-branch diffusion model …
External link: http://arxiv.org/abs/2312.06553