Showing 1 - 10 of 332 for search: '"Li Zeming"'
Author:
Kim, Taewhan, Bae, Hojin, Li, Zeming, Li, Xiaoqi, Ponomarenko, Iaroslav, Wu, Ruihai, Dong, Hao
Visual actionable affordance has emerged as a transformative approach in robotics, focusing on perceiving interaction areas prior to manipulation. Traditional methods rely on pixel sampling to identify successful interaction samples or processing poi…
External link:
http://arxiv.org/abs/2412.10050
Recent advancements in multi-modal pre-training for 3D point clouds have demonstrated promising results by aligning heterogeneous features across 3D shapes and their corresponding 2D images and language descriptions. However, current straightforward …
External link:
http://arxiv.org/abs/2407.14007
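The entry above describes aligning point-cloud, image, and text features during pre-training, but the abstract is cut off before any detail. As a generic illustration only (not this paper's method), the PyTorch sketch below shows one common way such alignment is done: a symmetric InfoNCE contrastive loss, CLIP-style, applied across the three modalities. All function names, dimensions, and the temperature value are hypothetical.

    import torch
    import torch.nn.functional as F

    def alignment_loss(point_emb, image_emb, text_emb, temperature=0.07):
        # Normalize each modality's embeddings to the unit sphere.
        p = F.normalize(point_emb, dim=-1)
        i = F.normalize(image_emb, dim=-1)
        t = F.normalize(text_emb, dim=-1)
        labels = torch.arange(p.size(0))  # matched triplets share a batch index

        def nce(a, b):
            # Symmetric InfoNCE over the batch similarity matrix.
            logits = a @ b.T / temperature
            return (F.cross_entropy(logits, labels) +
                    F.cross_entropy(logits.T, labels)) / 2

        # Pull the point-cloud embedding toward its paired image and caption.
        return nce(p, i) + nce(p, t)

    loss = alignment_loss(torch.randn(8, 256), torch.randn(8, 256), torch.randn(8, 256))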
Author:
Li, Renjie, Pan, Panwang, Yang, Bangbang, Xu, Dejia, Zhou, Shijie, Zhang, Xuanyang, Li, Zeming, Kadambi, Achuta, Wang, Zhangyang, Tu, Zhengzhong, Fan, Zhiwen
The blooming of virtual reality and augmented reality (VR/AR) technologies has driven an increasing demand for the creation of high-quality, immersive, and dynamic environments. However, existing generative techniques either focus solely on dynamic o…
External link:
http://arxiv.org/abs/2406.13527
Author:
Pan, Panwang, Su, Zhuo, Lin, Chenguo, Fan, Zhen, Zhang, Yongjie, Li, Zeming, Shen, Tingting, Mu, Yadong, Liu, Yebin
Despite recent advancements in high-fidelity human reconstruction techniques, the requirements for densely captured images or time-consuming per-instance optimization significantly hinder their applications in broader scenarios. To tackle these issue…
External link:
http://arxiv.org/abs/2406.12459
Author:
Gao, Jin, Lin, Shubo, Wang, Shaoru, Kou, Yutong, Li, Zeming, Li, Liang, Zhang, Congxuan, Zhang, Xiaoqin, Wang, Yizheng, Hu, Weiming
Masked image modeling (MIM) pre-training for large-scale vision transformers (ViTs) has enabled promising downstream performance on top of the learned self-supervised ViT features. In this paper, we question if the extremely simple lightweig…
External link:
http://arxiv.org/abs/2404.12210
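The abstract above names masked image modeling (MIM) pre-training for ViTs. As a rough illustration only, not this paper's method, here is a minimal SimMIM-style sketch in PyTorch: a random subset of patch tokens is replaced with a learned mask token, the encoder regresses their raw pixels, and the loss is computed on masked positions only. All module sizes and names are assumptions.

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class TinyMIM(nn.Module):
        def __init__(self, patch_dim=768, embed_dim=192):
            super().__init__()
            self.embed = nn.Linear(patch_dim, embed_dim)
            layer = nn.TransformerEncoderLayer(embed_dim, nhead=4, batch_first=True)
            self.encoder = nn.TransformerEncoder(layer, num_layers=2)
            self.mask_token = nn.Parameter(torch.zeros(1, 1, embed_dim))
            self.head = nn.Linear(embed_dim, patch_dim)  # regress raw patch pixels

        def forward(self, patches, mask_ratio=0.75):
            # patches: (B, N, patch_dim); replace a random subset with the mask token.
            B, N, _ = patches.shape
            mask = torch.rand(B, N, device=patches.device) < mask_ratio  # True = masked
            tokens = self.embed(patches)
            tokens = torch.where(mask.unsqueeze(-1),
                                 self.mask_token.expand(B, N, -1), tokens)
            pred = self.head(self.encoder(tokens))
            # Reconstruction loss only on masked positions.
            return F.mse_loss(pred[mask], patches[mask])

    loss = TinyMIM()(torch.randn(2, 196, 768))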
Author:
Dai, Peng, Zhang, Yang, Liu, Tao, Fan, Zhen, Du, Tianyuan, Su, Zhuo, Zheng, Xiaozheng, Li, Zeming
It is especially challenging to achieve real-time human motion tracking on a standalone VR Head-Mounted Display (HMD) such as Meta Quest and PICO. In this paper, we propose HMD-Poser, the first unified approach to recover full-body motions using scal…
External link:
http://arxiv.org/abs/2403.03561
Author:
Mao, Weixin, Yang, Jinrong, Ge, Zheng, Song, Lin, Zhou, Hongyu, Mao, Tiezheng, Li, Zeming, Yoshie, Osamu
Depth perception is a crucial component of monocular 3D detection tasks that typically involve ill-posed problems. In light of the success of sample mining techniques in 2D object detection, we propose a simple yet effective mining strategy for impr…
External link:
http://arxiv.org/abs/2306.17450
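The excerpt above proposes a mining strategy, inspired by 2D sample mining, for the depth component of monocular 3D detection. Since the abstract is truncated before the method is described, the sketch below shows only a generic hard-example heuristic, up-weighting samples with large depth error, not the paper's actual strategy; the function name, fraction, and weight are invented.

    import torch

    def mine_hard_depth_samples(pred_depth, gt_depth, top_frac=0.3):
        # Rank predictions by absolute depth error; the hardest fraction
        # gets a larger loss weight, echoing 2D-detection sample mining.
        err = (pred_depth - gt_depth).abs()
        k = max(1, int(top_frac * err.numel()))
        hard = torch.topk(err, k).indices
        weights = torch.ones_like(err)
        weights[hard] = 2.0  # up-weight hard (large depth-error) samples
        return weights

    weights = mine_hard_depth_samples(torch.rand(100) * 60.0, torch.rand(100) * 60.0)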
Author:
Song, Lin, Zhang, Songyang, Liu, Songtao, Li, Zeming, He, Xuming, Sun, Hongbin, Sun, Jian, Zheng, Nanning
Transformers, the de facto standard for language modeling, have been recently applied to vision tasks. This paper introduces sparse queries for vision transformers to exploit the intrinsic spatial redundancy of natural images and save computational…
External link:
http://arxiv.org/abs/2301.03831
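The abstract above introduces sparse queries for vision transformers to exploit spatial redundancy. As a hedged illustration of the general idea, not the paper's design, the PyTorch toy below scores tokens, keeps the top-k as queries, and lets only those attend over the full token set, shrinking the query side of attention from N tokens to k. All modules and sizes are assumptions.

    import torch
    import torch.nn as nn

    class SparseQueryAttention(nn.Module):
        def __init__(self, dim=192, heads=4, keep=32):
            super().__init__()
            self.keep = keep
            self.score = nn.Linear(dim, 1)  # learned token saliency
            self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)

        def forward(self, x):  # x: (B, N, dim)
            s = self.score(x).squeeze(-1)  # (B, N) saliency scores
            idx = torch.topk(s, self.keep, dim=1).indices
            # Gather the top-k tokens to serve as the sparse query set.
            q = torch.gather(x, 1, idx.unsqueeze(-1).expand(-1, -1, x.size(-1)))
            out, _ = self.attn(q, x, x)  # sparse queries, dense keys/values
            return out  # (B, keep, dim)

    y = SparseQueryAttention()(torch.randn(2, 196, 192))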
Although existing multi-object tracking (MOT) algorithms have obtained competitive performance on various benchmarks, almost all of them train and validate models on the same domain. The domain generalization problem of MOT is hardly studied. To brid…
External link:
http://arxiv.org/abs/2212.01568
This paper proposes an efficient multi-camera to Bird's-Eye-View (BEV) view transformation method for 3D perception, dubbed MatrixVT. Existing view transformers either suffer from poor transformation efficiency or rely on device-specific operators, h…
External link:
http://arxiv.org/abs/2211.10593
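MatrixVT's stated goal is an efficient view transformation without device-specific operators. The toy below illustrates only the general "view transform as a single matrix multiplication" idea suggested by the name: a hypothetical, here randomly generated, feature-transport matrix maps flattened image features to BEV cells in one matmul. The real method constructs and compresses this matrix from camera geometry, which is not shown, and all sizes here are invented.

    import torch

    # Toy sizes: 2816 flattened image-feature positions, 4096 BEV cells, 80 channels.
    n_img, n_bev, C = 32 * 88, 64 * 64, 80

    # Hypothetical feature-transport matrix: entry (b, f) is how much image
    # feature f contributes to BEV cell b (mostly zeros in practice).
    transport = torch.rand(n_bev, n_img)
    transport = transport * (transport > 0.999).float()  # crude sparsification

    image_feats = torch.randn(n_img, C)  # flattened multi-camera image features
    bev_feats = transport @ image_feats  # the whole view transform is one matmul
    print(bev_feats.shape)               # torch.Size([4096, 80])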