Showing 1 - 10 of 332 for search: '"Li Zeming"'
Author:
Kim, Taewhan, Bae, Hojin, Li, Zeming, Li, Xiaoqi, Ponomarenko, Iaroslav, Wu, Ruihai, Dong, Hao
Visual actionable affordance has emerged as a transformative approach in robotics, focusing on perceiving interaction areas prior to manipulation. Traditional methods rely on pixel sampling to identify successful interaction samples or processing poi…
External link:
http://arxiv.org/abs/2412.10050
Recent advancements in multi-modal pre-training for 3D point clouds have demonstrated promising results by aligning heterogeneous features across 3D shapes and their corresponding 2D images and language descriptions. However, current straightforward …
External link:
http://arxiv.org/abs/2407.14007
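The entry above describes aligning point-cloud, image, and text features during pre-training, but the abstract is cut off before any detail. As a generic illustration only (not this paper's method), the PyTorch sketch below shows one common way such alignment is done: a symmetric InfoNCE contrastive loss, CLIP-style, applied across the three modalities. All function names, dimensions, and the temperature value are hypothetical.

    import torch
    import torch.nn.functional as F

    def alignment_loss(point_emb, image_emb, text_emb, temperature=0.07):
        # Normalize each modality's embeddings to the unit sphere.
        p = F.normalize(point_emb, dim=-1)
        i = F.normalize(image_emb, dim=-1)
        t = F.normalize(text_emb, dim=-1)
        labels = torch.arange(p.size(0))  # matched triplets share a batch index

        def nce(a, b):
            # Symmetric InfoNCE over the batch similarity matrix.
            logits = a @ b.T / temperature
            return (F.cross_entropy(logits, labels) +
                    F.cross_entropy(logits.T, labels)) / 2

        # Pull the point-cloud embedding toward its paired image and caption.
        return nce(p, i) + nce(p, t)

    loss = alignment_loss(torch.randn(8, 256), torch.randn(8, 256), torch.randn(8, 256))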
Author:
Li, Renjie, Pan, Panwang, Yang, Bangbang, Xu, Dejia, Zhou, Shijie, Zhang, Xuanyang, Li, Zeming, Kadambi, Achuta, Wang, Zhangyang, Tu, Zhengzhong, Fan, Zhiwen
The blooming of virtual reality and augmented reality (VR/AR) technologies has driven an increasing demand for the creation of high-quality, immersive, and dynamic environments. However, existing generative techniques either focus solely on dynamic o…
External link:
http://arxiv.org/abs/2406.13527
Author:
Pan, Panwang, Su, Zhuo, Lin, Chenguo, Fan, Zhen, Zhang, Yongjie, Li, Zeming, Shen, Tingting, Mu, Yadong, Liu, Yebin
Despite recent advancements in high-fidelity human reconstruction techniques, the requirements for densely captured images or time-consuming per-instance optimization significantly hinder their applications in broader scenarios. To tackle these issue…
External link:
http://arxiv.org/abs/2406.12459
Author:
Gao, Jin, Lin, Shubo, Wang, Shaoru, Kou, Yutong, Li, Zeming, Li, Liang, Zhang, Congxuan, Zhang, Xiaoqin, Wang, Yizheng, Hu, Weiming
Masked image modeling (MIM) pre-training for large-scale vision transformers (ViTs) has enabled promising downstream performance on top of the learned self-supervised ViT features. In this paper, we question if the extremely simple lightweig…
External link:
http://arxiv.org/abs/2404.12210
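The abstract above names masked image modeling (MIM) pre-training for ViTs. As a rough illustration only, not this paper's method, here is a minimal SimMIM-style sketch in PyTorch: a random subset of patch tokens is replaced with a learned mask token, the encoder regresses their raw pixels, and the loss is computed on masked positions only. All module sizes and names are assumptions.

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class TinyMIM(nn.Module):
        def __init__(self, patch_dim=768, embed_dim=192):
            super().__init__()
            self.embed = nn.Linear(patch_dim, embed_dim)
            layer = nn.TransformerEncoderLayer(embed_dim, nhead=4, batch_first=True)
            self.encoder = nn.TransformerEncoder(layer, num_layers=2)
            self.mask_token = nn.Parameter(torch.zeros(1, 1, embed_dim))
            self.head = nn.Linear(embed_dim, patch_dim)  # regress raw patch pixels

        def forward(self, patches, mask_ratio=0.75):
            # patches: (B, N, patch_dim); replace a random subset with the mask token.
            B, N, _ = patches.shape
            mask = torch.rand(B, N, device=patches.device) < mask_ratio  # True = masked
            tokens = self.embed(patches)
            tokens = torch.where(mask.unsqueeze(-1),
                                 self.mask_token.expand(B, N, -1), tokens)
            pred = self.head(self.encoder(tokens))
            # Reconstruction loss only on masked positions.
            return F.mse_loss(pred[mask], patches[mask])

    loss = TinyMIM()(torch.randn(2, 196, 768))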
Author:
Dai, Peng, Zhang, Yang, Liu, Tao, Fan, Zhen, Du, Tianyuan, Su, Zhuo, Zheng, Xiaozheng, Li, Zeming
It is especially challenging to achieve real-time human motion tracking on a standalone VR Head-Mounted Display (HMD) such as Meta Quest and PICO. In this paper, we propose HMD-Poser, the first unified approach to recover full-body motions using scal…
External link:
http://arxiv.org/abs/2403.03561
Author:
Mao, Weixin, Yang, Jinrong, Ge, Zheng, Song, Lin, Zhou, Hongyu, Mao, Tiezheng, Li, Zeming, Yoshie, Osamu
Depth perception is a crucial component of monocular 3D detection tasks that typically involve ill-posed problems. In light of the success of sample mining techniques in 2D object detection, we propose a simple yet effective mining strategy for impr…
External link:
http://arxiv.org/abs/2306.17450
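The excerpt above proposes a mining strategy, inspired by 2D sample mining, for the depth component of monocular 3D detection. Since the abstract is truncated before the method is described, the sketch below shows only a generic hard-example heuristic, up-weighting samples with large depth error, not the paper's actual strategy; the function name, fraction, and weight are invented.

    import torch

    def mine_hard_depth_samples(pred_depth, gt_depth, top_frac=0.3):
        # Rank predictions by absolute depth error; the hardest fraction
        # gets a larger loss weight, echoing 2D-detection sample mining.
        err = (pred_depth - gt_depth).abs()
        k = max(1, int(top_frac * err.numel()))
        hard = torch.topk(err, k).indices
        weights = torch.ones_like(err)
        weights[hard] = 2.0  # up-weight hard (large depth-error) samples
        return weights

    weights = mine_hard_depth_samples(torch.rand(100) * 60.0, torch.rand(100) * 60.0)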
Author:
Song, Lin, Zhang, Songyang, Liu, Songtao, Li, Zeming, He, Xuming, Sun, Hongbin, Sun, Jian, Zheng, Nanning
Transformers, the de facto standard for language modeling, have been recently applied to vision tasks. This paper introduces sparse queries for vision transformers to exploit the intrinsic spatial redundancy of natural images and save computational…
External link:
http://arxiv.org/abs/2301.03831
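The abstract above introduces sparse queries for vision transformers to exploit spatial redundancy. As a hedged illustration of the general idea, not the paper's design, the PyTorch toy below scores tokens, keeps the top-k as queries, and lets only those attend over the full token set, shrinking the query side of attention from N tokens to k. All modules and sizes are assumptions.

    import torch
    import torch.nn as nn

    class SparseQueryAttention(nn.Module):
        def __init__(self, dim=192, heads=4, keep=32):
            super().__init__()
            self.keep = keep
            self.score = nn.Linear(dim, 1)  # learned token saliency
            self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)

        def forward(self, x):  # x: (B, N, dim)
            s = self.score(x).squeeze(-1)  # (B, N) saliency scores
            idx = torch.topk(s, self.keep, dim=1).indices
            # Gather the top-k tokens to serve as the sparse query set.
            q = torch.gather(x, 1, idx.unsqueeze(-1).expand(-1, -1, x.size(-1)))
            out, _ = self.attn(q, x, x)  # sparse queries, dense keys/values
            return out  # (B, keep, dim)

    y = SparseQueryAttention()(torch.randn(2, 196, 192))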
Although existing multi-object tracking (MOT) algorithms have obtained competitive performance on various benchmarks, almost all of them train and validate models on the same domain. The domain generalization problem of MOT is hardly studied. To brid…
External link:
http://arxiv.org/abs/2212.01568
This paper proposes an efficient multi-camera to Bird's-Eye-View (BEV) view transformation method for 3D perception, dubbed MatrixVT. Existing view transformers either suffer from poor transformation efficiency or rely on device-specific operators, h…
External link:
http://arxiv.org/abs/2211.10593
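MatrixVT's stated goal is an efficient view transformation without device-specific operators. The toy below illustrates only the general "view transform as a single matrix multiplication" idea suggested by the name: a hypothetical, here randomly generated, feature-transport matrix maps flattened image features to BEV cells in one matmul. The real method constructs and compresses this matrix from camera geometry, which is not shown, and all sizes here are invented.

    import torch

    # Toy sizes: 2816 flattened image-feature positions, 4096 BEV cells, 80 channels.
    n_img, n_bev, C = 32 * 88, 64 * 64, 80

    # Hypothetical feature-transport matrix: entry (b, f) is how much image
    # feature f contributes to BEV cell b (mostly zeros in practice).
    transport = torch.rand(n_bev, n_img)
    transport = transport * (transport > 0.999).float()  # crude sparsification

    image_feats = torch.randn(n_img, C)  # flattened multi-camera image features
    bev_feats = transport @ image_feats  # the whole view transform is one matmul
    print(bev_feats.shape)               # torch.Size([4096, 80])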