Showing 1 - 10 of 9,499 results for search: '"Wenzhao An"'
Author:
Guo, Xianda, Zhang, Chenming, Zhang, Youmin, Nie, Dujun, Wang, Ruilin, Zheng, Wenzhao, Poggi, Matteo, Chen, Long
Stereo matching has been a pivotal component in 3D vision, aiming to find corresponding points between pairs of stereo images to recover depth information. In this work, we introduce StereoAnything, a highly practical solution for robust stereo match
External link:
http://arxiv.org/abs/2411.14053
Author:
Peng, Chensheng, Zhang, Chengwei, Wang, Yixiao, Xu, Chenfeng, Xie, Yichen, Zheng, Wenzhao, Keutzer, Kurt, Tomizuka, Masayoshi, Zhan, Wei
We present DeSiRe-GS, a self-supervised gaussian splatting representation, enabling effective static-dynamic decomposition and high-fidelity surface reconstruction in complex driving scenarios. Our approach employs a two-stage optimization pipeline o
External link:
http://arxiv.org/abs/2411.11921
Author:
Chen, Anthony, Xu, Jianjin, Zheng, Wenzhao, Dai, Gaole, Wang, Yida, Zhang, Renrui, Wang, Haofan, Zhang, Shanghang
Diffusion models have demonstrated excellent capabilities in text-to-image generation. Their semantic understanding (i.e., prompt following) ability has also been greatly improved with large language models (e.g., T5, Llama). However, existing models
External link:
http://arxiv.org/abs/2411.02395
Recent advances in high-definition (HD) map construction from surround-view images have highlighted their cost-effectiveness in deployment. However, prevailing techniques often fall short in accurately extracting and utilizing road features, as well
External link:
http://arxiv.org/abs/2411.01408
Author:
Fei, Xin, Zheng, Wenzhao, Duan, Yueqi, Zhan, Wei, Tomizuka, Masayoshi, Keutzer, Kurt, Lu, Jiwen
We propose PixelGaussian, an efficient feed-forward framework for learning generalizable 3D Gaussian reconstruction from arbitrary views. Most existing methods rely on uniform pixel-wise Gaussian representations, which learn a fixed number of 3D Gaus
External link:
http://arxiv.org/abs/2410.18979
Vision-centric autonomous driving has demonstrated excellent performance with economical sensors. As the fundamental step, 3D perception aims to infer 3D information from 2D images based on 3D-2D projection. This makes driving perception models susce
External link:
http://arxiv.org/abs/2410.13864
Mamba has garnered widespread attention due to its flexible design and efficient hardware performance to process 1D sequences based on the state space model (SSM). Recent studies have attempted to apply Mamba to the visual domain by flattening 2D ima
External link:
http://arxiv.org/abs/2410.10382
Vision mambas have demonstrated strong performance with linear complexity to the number of vision tokens. Their efficiency results from processing image tokens sequentially. However, most existing methods employ patch-based image tokenization and the
External link:
http://arxiv.org/abs/2410.10316
Author:
Zhang, Yuan, Fan, Chun-Kai, Ma, Junpeng, Zheng, Wenzhao, Huang, Tao, Cheng, Kuan, Gudovskiy, Denis, Okuno, Tomoyuki, Nakata, Yohei, Keutzer, Kurt, Zhang, Shanghang
In vision-language models (VLMs), visual tokens usually consume a significant amount of computational overhead, despite their sparser information density compared to text tokens. To address this, most existing methods learn a network to prune redunda
External link:
http://arxiv.org/abs/2410.04417
Author:
Liu, Litao, Wang, Wentao, Han, Yifan, Xie, Zhuoli, Yi, Pengfei, Li, Junyan, Qin, Yi, Lian, Wenzhao
Multi-task imitation learning (MTIL) has shown significant potential in robotic manipulation by enabling agents to perform various tasks using a unified policy. This simplifies the policy deployment and enhances the agent's adaptability across differ
External link:
http://arxiv.org/abs/2409.19528