Search results

Report

Stereo Anything: Unifying Stereo Matching with Large-Scale Mixed Data

Authors: Guo, Xianda; Zhang, Chenming; Zhang, Youmin; Nie, Dujun; Wang, Ruilin; Zheng, Wenzhao; Poggi, Matteo; Chen, Long

Stereo matching has been a pivotal component in 3D vision, aiming to find corresponding points between pairs of stereo images to recover depth information. In this work, we introduce StereoAnything, a highly practical solution for robust stereo match…

External link: http://arxiv.org/abs/2411.14053

Report

DeSiRe-GS: 4D Street Gaussians for Static-Dynamic Decomposition and Surface Reconstruction for Urban Driving Scenes

Authors: Peng, Chensheng; Zhang, Chengwei; Wang, Yixiao; Xu, Chenfeng; Xie, Yichen; Zheng, Wenzhao; Keutzer, Kurt; Tomizuka, Masayoshi; Zhan, Wei

We present DeSiRe-GS, a self-supervised Gaussian splatting representation, enabling effective static-dynamic decomposition and high-fidelity surface reconstruction in complex driving scenarios. Our approach employs a two-stage optimization pipeline o…

External link: http://arxiv.org/abs/2411.11921

Report

Training-free Regional Prompting for Diffusion Transformers

Authors: Chen, Anthony; Xu, Jianjin; Zheng, Wenzhao; Dai, Gaole; Wang, Yida; Zhang, Renrui; Wang, Haofan; Zhang, Shanghang

Diffusion models have demonstrated excellent capabilities in text-to-image generation. Their semantic understanding (i.e., prompt following) ability has also been greatly improved with large language models (e.g., T5, Llama). However, existing models…

External link: http://arxiv.org/abs/2411.02395

Report

HeightMapNet: Explicit Height Modeling for End-to-End HD Map Learning

Authors: Qiu, Wenzhao; Pang, Shanmin; Zhang, Hao; Fang, Jianwu; Xue, Jianru

Recent advances in high-definition (HD) map construction from surround-view images have highlighted their cost-effectiveness in deployment. However, prevailing techniques often fall short in accurately extracting and utilizing road features, as well…

External link: http://arxiv.org/abs/2411.01408

Report

PixelGaussian: Generalizable 3D Gaussian Reconstruction from Arbitrary Views

Authors: Fei, Xin; Zheng, Wenzhao; Duan, Yueqi; Zhan, Wei; Tomizuka, Masayoshi; Keutzer, Kurt; Lu, Jiwen

We propose PixelGaussian, an efficient feed-forward framework for learning generalizable 3D Gaussian reconstruction from arbitrary views. Most existing methods rely on uniform pixel-wise Gaussian representations, which learn a fixed number of 3D Gaus…

External link: http://arxiv.org/abs/2410.18979

Report

UniDrive: Towards Universal Driving Perception Across Camera Configurations

Authors: Li, Ye; Zheng, Wenzhao; Huang, Xiaonan; Keutzer, Kurt

Vision-centric autonomous driving has demonstrated excellent performance with economical sensors. As the fundamental step, 3D perception aims to infer 3D information from 2D images based on 3D-2D projection. This makes driving perception models susce…

External link: http://arxiv.org/abs/2410.13864

Report

V2M: Visual 2-Dimensional Mamba for Image Representation Learning

Authors: Wang, Chengkun; Zheng, Wenzhao; Huang, Yuanhui; Zhou, Jie; Lu, Jiwen

Mamba has garnered widespread attention due to its flexible design and efficient hardware performance to process 1D sequences based on the state space model (SSM). Recent studies have attempted to apply Mamba to the visual domain by flattening 2D ima…

External link: http://arxiv.org/abs/2410.10382

Report

GlobalMamba: Global Image Serialization for Vision Mamba

Authors: Wang, Chengkun; Zheng, Wenzhao; Zhou, Jie; Lu, Jiwen

Vision mambas have demonstrated strong performance with linear complexity to the number of vision tokens. Their efficiency results from processing image tokens sequentially. However, most existing methods employ patch-based image tokenization and the…

External link: http://arxiv.org/abs/2410.10316

Report

SparseVLM: Visual Token Sparsification for Efficient Vision-Language Model Inference

Authors: Zhang, Yuan; Fan, Chun-Kai; Ma, Junpeng; Zheng, Wenzhao; Huang, Tao; Cheng, Kuan; Gudovskiy, Denis; Okuno, Tomoyuki; Nakata, Yohei; Keutzer, Kurt; Zhang, Shanghang

In vision-language models (VLMs), visual tokens usually consume a significant amount of computational overhead, despite their sparser information density compared to text tokens. To address this, most existing methods learn a network to prune redunda…

External link: http://arxiv.org/abs/2410.04417

Report

FoAM: Foresight-Augmented Multi-Task Imitation Policy for Robotic Manipulation

Authors: Liu, Litao; Wang, Wentao; Han, Yifan; Xie, Zhuoli; Yi, Pengfei; Li, Junyan; Qin, Yi; Lian, Wenzhao

Multi-task imitation learning (MTIL) has shown significant potential in robotic manipulation by enabling agents to perform various tasks using a unified policy. This simplifies the policy deployment and enhances the agent's adaptability across differ…

External link: http://arxiv.org/abs/2409.19528