Showing 1 - 10 of 267 for search: '"Lin, Tianwei"'
In embodied intelligence systems, a key component is the 3D perception algorithm, which enables agents to understand their surrounding environments. Previous algorithms primarily rely on point cloud, which, despite offering precise geometric information, …
External link:
http://arxiv.org/abs/2411.14869
Author:
Lin, Tianwei, Liu, Jiang, Zhang, Wenqiao, Li, Zhaocheng, Dai, Yang, Li, Haoyuan, Yu, Zhelun, He, Wanggui, Li, Juncheng, Jiang, Hao, Tang, Siliang, Zhuang, Yueting
While Parameter-Efficient Fine-Tuning (PEFT) methods like LoRA have effectively addressed GPU memory constraints during fine-tuning, their performance often falls short, especially in multidimensional task scenarios. To address this issue, one straightforward …
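For context on the LoRA method this abstract builds on: LoRA freezes the pretrained weight matrix and learns only a low-rank additive update, which is what keeps GPU memory low during fine-tuning. The sketch below is a minimal illustrative forward pass in NumPy, not the paper's proposed method; all names (`lora_forward`, the dimensions, the scaling `alpha / r`) are illustrative assumptions.

```python
import numpy as np

# Minimal LoRA-style linear layer: the frozen weight W is augmented by a
# low-rank update B @ A, so only r * (d_in + d_out) parameters are trained
# instead of d_in * d_out.
rng = np.random.default_rng(0)
d_in, d_out, r, alpha = 64, 32, 4, 8

W = rng.normal(size=(d_out, d_in))      # frozen pretrained weight
A = rng.normal(size=(r, d_in)) * 0.01   # trainable down-projection
B = np.zeros((d_out, r))                # trainable up-projection, init to zero

def lora_forward(x):
    # With B initialized to zero, the output equals the frozen base layer
    # exactly at the start of fine-tuning.
    return x @ W.T + (alpha / r) * (x @ A.T @ B.T)

x = rng.normal(size=(1, d_in))
assert np.allclose(lora_forward(x), x @ W.T)
```

With `r = 4` here, the adapter holds 4 × (64 + 32) = 384 trainable parameters versus 2048 in the full weight, which is the memory saving the abstract refers to.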
External link:
http://arxiv.org/abs/2408.09856
Author:
Zhang, Wenqiao, Lin, Tianwei, Liu, Jiang, Shu, Fangxun, Li, Haoyuan, Zhang, Lei, He, Wanggui, Zhou, Hao, Lv, Zheqi, Jiang, Hao, Li, Juncheng, Tang, Siliang, Zhuang, Yueting
Recent advancements indicate that scaling up Multimodal Large Language Models (MLLMs) effectively enhances performance on downstream multimodal tasks. The prevailing MLLM paradigm, \emph{e.g.}, LLaVA, transforms visual features into text-like tokens …
External link:
http://arxiv.org/abs/2403.13447
We present WidthFormer, a novel transformer-based module to compute Bird's-Eye-View (BEV) representations from multi-view cameras for real-time autonomous-driving applications. WidthFormer is computationally efficient, robust and does not require any …
External link:
http://arxiv.org/abs/2401.03836
Author:
Guo, Qin, Lin, Tianwei
Recently, diffusion-based methods, like InstructPix2Pix (IP2P), have achieved effective instruction-based image editing, requiring only natural language instructions from the user. However, these methods often inadvertently alter unintended areas and …
External link:
http://arxiv.org/abs/2312.10113
Motion prediction is a crucial task in autonomous driving, and one of its major challenges lies in the multimodality of future behaviors. Many successful works have utilized mixture models which require identification of positive mixture components, …
External link:
http://arxiv.org/abs/2312.09501
In autonomous driving perception systems, 3D detection and tracking are the two fundamental tasks. This paper delves deeper into this field, building upon the Sparse4D framework. We introduce two auxiliary training tasks (Temporal Instance Denoising …)
External link:
http://arxiv.org/abs/2311.11722
Author:
Jiang, Haoyi, Cheng, Tianheng, Gao, Naiyu, Zhang, Haoyang, Lin, Tianwei, Liu, Wenyu, Wang, Xinggang
3D Semantic Scene Completion (SSC) has emerged as a nascent and pivotal undertaking in autonomous driving, aiming to predict voxel occupancy within volumetric scenes. However, prevailing methodologies primarily focus on voxel-wise feature aggregation …
External link:
http://arxiv.org/abs/2306.15670
Augmenting LiDAR input with multiple previous frames provides richer semantic information and thus boosts performance in 3D object detection. However, crowded point clouds across multiple frames can hurt the precise position information due to the motion blur …
External link:
http://arxiv.org/abs/2305.15219
Sparse algorithms offer great flexibility for multi-view temporal perception tasks. In this paper, we present an enhanced version of Sparse4D, in which we improve the temporal fusion module by implementing a recursive form of multi-frame feature sampling …
External link:
http://arxiv.org/abs/2305.14018