Showing 1 - 10 of 532,690 for search: '"Computer science - artificial intelligence"'
Author:
Zheng, Wenzhao, Wu, Junjie, Zheng, Yao, Zuo, Sicheng, Xie, Zixun, Yang, Longchao, Pan, Yong, Hao, Zhihui, Jia, Peng, Lang, Xianpeng, Zhang, Shanghang
Vision-based autonomous driving shows great potential due to its satisfactory performance and low costs. Most existing methods adopt dense representations (e.g., bird's eye view) or sparse representations (e.g., instance boxes) for decision-making, w
External link:
http://arxiv.org/abs/2412.10371
3D occupancy prediction is important for autonomous driving due to its comprehensive perception of the surroundings. To incorporate sequential inputs, most existing methods fuse representations from previous frames to infer the current 3D occupancy.
External link:
http://arxiv.org/abs/2412.10373
Author:
Zohar, Orr, Wang, Xiaohan, Dubois, Yann, Mehta, Nikhil, Xiao, Tong, Hansen-Estruch, Philippe, Yu, Licheng, Wang, Xiaofang, Juefei-Xu, Felix, Zhang, Ning, Yeung-Levy, Serena, Xia, Xide
Despite the rapid integration of video perception capabilities into Large Multimodal Models (LMMs), the underlying mechanisms driving their video understanding remain poorly understood. Consequently, many design decisions in this domain are made with
External link:
http://arxiv.org/abs/2412.10360
Author:
Ren, Yuchen, Han, Wenwei, Zhang, Qianyuan, Tang, Yining, Bai, Weiqiang, Cai, Yuchen, Qiao, Lifeng, Jiang, Hao, Yuan, Dong, Chen, Tao, Sun, Siqi, Tan, Pan, Ouyang, Wanli, Dong, Nanqing, Ma, Xinzhu, Ye, Peng
As key elements within the central dogma, DNA, RNA, and proteins play crucial roles in maintaining life by guaranteeing accurate genetic expression and implementation. Although research on these molecules has profoundly impacted fields like medicine,
External link:
http://arxiv.org/abs/2412.10347
Author:
Kossaifi, Jean, Kovachki, Nikola, Li, Zongyi, Pitt, Davit, Liu-Schiaffini, Miguel, George, Robert Joseph, Bonev, Boris, Azizzadenesheli, Kamyar, Berner, Julius, Anandkumar, Anima
We present NeuralOperator, an open-source Python library for operator learning. Neural operators generalize neural networks to maps between function spaces instead of finite-dimensional Euclidean spaces. They can be trained and inferenced on input an
External link:
http://arxiv.org/abs/2412.10354
TraceVLA: Visual Trace Prompting Enhances Spatial-Temporal Awareness for Generalist Robotic Policies
Author:
Zheng, Ruijie, Liang, Yongyuan, Huang, Shuaiyi, Gao, Jianfeng, Daumé III, Hal, Kolobov, Andrey, Huang, Furong, Yang, Jianwei
Although large vision-language-action (VLA) models pretrained on extensive robot datasets offer promising generalist policies for robotic learning, they still struggle with spatial-temporal dynamics in interactive robotics, making them less effective
External link:
http://arxiv.org/abs/2412.10345
In current multimodal tasks, models typically freeze the encoder and decoder while adapting intermediate layers to task-specific goals, such as region captioning. Region-level visual understanding presents significant challenges for large-scale visio
External link:
http://arxiv.org/abs/2412.10348
Author:
Ge, Zhiqi, Li, Juncheng, Pang, Xinglei, Gao, Minghe, Pan, Kaihang, Lin, Wang, Fei, Hao, Zhang, Wenqiao, Tang, Siliang, Zhuang, Yueting
Digital agents are increasingly employed to automate tasks in interactive digital environments such as web pages, software applications, and operating systems. While text-based agents built on Large Language Models (LLMs) often require frequent updat
External link:
http://arxiv.org/abs/2412.10342
Author:
Shanmugam, Divya, Agrawal, Monica, Movva, Rajiv, Chen, Irene Y., Ghassemi, Marzyeh, Pierson, Emma
The increased capabilities of generative AI have dramatically expanded its possible use cases in medicine. We provide a comprehensive overview of generative AI use cases for clinicians, patients, clinical trial organizers, researchers, and trainees.
External link:
http://arxiv.org/abs/2412.10337
We study a path planning problem where the possible move actions are represented as a finite set of motion primitives aligned with the grid representation of the environment. That is, each primitive corresponds to a short kinodynamically-feasible mot
External link:
http://arxiv.org/abs/2412.10320