Showing 1 - 10 of 483
for search query: '"Yang, Jinrong"'
Author:
Li, Zhuoling, Ren, Liangliang, Yang, Jinrong, Zhao, Yong, Wu, Xiaoyang, Xu, Zhenhua, Bai, Xiang, Zhao, Hengshuang
Robotic manipulation, owing to its multi-modal nature, often faces significant training ambiguity, necessitating explicit instructions to clearly delineate the manipulation details in tasks. In this work, we highlight that vision instruction is natur…
External link:
http://arxiv.org/abs/2410.07169
In autonomous driving, multi-modal perception models leveraging inputs from multiple sensors exhibit strong robustness in degraded environments. However, these models face challenges in efficiently and effectively transferring learned representations…
External link:
http://arxiv.org/abs/2405.17942
Author:
Wei, Haoran, Kong, Lingyu, Chen, Jinyue, Zhao, Liang, Ge, Zheng, Yang, Jinrong, Sun, Jianjian, Han, Chunrui, Zhang, Xiangyu
Modern Large Vision-Language Models (LVLMs) enjoy the same vision vocabulary -- CLIP, which can cover most common vision tasks. However, for some special vision task that needs dense and fine-grained vision perception, e.g., document-level OCR or cha…
External link:
http://arxiv.org/abs/2312.06109
Author:
Yu, En, Zhao, Liang, Wei, Yana, Yang, Jinrong, Wu, Dongming, Kong, Lingyu, Wei, Haoran, Wang, Tiancai, Ge, Zheng, Zhang, Xiangyu, Tao, Wenbing
Humans possess the remarkable ability to foresee the future to a certain extent based on present observations, a skill we term as foresight minds. However, this capability remains largely underexplored within existing Multimodal Large Language Models…
External link:
http://arxiv.org/abs/2312.00589
Author:
Dong, Runpei, Han, Chunrui, Peng, Yuang, Qi, Zekun, Ge, Zheng, Yang, Jinrong, Zhao, Liang, Sun, Jianjian, Zhou, Hongyu, Wei, Haoran, Kong, Xiangwen, Zhang, Xiangyu, Ma, Kaisheng, Yi, Li
This paper presents DreamLLM, a learning framework that first achieves versatile Multimodal Large Language Models (MLLMs) empowered with frequently overlooked synergy between multimodal comprehension and creation. DreamLLM operates on two fundamental…
External link:
http://arxiv.org/abs/2309.11499
Detecting glass regions is a challenging task due to the inherent ambiguity in their transparency and reflective characteristics. Current solutions in this field remain rooted in conventional deep learning paradigms, requiring the construction of ann…
External link:
http://arxiv.org/abs/2307.12018
Author:
Zhao, Liang, Yu, En, Ge, Zheng, Yang, Jinrong, Wei, Haoran, Zhou, Hongyu, Sun, Jianjian, Peng, Yuang, Dong, Runpei, Han, Chunrui, Zhang, Xiangyu
Human-AI interactivity is a critical aspect that reflects the usability of multimodal large language models (MLLMs). However, existing end-to-end MLLMs only allow users to interact with them through language instructions, leading to the limitation of…
External link:
http://arxiv.org/abs/2307.09474
Author:
Li, Zhuoling, Han, Chunrui, Ge, Zheng, Yang, Jinrong, Yu, En, Wang, Haoqian, Zhao, Hengshuang, Zhang, Xiangyu
Efficiency is quite important for 3D lane detection due to practical deployment demand. In this work, we propose a simple, fast, and end-to-end detector that still maintains high detection precision. Specifically, we devise a set of fully convolution…
External link:
http://arxiv.org/abs/2307.09472
Author:
Mao, Weixin, Yang, Jinrong, Ge, Zheng, Song, Lin, Zhou, Hongyu, Mao, Tiezheng, Li, Zeming, Yoshie, Osamu
Depth perception is a crucial component of monocular 3D detection tasks that typically involve ill-posed problems. In light of the success of sample mining techniques in 2D object detection, we propose a simple yet effective mining strategy for impr…
External link:
http://arxiv.org/abs/2306.17450
BEVStereo++: Accurate Depth Estimation in Multi-view 3D Object Detection via Dynamic Temporal Stereo
Bounded by the inherent ambiguity of depth perception, contemporary multi-view 3D object detection methods fall into a performance bottleneck. Intuitively, leveraging temporal multi-view stereo (MVS) technology is the natural knowledge for tackling…
External link:
http://arxiv.org/abs/2304.04185