Showing 1 - 10 of 483
for search query: '"Yang, Jinrong"'
Author:
Li, Zhuoling, Ren, Liangliang, Yang, Jinrong, Zhao, Yong, Wu, Xiaoyang, Xu, Zhenhua, Bai, Xiang, Zhao, Hengshuang
Robotic manipulation, owing to its multi-modal nature, often faces significant training ambiguity, necessitating explicit instructions to clearly delineate the manipulation details in tasks. In this work, we highlight that vision instruction is natur…
External link:
http://arxiv.org/abs/2410.07169
In autonomous driving, multi-modal perception models leveraging inputs from multiple sensors exhibit strong robustness in degraded environments. However, these models face challenges in efficiently and effectively transferring learned representations…
External link:
http://arxiv.org/abs/2405.17942
Author:
Wei, Haoran, Kong, Lingyu, Chen, Jinyue, Zhao, Liang, Ge, Zheng, Yang, Jinrong, Sun, Jianjian, Han, Chunrui, Zhang, Xiangyu
Modern Large Vision-Language Models (LVLMs) enjoy the same vision vocabulary -- CLIP, which can cover most common vision tasks. However, for some special vision task that needs dense and fine-grained vision perception, e.g., document-level OCR or cha…
External link:
http://arxiv.org/abs/2312.06109
Author:
Yu, En, Zhao, Liang, Wei, Yana, Yang, Jinrong, Wu, Dongming, Kong, Lingyu, Wei, Haoran, Wang, Tiancai, Ge, Zheng, Zhang, Xiangyu, Tao, Wenbing
Humans possess the remarkable ability to foresee the future to a certain extent based on present observations, a skill we term as foresight minds. However, this capability remains largely underexplored within existing Multimodal Large Language Models…
External link:
http://arxiv.org/abs/2312.00589
Author:
Dong, Runpei, Han, Chunrui, Peng, Yuang, Qi, Zekun, Ge, Zheng, Yang, Jinrong, Zhao, Liang, Sun, Jianjian, Zhou, Hongyu, Wei, Haoran, Kong, Xiangwen, Zhang, Xiangyu, Ma, Kaisheng, Yi, Li
This paper presents DreamLLM, a learning framework that first achieves versatile Multimodal Large Language Models (MLLMs) empowered with frequently overlooked synergy between multimodal comprehension and creation. DreamLLM operates on two fundamental…
External link:
http://arxiv.org/abs/2309.11499
Detecting glass regions is a challenging task due to the inherent ambiguity in their transparency and reflective characteristics. Current solutions in this field remain rooted in conventional deep learning paradigms, requiring the construction of ann…
External link:
http://arxiv.org/abs/2307.12018
Author:
Zhao, Liang, Yu, En, Ge, Zheng, Yang, Jinrong, Wei, Haoran, Zhou, Hongyu, Sun, Jianjian, Peng, Yuang, Dong, Runpei, Han, Chunrui, Zhang, Xiangyu
Human-AI interactivity is a critical aspect that reflects the usability of multimodal large language models (MLLMs). However, existing end-to-end MLLMs only allow users to interact with them through language instructions, leading to the limitation of…
External link:
http://arxiv.org/abs/2307.09474
Author:
Li, Zhuoling, Han, Chunrui, Ge, Zheng, Yang, Jinrong, Yu, En, Wang, Haoqian, Zhao, Hengshuang, Zhang, Xiangyu
Efficiency is quite important for 3D lane detection due to practical deployment demand. In this work, we propose a simple, fast, and end-to-end detector that still maintains high detection precision. Specifically, we devise a set of fully convolution…
External link:
http://arxiv.org/abs/2307.09472
Author:
Mao, Weixin, Yang, Jinrong, Ge, Zheng, Song, Lin, Zhou, Hongyu, Mao, Tiezheng, Li, Zeming, Yoshie, Osamu
Depth perception is a crucial component of monocular 3D detection tasks that typically involve ill-posed problems. In light of the success of sample mining techniques in 2D object detection, we propose a simple yet effective mining strategy for impr…
External link:
http://arxiv.org/abs/2306.17450
BEVStereo++: Accurate Depth Estimation in Multi-view 3D Object Detection via Dynamic Temporal Stereo
Bounded by the inherent ambiguity of depth perception, contemporary multi-view 3D object detection methods fall into a performance bottleneck. Intuitively, leveraging temporal multi-view stereo (MVS) technology is the natural knowledge for tackling…
External link:
http://arxiv.org/abs/2304.04185