Showing 1 - 10
of 61
for search: '"Shi, Botian"'
Author:
Wang, Bin, Xu, Chao, Zhao, Xiaomeng, Ouyang, Linke, Wu, Fan, Zhao, Zhiyuan, Xu, Rui, Liu, Kaiwen, Qu, Yuan, Shang, Fukai, Zhang, Bo, Wei, Liqun, Sui, Zhihao, Li, Wei, Shi, Botian, Qiao, Yu, Lin, Dahua, He, Conghui
Document content analysis has been a crucial research area in computer vision. Despite significant advancements in methods such as OCR, layout detection, and formula recognition, existing open-source solutions struggle to consistently deliver high-qu
External link:
http://arxiv.org/abs/2409.18839
Author:
Mei, Jianbiao, Ma, Yukai, Yang, Xuemeng, Wen, Licheng, Wei, Tiantian, Dou, Min, Shi, Botian, Liu, Yong
Recent advances in diffusion models have significantly enhanced the controllable generation of streetscapes and facilitated downstream perception and planning tasks. However, challenges such as maintaining temporal coherence, generating long video
External link:
http://arxiv.org/abs/2409.04003
Author:
Yang, Xuemeng, Wen, Licheng, Ma, Yukai, Mei, Jianbiao, Li, Xin, Wei, Tiantian, Lei, Wenjie, Fu, Daocheng, Cai, Pinlong, Dou, Min, Shi, Botian, He, Liang, Liu, Yong, Qiao, Yu
This paper presents DriveArena, the first high-fidelity closed-loop simulation system designed for driving agents navigating in real scenarios. DriveArena features a flexible, modular architecture, allowing for the seamless interchange of its core c
External link:
http://arxiv.org/abs/2408.00415
Author:
Ma, Yukai, Mei, Jianbiao, Yang, Xuemeng, Wen, Licheng, Xu, Weihua, Zhang, Jiangning, Shi, Botian, Liu, Yong, Zuo, Xingxing
Semantic Scene Completion (SSC) is pivotal in autonomous driving perception, frequently confronted with the complexities of weather and illumination changes. The long-term strategy involves fusing multi-modal information to bolster the system's robus
External link:
http://arxiv.org/abs/2407.16197
Author:
Xia, Renqiu, Mao, Song, Yan, Xiangchao, Zhou, Hongbin, Zhang, Bo, Peng, Haoyang, Pi, Jiahao, Fu, Daocheng, Wu, Wenjie, Ye, Hancheng, Feng, Shiyang, Wang, Bin, Xu, Chao, He, Conghui, Cai, Pinlong, Dou, Min, Shi, Botian, Zhou, Sheng, Wang, Yongwei, Yan, Junchi, Wu, Fei, Qiao, Yu
Scientific documents record research findings and valuable human knowledge, comprising a vast corpus of high-quality data. Leveraging multi-modality data extracted from these documents and assessing large models' abilities to handle scientific docume
External link:
http://arxiv.org/abs/2406.11633
Author:
Li, Qingyun, Chen, Zhe, Wang, Weiyun, Wang, Wenhai, Ye, Shenglong, Jin, Zhenjiang, Chen, Guanzhou, He, Yinan, Gao, Zhangwei, Cui, Erfei, Yu, Jiashuo, Tian, Hao, Zhou, Jiasheng, Xu, Chao, Wang, Bin, Wei, Xingjian, Li, Wei, Zhang, Wenjian, Zhang, Bo, Cai, Pinlong, Wen, Licheng, Yan, Xiangchao, Li, Zhenxiang, Chu, Pei, Wang, Yi, Dou, Min, Tian, Changyao, Zhu, Xizhou, Lu, Lewei, Chen, Yushi, He, Junjun, Tu, Zhongying, Lu, Tong, Wang, Yali, Wang, Limin, Lin, Dahua, Qiao, Yu, Shi, Botian, He, Conghui, Dai, Jifeng
Image-text interleaved data, consisting of multiple images and texts arranged in a natural document format, aligns with the presentation paradigm of internet data and closely resembles human reading habits. Recent studies have shown that such data ai
External link:
http://arxiv.org/abs/2406.08418
Author:
Mei, Jianbiao, Ma, Yukai, Yang, Xuemeng, Wen, Licheng, Cai, Xinyu, Li, Xin, Fu, Daocheng, Zhang, Bo, Cai, Pinlong, Dou, Min, Shi, Botian, He, Liang, Liu, Yong, Qiao, Yu
Autonomous driving has advanced significantly due to sensors, machine learning, and artificial intelligence improvements. However, prevailing methods struggle with intricate scenarios and causal relationships, hindering adaptability and interpretabil
External link:
http://arxiv.org/abs/2405.15324
Author:
Zhu, Zheng, Wang, Xiaofeng, Zhao, Wangbo, Min, Chen, Deng, Nianchen, Dou, Min, Wang, Yuqi, Shi, Botian, Wang, Kai, Zhang, Chi, You, Yang, Zhang, Zhaoxiang, Zhao, Dawei, Xiao, Liang, Zhao, Jian, Lu, Jiwen, Huang, Guan
General world models represent a crucial pathway toward achieving Artificial General Intelligence (AGI), serving as the cornerstone for various applications ranging from virtual environments to decision-making systems. Recently, the emergence of the
External link:
http://arxiv.org/abs/2405.03520
Published in:
Data Intelligence, Vol 1, Iss 3, Pp 238-270 (2019)
Knowledge is important for text-related applications. In this paper, we introduce Microsoft Concept Graph, a knowledge graph engine that provides concept tagging APIs to facilitate the understanding of human languages. Microsoft Concept Graph is built
External link:
https://doaj.org/article/d497b5fa0aa443979e5ec843027e877c
Author:
Chen, Zhe, Wang, Weiyun, Tian, Hao, Ye, Shenglong, Gao, Zhangwei, Cui, Erfei, Tong, Wenwen, Hu, Kongzhi, Luo, Jiapeng, Ma, Zheng, Ma, Ji, Wang, Jiaqi, Dong, Xiaoyi, Yan, Hang, Guo, Hewei, He, Conghui, Shi, Botian, Jin, Zhenjiang, Xu, Chao, Wang, Bin, Wei, Xingjian, Li, Wei, Zhang, Wenjian, Zhang, Bo, Cai, Pinlong, Wen, Licheng, Yan, Xiangchao, Dou, Min, Lu, Lewei, Zhu, Xizhou, Lu, Tong, Lin, Dahua, Qiao, Yu, Dai, Jifeng, Wang, Wenhai
In this report, we introduce InternVL 1.5, an open-source multimodal large language model (MLLM) to bridge the capability gap between open-source and proprietary commercial models in multimodal understanding. We introduce three simple improvements: (
External link:
http://arxiv.org/abs/2404.16821