Showing 1 - 10 of 357 for search: '"Wang, Weiyun"'
Author:
Li, Junxian, Zhang, Di, Wang, Xunzhi, Hao, Zeying, Lei, Jingdi, Tan, Qian, Zhou, Cai, Liu, Wei, Yang, Yaotian, Xiong, Xinrui, Wang, Weiyun, Chen, Zhe, Wang, Wenhai, Li, Wei, Zhang, Shufei, Su, Mao, Ouyang, Wanli, Li, Yuqiang, Zhou, Dongzhan
Large Language Models (LLMs) have achieved remarkable success and have been applied across various scientific fields, including chemistry. However, many chemical tasks require the processing of visual information, which cannot be successfully handled…
External link:
http://arxiv.org/abs/2408.07246
Author:
Liu, Yangzhou, Cao, Yue, Gao, Zhangwei, Wang, Weiyun, Chen, Zhe, Wang, Wenhai, Tian, Hao, Lu, Lewei, Zhu, Xizhou, Lu, Tong, Qiao, Yu, Dai, Jifeng
Vision-language supervised fine-tuning is effective in enhancing the performance of Vision Large Language Models (VLLMs). However, existing visual instruction tuning datasets have the following limitations: (1) Instruction annotation…
External link:
http://arxiv.org/abs/2407.15838
Author:
Li, Qingyun, Chen, Zhe, Wang, Weiyun, Wang, Wenhai, Ye, Shenglong, Jin, Zhenjiang, Chen, Guanzhou, He, Yinan, Gao, Zhangwei, Cui, Erfei, Yu, Jiashuo, Tian, Hao, Zhou, Jiasheng, Xu, Chao, Wang, Bin, Wei, Xingjian, Li, Wei, Zhang, Wenjian, Zhang, Bo, Cai, Pinlong, Wen, Licheng, Yan, Xiangchao, Li, Zhenxiang, Chu, Pei, Wang, Yi, Dou, Min, Tian, Changyao, Zhu, Xizhou, Lu, Lewei, Chen, Yushi, He, Junjun, Tu, Zhongying, Lu, Tong, Wang, Yali, Wang, Limin, Lin, Dahua, Qiao, Yu, Shi, Botian, He, Conghui, Dai, Jifeng
Image-text interleaved data, consisting of multiple images and texts arranged in a natural document format, aligns with the presentation paradigm of internet data and closely resembles human reading habits. Recent studies have shown that such data aids…
External link:
http://arxiv.org/abs/2406.08418
Author:
Wang, Weiyun, Zhang, Shuibo, Ren, Yiming, Duan, Yuchen, Li, Tiantong, Liu, Shuo, Hu, Mengkang, Chen, Zhe, Zhang, Kaipeng, Lu, Lewei, Zhu, Xizhou, Luo, Ping, Qiao, Yu, Dai, Jifeng, Shao, Wenqi, Wang, Wenhai
With the rapid advancement of multimodal large language models (MLLMs), their evaluation has become increasingly comprehensive. However, understanding long multimodal content, as a foundational ability for real-world applications, remains underexplored…
External link:
http://arxiv.org/abs/2406.07230
Author:
Chen, Zhe, Wang, Weiyun, Tian, Hao, Ye, Shenglong, Gao, Zhangwei, Cui, Erfei, Tong, Wenwen, Hu, Kongzhi, Luo, Jiapeng, Ma, Zheng, Ma, Ji, Wang, Jiaqi, Dong, Xiaoyi, Yan, Hang, Guo, Hewei, He, Conghui, Shi, Botian, Jin, Zhenjiang, Xu, Chao, Wang, Bin, Wei, Xingjian, Li, Wei, Zhang, Wenjian, Zhang, Bo, Cai, Pinlong, Wen, Licheng, Yan, Xiangchao, Dou, Min, Lu, Lewei, Zhu, Xizhou, Lu, Tong, Lin, Dahua, Qiao, Yu, Dai, Jifeng, Wang, Wenhai
In this report, we introduce InternVL 1.5, an open-source multimodal large language model (MLLM) to bridge the capability gap between open-source and proprietary commercial models in multimodal understanding. We introduce three simple improvements: (…
External link:
http://arxiv.org/abs/2404.16821
Author:
Duan, Yuchen, Wang, Weiyun, Chen, Zhe, Zhu, Xizhou, Lu, Lewei, Lu, Tong, Qiao, Yu, Li, Hongsheng, Dai, Jifeng, Wang, Wenhai
Transformers have revolutionized computer vision and natural language processing, but their high computational complexity limits their application in high-resolution image processing and long-context analysis. This paper introduces Vision-RWKV (VRWKV)…
External link:
http://arxiv.org/abs/2403.02308
Author:
Wang, Weiyun, Ren, Yiming, Luo, Haowen, Li, Tiantong, Yan, Chenxiang, Chen, Zhe, Wang, Wenhai, Li, Qingyun, Lu, Lewei, Zhu, Xizhou, Qiao, Yu, Dai, Jifeng
We present the All-Seeing Project V2: a new model and dataset designed for understanding object relations in images. Specifically, we propose the All-Seeing Model V2 (ASMv2) that integrates the formulation of text generation, object localization, and…
External link:
http://arxiv.org/abs/2402.19474
Author:
Tian, Changyao, Zhu, Xizhou, Xiong, Yuwen, Wang, Weiyun, Chen, Zhe, Wang, Wenhai, Chen, Yuntao, Lu, Lewei, Lu, Tong, Zhou, Jie, Li, Hongsheng, Qiao, Yu, Dai, Jifeng
Developing generative models for interleaved image-text data has both research and practical value. It requires models to understand the interleaved sequences and subsequently generate images and text. However, existing attempts are limited by the is…
External link:
http://arxiv.org/abs/2401.10208
Author:
Wang, Weiyun, Shi, Min, Li, Qingyun, Wang, Wenhai, Huang, Zhenhang, Xing, Linjie, Chen, Zhe, Li, Hao, Zhu, Xizhou, Cao, Zhiguo, Chen, Yushi, Lu, Tong, Dai, Jifeng, Qiao, Yu
We present the All-Seeing (AS) project: a large-scale data and model for recognizing and understanding everything in the open world. Using a scalable data engine that incorporates human feedback and efficient models in the loop, we create a new dataset…
External link:
http://arxiv.org/abs/2308.01907
Author:
Liu, Zhaoyang, He, Yinan, Wang, Wenhai, Wang, Weiyun, Wang, Yi, Chen, Shoufa, Zhang, Qinglong, Lai, Zeqiang, Yang, Yang, Li, Qingyun, Yu, Jiashuo, Li, Kunchang, Chen, Zhe, Yang, Xue, Zhu, Xizhou, Wang, Yali, Wang, Limin, Luo, Ping, Dai, Jifeng, Qiao, Yu
We present an interactive visual framework named InternGPT, or iGPT for short. The framework integrates chatbots that have planning and reasoning capabilities, such as ChatGPT, with non-verbal instructions like pointing movements that enable users to…
External link:
http://arxiv.org/abs/2305.05662