Showing 1 - 10 of 357 for search: '"Wang, Weiyun"'
Author:
Li, Junxian, Zhang, Di, Wang, Xunzhi, Hao, Zeying, Lei, Jingdi, Tan, Qian, Zhou, Cai, Liu, Wei, Yang, Yaotian, Xiong, Xinrui, Wang, Weiyun, Chen, Zhe, Wang, Wenhai, Li, Wei, Zhang, Shufei, Su, Mao, Ouyang, Wanli, Li, Yuqiang, Zhou, Dongzhan
Large Language Models (LLMs) have achieved remarkable success and have been applied across various scientific fields, including chemistry. However, many chemical tasks require the processing of visual information, which cannot be successfully handled…
External link:
http://arxiv.org/abs/2408.07246
Author:
Liu, Yangzhou, Cao, Yue, Gao, Zhangwei, Wang, Weiyun, Chen, Zhe, Wang, Wenhai, Tian, Hao, Lu, Lewei, Zhu, Xizhou, Lu, Tong, Qiao, Yu, Dai, Jifeng
Vision-language supervised fine-tuning is effective in enhancing the performance of Vision Large Language Models (VLLMs). However, existing visual instruction tuning datasets have the following limitations: (1) Instruction annotation…
External link:
http://arxiv.org/abs/2407.15838
Author:
Li, Qingyun, Chen, Zhe, Wang, Weiyun, Wang, Wenhai, Ye, Shenglong, Jin, Zhenjiang, Chen, Guanzhou, He, Yinan, Gao, Zhangwei, Cui, Erfei, Yu, Jiashuo, Tian, Hao, Zhou, Jiasheng, Xu, Chao, Wang, Bin, Wei, Xingjian, Li, Wei, Zhang, Wenjian, Zhang, Bo, Cai, Pinlong, Wen, Licheng, Yan, Xiangchao, Li, Zhenxiang, Chu, Pei, Wang, Yi, Dou, Min, Tian, Changyao, Zhu, Xizhou, Lu, Lewei, Chen, Yushi, He, Junjun, Tu, Zhongying, Lu, Tong, Wang, Yali, Wang, Limin, Lin, Dahua, Qiao, Yu, Shi, Botian, He, Conghui, Dai, Jifeng
Image-text interleaved data, consisting of multiple images and texts arranged in a natural document format, aligns with the presentation paradigm of internet data and closely resembles human reading habits. Recent studies have shown that such data aids…
External link:
http://arxiv.org/abs/2406.08418
Author:
Wang, Weiyun, Zhang, Shuibo, Ren, Yiming, Duan, Yuchen, Li, Tiantong, Liu, Shuo, Hu, Mengkang, Chen, Zhe, Zhang, Kaipeng, Lu, Lewei, Zhu, Xizhou, Luo, Ping, Qiao, Yu, Dai, Jifeng, Shao, Wenqi, Wang, Wenhai
With the rapid advancement of multimodal large language models (MLLMs), their evaluation has become increasingly comprehensive. However, understanding long multimodal content, as a foundational ability for real-world applications, remains underexplored…
External link:
http://arxiv.org/abs/2406.07230
Author:
Chen, Zhe, Wang, Weiyun, Tian, Hao, Ye, Shenglong, Gao, Zhangwei, Cui, Erfei, Tong, Wenwen, Hu, Kongzhi, Luo, Jiapeng, Ma, Zheng, Ma, Ji, Wang, Jiaqi, Dong, Xiaoyi, Yan, Hang, Guo, Hewei, He, Conghui, Shi, Botian, Jin, Zhenjiang, Xu, Chao, Wang, Bin, Wei, Xingjian, Li, Wei, Zhang, Wenjian, Zhang, Bo, Cai, Pinlong, Wen, Licheng, Yan, Xiangchao, Dou, Min, Lu, Lewei, Zhu, Xizhou, Lu, Tong, Lin, Dahua, Qiao, Yu, Dai, Jifeng, Wang, Wenhai
In this report, we introduce InternVL 1.5, an open-source multimodal large language model (MLLM) to bridge the capability gap between open-source and proprietary commercial models in multimodal understanding. We introduce three simple improvements: (…
External link:
http://arxiv.org/abs/2404.16821
Author:
Duan, Yuchen, Wang, Weiyun, Chen, Zhe, Zhu, Xizhou, Lu, Lewei, Lu, Tong, Qiao, Yu, Li, Hongsheng, Dai, Jifeng, Wang, Wenhai
Transformers have revolutionized computer vision and natural language processing, but their high computational complexity limits their application in high-resolution image processing and long-context analysis. This paper introduces Vision-RWKV (VRWKV)…
External link:
http://arxiv.org/abs/2403.02308
Author:
Wang, Weiyun, Ren, Yiming, Luo, Haowen, Li, Tiantong, Yan, Chenxiang, Chen, Zhe, Wang, Wenhai, Li, Qingyun, Lu, Lewei, Zhu, Xizhou, Qiao, Yu, Dai, Jifeng
We present the All-Seeing Project V2: a new model and dataset designed for understanding object relations in images. Specifically, we propose the All-Seeing Model V2 (ASMv2) that integrates the formulation of text generation, object localization, and…
External link:
http://arxiv.org/abs/2402.19474
Author:
Tian, Changyao, Zhu, Xizhou, Xiong, Yuwen, Wang, Weiyun, Chen, Zhe, Wang, Wenhai, Chen, Yuntao, Lu, Lewei, Lu, Tong, Zhou, Jie, Li, Hongsheng, Qiao, Yu, Dai, Jifeng
Developing generative models for interleaved image-text data has both research and practical value. It requires models to understand the interleaved sequences and subsequently generate images and text. However, existing attempts are limited by the is…
External link:
http://arxiv.org/abs/2401.10208
Author:
Wang, Weiyun, Shi, Min, Li, Qingyun, Wang, Wenhai, Huang, Zhenhang, Xing, Linjie, Chen, Zhe, Li, Hao, Zhu, Xizhou, Cao, Zhiguo, Chen, Yushi, Lu, Tong, Dai, Jifeng, Qiao, Yu
We present the All-Seeing (AS) project: a large-scale data and model for recognizing and understanding everything in the open world. Using a scalable data engine that incorporates human feedback and efficient models in the loop, we create a new dataset…
External link:
http://arxiv.org/abs/2308.01907
Author:
Liu, Zhaoyang, He, Yinan, Wang, Wenhai, Wang, Weiyun, Wang, Yi, Chen, Shoufa, Zhang, Qinglong, Lai, Zeqiang, Yang, Yang, Li, Qingyun, Yu, Jiashuo, Li, Kunchang, Chen, Zhe, Yang, Xue, Zhu, Xizhou, Wang, Yali, Wang, Limin, Luo, Ping, Dai, Jifeng, Qiao, Yu
We present an interactive visual framework named InternGPT, or iGPT for short. The framework integrates chatbots that have planning and reasoning capabilities, such as ChatGPT, with non-verbal instructions like pointing movements that enable users to…
External link:
http://arxiv.org/abs/2305.05662