Showing 1 - 10 of 23 results for search: '"Ye, Jiabo"'
mPLUG-DocOwl2: High-resolution Compressing for OCR-free Multi-page Document Understanding
Author:
Hu, Anwen, Xu, Haiyang, Zhang, Liang, Ye, Jiabo, Yan, Ming, Zhang, Ji, Jin, Qin, Huang, Fei, Zhou, Jingren
Multimodal Large Language Models (MLLMs) have achieved promising OCR-free Document Understanding performance by increasing the supported resolution of document images. However, this comes at the cost of generating thousands of visual tokens for a single…
External link:
http://arxiv.org/abs/2409.03420
mPLUG-Owl3: Towards Long Image-Sequence Understanding in Multi-Modal Large Language Models
Author:
Ye, Jiabo, Xu, Haiyang, Liu, Haowei, Hu, Anwen, Yan, Ming, Qian, Qi, Zhang, Ji, Huang, Fei, Zhou, Jingren
Multi-modal Large Language Models (MLLMs) have demonstrated remarkable capabilities in executing instructions for a variety of single-image tasks. Despite this progress, significant challenges remain in modeling long image sequences. In this work, we…
External link:
http://arxiv.org/abs/2408.04840
mPLUG-DocOwl 1.5: Unified Structure Learning for OCR-free Document Understanding
Author:
Hu, Anwen, Xu, Haiyang, Ye, Jiabo, Yan, Ming, Zhang, Liang, Zhang, Bo, Li, Chen, Zhang, Ji, Jin, Qin, Huang, Fei, Zhou, Jingren
Structure information is critical for understanding the semantics of text-rich images, such as documents, tables, and charts. Existing Multimodal Large Language Models (MLLMs) for Visual Document Understanding are equipped with text recognition ability…
External link:
http://arxiv.org/abs/2403.12895
Mobile-Agent: Autonomous Multi-Modal Mobile Device Agent with Visual Perception
Author:
Wang, Junyang, Xu, Haiyang, Ye, Jiabo, Yan, Ming, Shen, Weizhou, Zhang, Ji, Huang, Fei, Sang, Jitao
Mobile device agents based on Multimodal Large Language Models (MLLMs) are becoming a popular application. In this paper, we introduce Mobile-Agent, an autonomous multi-modal mobile device agent. Mobile-Agent first leverages visual perception tools to…
External link:
http://arxiv.org/abs/2401.16158
mPLUG-PaperOwl: Scientific Diagram Analysis with the Multimodal Large Language Model
Author:
Hu, Anwen, Shi, Yaya, Xu, Haiyang, Ye, Jiabo, Ye, Qinghao, Yan, Ming, Li, Chenliang, Qian, Qi, Zhang, Ji, Huang, Fei
Recently, the strong text creation ability of Large Language Models (LLMs) has given rise to many tools for assisting paper reading or even writing. However, the weak diagram analysis abilities of LLMs or Multimodal LLMs greatly limit their application…
External link:
http://arxiv.org/abs/2311.18248
mPLUG-Owl2: Revolutionizing Multi-modal Large Language Model with Modality Collaboration
Author:
Ye, Qinghao, Xu, Haiyang, Ye, Jiabo, Yan, Ming, Hu, Anwen, Liu, Haowei, Qian, Qi, Zhang, Ji, Huang, Fei, Zhou, Jingren
Multi-modal Large Language Models (MLLMs) have demonstrated impressive instruction abilities across various open-ended tasks. However, previous methods primarily focus on enhancing multi-modal capabilities. In this work, we introduce a versatile multi-modal…
External link:
http://arxiv.org/abs/2311.04257
UReader: Universal OCR-free Visually-situated Language Understanding with Multimodal Large Language Model
Author:
Ye, Jiabo, Hu, Anwen, Xu, Haiyang, Ye, Qinghao, Yan, Ming, Xu, Guohai, Li, Chenliang, Tian, Junfeng, Qian, Qi, Zhang, Ji, Jin, Qin, He, Liang, Lin, Xin Alex, Huang, Fei
Text is ubiquitous in our visual world, conveying crucial information, such as in documents, websites, and everyday photographs. In this work, we propose UReader, a first exploration of universal OCR-free visually-situated language understanding based…
External link:
http://arxiv.org/abs/2310.05126
mPLUG-DocOwl: Modularized Multimodal Large Language Model for Document Understanding
Author:
Ye, Jiabo, Hu, Anwen, Xu, Haiyang, Ye, Qinghao, Yan, Ming, Dan, Yuhao, Zhao, Chenlin, Xu, Guohai, Li, Chenliang, Tian, Junfeng, Qi, Qian, Zhang, Ji, Huang, Fei
Document understanding refers to automatically extracting, analyzing, and comprehending information from various types of digital documents, such as web pages. Existing Multi-modal Large Language Models (MLLMs), including mPLUG-Owl, have demonstrated promising…
External link:
http://arxiv.org/abs/2307.02499
Youku-mPLUG: A 10 Million Large-scale Chinese Video-Language Dataset for Pre-training and Benchmarks
Author:
Xu, Haiyang, Ye, Qinghao, Wu, Xuan, Yan, Ming, Miao, Yuan, Ye, Jiabo, Xu, Guohai, Hu, Anwen, Shi, Yaya, Xu, Guangwei, Li, Chenliang, Qian, Qi, Que, Maofei, Zhang, Ji, Zeng, Xiao, Huang, Fei
To promote the development of Vision-Language Pre-training (VLP) and multimodal Large Language Models (LLMs) in the Chinese community, we release Youku-mPLUG, the largest public high-quality Chinese video-language dataset, which is collected…
External link:
http://arxiv.org/abs/2306.04362
mPLUG-Owl: Modularization Empowers Large Language Models with Multimodality
Author:
Ye, Qinghao, Xu, Haiyang, Xu, Guohai, Ye, Jiabo, Yan, Ming, Zhou, Yiyang, Wang, Junyang, Hu, Anwen, Shi, Pengcheng, Shi, Yaya, Li, Chenliang, Xu, Yuanhong, Chen, Hehong, Tian, Junfeng, Qian, Qi, Zhang, Ji, Huang, Fei, Zhou, Jingren
Large language models (LLMs) have demonstrated impressive zero-shot abilities on a variety of open-ended tasks, while recent research has also explored the use of LLMs for multi-modal generation. In this study, we introduce mPLUG-Owl, a novel training…
External link:
http://arxiv.org/abs/2304.14178