Search results - "Chen, Jinyue"

Report

General OCR Theory: Towards OCR-2.0 via a Unified End-to-end Model

Authors: Wei, Haoran; Liu, Chenglong; Chen, Jinyue; Wang, Jia; Kong, Lingyu; Xu, Yanming; Ge, Zheng; Zhao, Liang; Sun, Jianjian; Peng, Yuang; Han, Chunrui; Zhang, Xiangyu

Traditional OCR systems (OCR-1.0) are increasingly unable to meet people's usage due to the growing demand for intelligent processing of man-made optical characters. In this paper, we collectively refer to all artificial optical signals (e.g., plain

External link: http://arxiv.org/abs/2409.01704


Report

Focus Anywhere for Fine-grained Multi-page Document Understanding

Authors: Liu, Chenglong; Wei, Haoran; Chen, Jinyue; Kong, Lingyu; Ge, Zheng; Zhu, Zining; Zhao, Liang; Sun, Jianjian; Han, Chunrui; Zhang, Xiangyu

Modern LVLMs still struggle to achieve fine-grained document understanding, such as OCR/translation/caption for regions of interest to the user, tasks that require the context of the entire page, or even multiple pages. Accordingly, this paper propos

External link: http://arxiv.org/abs/2405.14295


Report

OneChart: Purify the Chart Structural Extraction via One Auxiliary Token

Authors: Chen, Jinyue; Kong, Lingyu; Wei, Haoran; Liu, Chenglong; Ge, Zheng; Zhao, Liang; Sun, Jianjian; Han, Chunrui; Zhang, Xiangyu

Chart parsing poses a significant challenge due to the diversity of styles, values, texts, and so forth. Even advanced large vision-language models (LVLMs) with billions of parameters struggle to handle such tasks satisfactorily. To address this, we

External link: http://arxiv.org/abs/2404.09987


Report

Small Language Model Meets with Reinforced Vision Vocabulary

Authors: Wei, Haoran; Kong, Lingyu; Chen, Jinyue; Zhao, Liang; Ge, Zheng; Yu, En; Sun, Jianjian; Han, Chunrui; Zhang, Xiangyu

Playing Large Vision Language Models (LVLMs) in 2023 is trendy among the AI community. However, the relatively large number of parameters (more than 7B) of popular LVLMs makes it difficult to train and deploy on consumer GPUs, discouraging many resea

External link: http://arxiv.org/abs/2401.12503


Report

Vary: Scaling up the Vision Vocabulary for Large Vision-Language Models

Authors: Wei, Haoran; Kong, Lingyu; Chen, Jinyue; Zhao, Liang; Ge, Zheng; Yang, Jinrong; Sun, Jianjian; Han, Chunrui; Zhang, Xiangyu

Modern Large Vision-Language Models (LVLMs) enjoy the same vision vocabulary -- CLIP, which can cover most common vision tasks. However, for some special vision task that needs dense and fine-grained vision perception, e.g., document-level OCR or cha

External link: http://arxiv.org/abs/2312.06109

