Search results - "Liu, Yinsong"

Report

HRVDA: High-Resolution Visual Document Assistant

Authors: Liu, Chaohu; Yin, Kun; Cao, Haoyu; Jiang, Xinghua; Li, Xin; Liu, Yinsong; Jiang, Deqiang; Sun, Xing; Xu, Linli

Leveraging vast training data, multimodal large language models (MLLMs) have demonstrated formidable general visual comprehension capabilities and achieved remarkable performance across various tasks. However, their performance in visual document und

External link: http://arxiv.org/abs/2404.06918

Report

Enhancing Visual Document Understanding with Contrastive Learning in Large Visual-Language Models

Authors: Li, Xin; Wu, Yunfei; Jiang, Xinghua; Guo, Zhihao; Gong, Mingming; Cao, Haoyu; Liu, Yinsong; Jiang, Deqiang; Sun, Xing

Recently, the advent of Large Visual-Language Models (LVLMs) has received increasing attention across various domains, particularly in the field of visual document understanding (VDU). Different from conventional vision-language tasks, VDU is specifi

External link: http://arxiv.org/abs/2402.19014

Report

Attention Where It Matters: Rethinking Visual Document Understanding with Selective Region Concentration

Authors: Cao, Haoyu; Bao, Changcun; Liu, Chaohu; Chen, Huang; Yin, Kun; Liu, Hao; Liu, Yinsong; Jiang, Deqiang; Sun, Xing

We propose a novel end-to-end document understanding model called SeRum (SElective Region Understanding Model) for extracting meaningful information from document images, including document analysis, retrieval, and office automation. Unlike state-of-

External link: http://arxiv.org/abs/2309.01131

Report

Grab What You Need: Rethinking Complex Table Structure Recognition with Flexible Components Deliberation

Authors: Liu, Hao; Li, Xin; Gong, Mingming; Liu, Bing; Wu, Yunfei; Jiang, Deqiang; Liu, Yinsong; Sun, Xing

Recently, Table Structure Recognition (TSR) task, aiming at identifying table structure into machine readable formats, has received increasing interest in the community. While impressive success, most single table component-based methods can not perf

External link: http://arxiv.org/abs/2303.09174

Report

GMN: Generative Multi-modal Network for Practical Document Information Extraction

Authors: Cao, Haoyu; Ma, Jiefeng; Guo, Antai; Hu, Yiqing; Liu, Hao; Jiang, Deqiang; Liu, Yinsong; Ren, Bo

Document Information Extraction (DIE) has attracted increasing attention due to its various advanced applications in the real world. Although recent literature has already achieved competitive results, these approaches usually fail when dealing with

External link: http://arxiv.org/abs/2207.04713

Report

Relational Representation Learning in Visually-Rich Documents

Authors: Li, Xin; Zheng, Yan; Hu, Yiqing; Cao, Haoyu; Wu, Yunfei; Jiang, Deqiang; Liu, Yinsong; Ren, Bo

Relational understanding is critical for a number of visually-rich documents (VRDs) understanding tasks. Through multi-modal pre-training, recent studies provide comprehensive contextual representations and exploit them as prior knowledge for downstr

External link: http://arxiv.org/abs/2205.02411

Report

Neural Collaborative Graph Machines for Table Structure Recognition

Authors: Liu, Hao; Li, Xin; Liu, Bing; Jiang, Deqiang; Liu, Yinsong; Ren, Bo

Recently, table structure recognition has achieved impressive progress with the help of deep graph models. Most of them exploit single visual cues of tabular elements or simply combine visual cues with other modalities via early fusion to reason thei

External link: http://arxiv.org/abs/2111.13359
