Showing 1-10 of 63 results for search: '"Liu, Yinsong"'
Authors:
Liu, Chaohu; Yin, Kun; Cao, Haoyu; Jiang, Xinghua; Li, Xin; Liu, Yinsong; Jiang, Deqiang; Sun, Xing; Xu, Linli
Leveraging vast training data, multimodal large language models (MLLMs) have demonstrated formidable general visual comprehension capabilities and achieved remarkable performance across various tasks. However, their performance in visual document und…
External link:
http://arxiv.org/abs/2404.06918
Authors:
Li, Xin; Wu, Yunfei; Jiang, Xinghua; Guo, Zhihao; Gong, Mingming; Cao, Haoyu; Liu, Yinsong; Jiang, Deqiang; Sun, Xing
Recently, the advent of Large Visual-Language Models (LVLMs) has received increasing attention across various domains, particularly in the field of visual document understanding (VDU). Unlike conventional vision-language tasks, VDU is specifi…
External link:
http://arxiv.org/abs/2402.19014
Authors:
Cao, Haoyu; Bao, Changcun; Liu, Chaohu; Chen, Huang; Yin, Kun; Liu, Hao; Liu, Yinsong; Jiang, Deqiang; Sun, Xing
We propose a novel end-to-end document understanding model called SeRum (SElective Region Understanding Model) for extracting meaningful information from document images, including document analysis, retrieval, and office automation. Unlike state-of-…
External link:
http://arxiv.org/abs/2309.01131
Authors:
Liu, Hao; Li, Xin; Gong, Mingming; Liu, Bing; Wu, Yunfei; Jiang, Deqiang; Liu, Yinsong; Sun, Xing
Recently, the Table Structure Recognition (TSR) task, which aims to convert table structure into machine-readable formats, has received increasing interest in the community. Despite impressive success, most single table component-based methods cannot perf…
External link:
http://arxiv.org/abs/2303.09174
Authors:
Cao, Haoyu; Ma, Jiefeng; Guo, Antai; Hu, Yiqing; Liu, Hao; Jiang, Deqiang; Liu, Yinsong; Ren, Bo
Document Information Extraction (DIE) has attracted increasing attention due to its various advanced applications in the real world. Although recent literature has already achieved competitive results, these approaches usually fail when dealing with…
External link:
http://arxiv.org/abs/2207.04713
Authors:
Li, Xin; Zheng, Yan; Hu, Yiqing; Cao, Haoyu; Wu, Yunfei; Jiang, Deqiang; Liu, Yinsong; Ren, Bo
Relational understanding is critical for a number of visually-rich document (VRD) understanding tasks. Through multi-modal pre-training, recent studies provide comprehensive contextual representations and exploit them as prior knowledge for downstr…
External link:
http://arxiv.org/abs/2205.02411
Recently, table structure recognition has achieved impressive progress with the help of deep graph models. Most of them exploit single visual cues of tabular elements or simply combine visual cues with other modalities via early fusion to reason thei…
External link:
http://arxiv.org/abs/2111.13359
Published in:
Wear, 15 December 2023, 534-535
Academic article