Showing 1 - 10 of 933 for search: '"Yao, Cong"'
Recently, leveraging large language models (LLMs) or multimodal large language models (MLLMs) for document understanding has been proven very promising. However, previous works that employ LLMs/MLLMs for document understanding have not fully explored
External link:
http://arxiv.org/abs/2404.05225
Author:
Wan, Jianqiang, Song, Sibo, Yu, Wenwen, Liu, Yuliang, Cheng, Wenqing, Huang, Fei, Bai, Xiang, Yao, Cong, Yang, Zhibo
Recently, visually-situated text parsing (VsTP) has experienced notable advancements, driven by the increasing demand for automated document understanding and the emergence of Generative Large Language Models (LLMs) capable of processing document-bas
External link:
http://arxiv.org/abs/2403.19128
Author:
Zhang, Yuyi, Zhu, Yuanzhi, Peng, Dezhi, Zhang, Peirong, Yang, Zhenhua, Yang, Zhibo, Yao, Cong, Jin, Lianwen
Text recognition, especially for complex scripts like Chinese, faces unique challenges due to its intricate character structures and vast vocabulary. Traditional one-hot encoding methods struggle with the representation of hierarchical radicals, reco
External link:
http://arxiv.org/abs/2403.13761
Table structure recognition (TSR) aims at extracting tables in images into machine-understandable formats. Recent methods solve this problem by predicting the adjacency relations of detected cell boxes or learning to directly generate the correspondi
External link:
http://arxiv.org/abs/2401.01522
Published in:
38th AAAI Conference on Artificial Intelligence (AAAI2024), Vancouver, BC, Canada, 2024
Automatic font generation is an imitation task, which aims to create a font library that mimics the style of reference images while preserving the content from source images. Although existing font generation methods have achieved satisfactory perfor
External link:
http://arxiv.org/abs/2312.12142
Author:
Yao, Cong
In this report, we introduce DocXChain, a powerful open-source toolchain for document parsing, which is designed and developed to automatically convert the rich information embodied in unstructured documents, such as text, tables and charts, into str
External link:
http://arxiv.org/abs/2310.12430
Document pre-trained models and grid-based models have proven to be very effective on various tasks in Document AI. However, for the document layout analysis (DLA) task, existing document pre-trained models, even those pre-trained in a multi-modal fa
External link:
http://arxiv.org/abs/2308.14978
The diversity in length constitutes a significant characteristic of text. Due to the long-tail distribution of text lengths, most existing methods for scene text recognition (STR) only work well on short or seen-length text, lacking the capability of
External link:
http://arxiv.org/abs/2308.12774
Due to the enormous technical challenges and wide range of applications, scene text recognition (STR) has been an active research topic in computer vision for years. To tackle this tough problem, numerous innovative methods have been successively pro
External link:
http://arxiv.org/abs/2307.13244
Current text recognition systems, including those for handwritten scripts and scene text, have relied heavily on image synthesis and augmentation, since it is difficult to realize real-world complexity and diversity through collecting and annotating
External link:
http://arxiv.org/abs/2306.10804