Showing 1 - 10 of 257 for search: '"Zhang, Cha"'
Author:
Lv, Tengchao, Huang, Yupan, Chen, Jingye, Zhao, Yuzhong, Jia, Yilin, Cui, Lei, Ma, Shuming, Chang, Yaoyao, Huang, Shaohan, Wang, Wenhui, Dong, Li, Luo, Weiyao, Wu, Shaoxiang, Wang, Guoxin, Zhang, Cha, Wei, Furu
The automatic reading of text-intensive images represents a significant advancement toward achieving Artificial General Intelligence (AGI). In this paper, we present KOSMOS-2.5, a multimodal literate model for machine reading of text-intensive images.
External link:
http://arxiv.org/abs/2309.11419
Current state-of-the-art models for natural language understanding require a preprocessing step to convert raw text into discrete tokens. This process, known as tokenization, relies on a pre-built vocabulary of words or sub-word morphemes. This fixed v
External link:
http://arxiv.org/abs/2305.14571
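The tokenization step the abstract describes can be illustrated with a minimal sketch. The snippet below implements greedy longest-match subword tokenization against a fixed vocabulary (WordPiece-style); the toy vocabulary and the `##` continuation prefix are illustrative assumptions, not taken from the paper.

```python
# Greedy longest-match subword tokenizer (WordPiece-style sketch).
# VOCAB is a toy vocabulary for illustration, not from any real model.
VOCAB = {"un", "##believ", "##able", "token", "##ization", "[UNK]"}

def tokenize(word: str) -> list[str]:
    tokens, start = [], 0
    while start < len(word):
        end, match = len(word), None
        while end > start:
            piece = word[start:end]
            if start > 0:          # non-initial pieces carry the ## prefix
                piece = "##" + piece
            if piece in VOCAB:
                match = piece
                break
            end -= 1               # shrink the candidate piece
        if match is None:
            return ["[UNK]"]       # word cannot be built from the vocabulary
        tokens.append(match)
        start = end
    return tokens

print(tokenize("unbelievable"))   # ['un', '##believ', '##able']
print(tokenize("tokenization"))   # ['token', '##ization']
```

Any word not decomposable from the fixed vocabulary collapses to `[UNK]`, which is exactly the brittleness that motivates tokenization-free approaches.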
Published in:
Document Analysis and Recognition - ICDAR 2023. ICDAR 2023. Lecture Notes in Computer Science, vol 14187. Springer, Cham
We develop a diffusion-based approach for generating various document layout sequences. Layout sequences specify the contents of a document design in an explicit format. Our novel diffusion-based approach works in the sequence domain rather than the i
External link:
http://arxiv.org/abs/2303.10787
Author:
Tang, Zineng, Yang, Ziyi, Wang, Guoxin, Fang, Yuwei, Liu, Yang, Zhu, Chenguang, Zeng, Michael, Zhang, Cha, Bansal, Mohit
We propose Universal Document Processing (UDOP), a foundation Document AI model which unifies text, image, and layout modalities together with varied task formats, including document understanding and generation. UDOP leverages the spatial correlatio
External link:
http://arxiv.org/abs/2212.02623
The recent surge of pre-training has driven rapid progress in document understanding. The pre-training and fine-tuning framework has been used effectively to tackle texts in various formats, including plain texts, document texts, and web text
External link:
http://arxiv.org/abs/2210.02849
Despite several successes in document understanding, the practical task of long document understanding remains largely under-explored due to challenges in computation and in efficiently absorbing long multimodal input. Most current transformer-b
External link:
http://arxiv.org/abs/2208.08201
Image Transformer has recently achieved significant progress for natural image understanding, using either supervised (ViT, DeiT, etc.) or self-supervised (BEiT, MAE, etc.) pre-training techniques. In this paper, we propose DiT, a self-super
External link:
http://arxiv.org/abs/2203.02378
We study the problem of recognizing structured text, i.e., text that follows certain formats, and propose to improve the recognition accuracy of structured text by specifying regular expressions (regexes) for biasing. A biased recognizer recognizes te
External link:
http://arxiv.org/abs/2111.06738
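The regex-biasing idea above can be sketched as a simple rescoring pass: given candidate hypotheses from a recognizer, boost the score of candidates that fully match a user-supplied regex. This is an illustrative sketch under assumed (text, score) hypothesis pairs and a flat bonus, not the paper's actual biasing algorithm.

```python
import re

def rescore(hypotheses: list[tuple[str, float]],
            pattern: str, bonus: float = 2.0) -> str:
    """Toy regex-biased rescoring: candidates that fully match the
    given regex receive a flat score bonus; return the best candidate.
    Hypothetical interface for illustration only."""
    rx = re.compile(pattern)
    rescored = [
        (text, score + (bonus if rx.fullmatch(text) else 0.0))
        for text, score in hypotheses
    ]
    return max(rescored, key=lambda pair: pair[1])[0]

# Bias toward date-like strings (DD/MM/YYYY, a hypothetical format):
# the OCR confusion 0 -> O is corrected because only the well-formed
# candidate matches the regex and earns the bonus.
cands = [("12/O3/2021", 0.9), ("12/03/2021", 0.8)]
print(rescore(cands, r"\d{2}/\d{2}/\d{4}"))  # 12/03/2021
```

The design choice here is post-hoc rescoring for simplicity; a production recognizer would more likely integrate the bias into beam-search decoding.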