Showing 1 - 10 of 257 for search: '"Zhang, Cha"'
Author:
Lv, Tengchao, Huang, Yupan, Chen, Jingye, Zhao, Yuzhong, Jia, Yilin, Cui, Lei, Ma, Shuming, Chang, Yaoyao, Huang, Shaohan, Wang, Wenhui, Dong, Li, Luo, Weiyao, Wu, Shaoxiang, Wang, Guoxin, Zhang, Cha, Wei, Furu
The automatic reading of text-intensive images represents a significant advancement toward achieving Artificial General Intelligence (AGI). In this paper, we present KOSMOS-2.5, a multimodal literate model for machine reading of text-intensive images.
External link:
http://arxiv.org/abs/2309.11419
Current state-of-the-art models for natural language understanding require a preprocessing step to convert raw text into discrete tokens. This process, known as tokenization, relies on a pre-built vocabulary of words or sub-word morphemes. This fixed v
External link:
http://arxiv.org/abs/2305.14571
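The tokenization step the abstract describes can be illustrated with a minimal sketch. The snippet below implements greedy longest-match subword tokenization against a fixed vocabulary (WordPiece-style); the toy vocabulary and the `##` continuation prefix are illustrative assumptions, not taken from the paper.

```python
# Greedy longest-match subword tokenizer (WordPiece-style sketch).
# VOCAB is a toy vocabulary for illustration, not from any real model.
VOCAB = {"un", "##believ", "##able", "token", "##ization", "[UNK]"}

def tokenize(word: str) -> list[str]:
    tokens, start = [], 0
    while start < len(word):
        end, match = len(word), None
        while end > start:
            piece = word[start:end]
            if start > 0:          # non-initial pieces carry the ## prefix
                piece = "##" + piece
            if piece in VOCAB:
                match = piece
                break
            end -= 1               # shrink the candidate piece
        if match is None:
            return ["[UNK]"]       # word cannot be built from the vocabulary
        tokens.append(match)
        start = end
    return tokens

print(tokenize("unbelievable"))   # ['un', '##believ', '##able']
print(tokenize("tokenization"))   # ['token', '##ization']
```

Any word not decomposable from the fixed vocabulary collapses to `[UNK]`, which is exactly the brittleness that motivates tokenization-free approaches.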
Published in:
Document Analysis and Recognition - ICDAR 2023. ICDAR 2023. Lecture Notes in Computer Science, vol 14187. Springer, Cham
We develop a diffusion-based approach for generating various document layout sequences. Layout sequences specify the contents of a document design in an explicit format. Our novel diffusion-based approach works in the sequence domain rather than the i
External link:
http://arxiv.org/abs/2303.10787
Author:
Tang, Zineng, Yang, Ziyi, Wang, Guoxin, Fang, Yuwei, Liu, Yang, Zhu, Chenguang, Zeng, Michael, Zhang, Cha, Bansal, Mohit
We propose Universal Document Processing (UDOP), a foundation Document AI model which unifies text, image, and layout modalities together with varied task formats, including document understanding and generation. UDOP leverages the spatial correlatio
External link:
http://arxiv.org/abs/2212.02623
The recent surge of pre-training has driven rapid progress in document understanding. The pre-training and fine-tuning framework has been used effectively to tackle texts in various formats, including plain texts, document texts, and web text
External link:
http://arxiv.org/abs/2210.02849
Despite several successes in document understanding, the practical task of long document understanding remains largely under-explored due to challenges in computation and in efficiently absorbing long multimodal input. Most current transformer-b
External link:
http://arxiv.org/abs/2208.08201
Image Transformer has recently achieved significant progress for natural image understanding, using either supervised (ViT, DeiT, etc.) or self-supervised (BEiT, MAE, etc.) pre-training techniques. In this paper, we propose DiT, a self-super
External link:
http://arxiv.org/abs/2203.02378
We study the problem of recognizing structured text, i.e., text that follows certain formats, and propose to improve the recognition accuracy of structured text by specifying regular expressions (regexes) for biasing. A biased recognizer recognizes te
External link:
http://arxiv.org/abs/2111.06738
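The regex-biasing idea above can be sketched as a simple rescoring pass: given candidate hypotheses from a recognizer, boost the score of candidates that fully match a user-supplied regex. This is an illustrative sketch under assumed (text, score) hypothesis pairs and a flat bonus, not the paper's actual biasing algorithm.

```python
import re

def rescore(hypotheses: list[tuple[str, float]],
            pattern: str, bonus: float = 2.0) -> str:
    """Toy regex-biased rescoring: candidates that fully match the
    given regex receive a flat score bonus; return the best candidate.
    Hypothetical interface for illustration only."""
    rx = re.compile(pattern)
    rescored = [
        (text, score + (bonus if rx.fullmatch(text) else 0.0))
        for text, score in hypotheses
    ]
    return max(rescored, key=lambda pair: pair[1])[0]

# Bias toward date-like strings (DD/MM/YYYY, a hypothetical format):
# the OCR confusion 0 -> O is corrected because only the well-formed
# candidate matches the regex and earns the bonus.
cands = [("12/O3/2021", 0.9), ("12/03/2021", 0.8)]
print(rescore(cands, r"\d{2}/\d{2}/\d{4}"))  # 12/03/2021
```

The design choice here is post-hoc rescoring for simplicity; a production recognizer would more likely integrate the bias into beam-search decoding.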