Search results

Report

LayoutLLM: Layout Instruction Tuning with Large Language Models for Document Understanding

Authors: Luo, Chuwei, Shen, Yufan, Zhu, Zhaoqing, Zheng, Qi, Yu, Zhi, Yao, Cong

Recently, leveraging large language models (LLMs) or multimodal large language models (MLLMs) for document understanding has been proven very promising. However, previous works that employ LLMs/MLLMs for document understanding have not fully explored

External link: http://arxiv.org/abs/2404.05225

Report

OmniParser: A Unified Framework for Text Spotting, Key Information Extraction and Table Recognition

Authors: Wan, Jianqiang, Song, Sibo, Yu, Wenwen, Liu, Yuliang, Cheng, Wenqing, Huang, Fei, Bai, Xiang, Yao, Cong, Yang, Zhibo

Recently, visually-situated text parsing (VsTP) has experienced notable advancements, driven by the increasing demand for automated document understanding and the emergence of Generative Large Language Models (LLMs) capable of processing document-bas

External link: http://arxiv.org/abs/2403.19128

Report

HierCode: A Lightweight Hierarchical Codebook for Zero-shot Chinese Text Recognition

Authors: Zhang, Yuyi, Zhu, Yuanzhi, Peng, Dezhi, Zhang, Peirong, Yang, Zhenhua, Yang, Zhibo, Yao, Cong, Jin, Lianwen

Text recognition, especially for complex scripts like Chinese, faces unique challenges due to its intricate character structures and vast vocabulary. Traditional one-hot encoding methods struggle with the representation of hierarchical radicals, reco

External link: http://arxiv.org/abs/2403.13761

Report

LORE++: Logical Location Regression Network for Table Structure Recognition with Pre-training

Authors: Long, Rujiao, Xing, Hangdi, Yang, Zhibo, Zheng, Qi, Yu, Zhi, Yao, Cong, Huang, Fei

Table structure recognition (TSR) aims at extracting tables in images into machine-understandable formats. Recent methods solve this problem by predicting the adjacency relations of detected cell boxes or learning to directly generate the correspondi

External link: http://arxiv.org/abs/2401.01522

Report

FontDiffuser: One-Shot Font Generation via Denoising Diffusion with Multi-Scale Content Aggregation and Style Contrastive Learning

Authors: Yang, Zhenhua, Peng, Dezhi, Kong, Yuxin, Zhang, Yuyi, Yao, Cong, Jin, Lianwen

Published in: 38th AAAI Conference on Artificial Intelligence (AAAI2024), Vancouver, BC, Canada, 2024

Automatic font generation is an imitation task, which aims to create a font library that mimics the style of reference images while preserving the content from source images. Although existing font generation methods have achieved satisfactory perfor

External link: http://arxiv.org/abs/2312.12142

Report

DocXChain: A Powerful Open-Source Toolchain for Document Parsing and Beyond

Author: Yao, Cong

In this report, we introduce DocXChain, a powerful open-source toolchain for document parsing, which is designed and developed to automatically convert the rich information embodied in unstructured documents, such as text, tables and charts, into str

External link: http://arxiv.org/abs/2310.12430

Report

Vision Grid Transformer for Document Layout Analysis

Authors: Da, Cheng, Luo, Chuwei, Zheng, Qi, Yao, Cong

Document pre-trained models and grid-based models have proven to be very effective on various tasks in Document AI. However, for the document layout analysis (DLA) task, existing document pre-trained models, even those pre-trained in a multi-modal fa

External link: http://arxiv.org/abs/2308.14978

Report

LISTER: Neighbor Decoding for Length-Insensitive Scene Text Recognition

Authors: Cheng, Changxu, Wang, Peng, Da, Cheng, Zheng, Qi, Yao, Cong

The diversity in length constitutes a significant characteristic of text. Due to the long-tail distribution of text lengths, most existing methods for scene text recognition (STR) only work well on short or seen-length text, lacking the capability of

External link: http://arxiv.org/abs/2308.12774

Report

Multi-Granularity Prediction with Learnable Fusion for Scene Text Recognition

Authors: Da, Cheng, Wang, Peng, Yao, Cong

Due to the enormous technical challenges and wide range of applications, scene text recognition (STR) has been an active research topic in computer vision for years. To tackle this tough problem, numerous innovative methods have been successively pro

External link: http://arxiv.org/abs/2307.13244

Report

Conditional Text Image Generation with Diffusion Models

Authors: Zhu, Yuanzhi, Li, Zhaohai, Wang, Tianwei, He, Mengchao, Yao, Cong

Current text recognition systems, including those for handwritten scripts and scene text, have relied heavily on image synthesis and augmentation, since it is difficult to realize real-world complexity and diversity through collecting and annotating

External link: http://arxiv.org/abs/2306.10804
