Search results - "Tang, Guozhi"

Report

ParGo: Bridging Vision-Language with Partial and Global Views

Authors: Wang, An-Lan; Shan, Bin; Shi, Wei; Lin, Kun-Yu; Fei, Xiang; Tang, Guozhi; Liao, Lei; Tang, Jingqun; Huang, Can; Zheng, Wei-Shi

This work presents ParGo, a novel Partial-Global projector designed to connect the vision and language modalities for Multimodal Large Language Models (MLLMs). Unlike previous works that rely on global attention-based projectors, our ParGo bridges th

External link: http://arxiv.org/abs/2408.12928

Report

Bi-VLDoc: Bidirectional Vision-Language Modeling for Visually-Rich Document Understanding

Authors: Luo, Chuwei; Tang, Guozhi; Zheng, Qi; Yao, Cong; Jin, Lianwen; Li, Chenliang; Xue, Yang; Si, Luo

Multi-modal document pre-trained models have proven to be very effective in a variety of visually-rich document understanding (VrDU) tasks. Though existing document pre-trained models have achieved excellent performance on standard benchmarks for VrD

External link: http://arxiv.org/abs/2206.13155

Academic article

Localization and saturation of degradation space for weakly-supervised real-world super-resolution

Authors: Tang, Guozhi; Ge, Hongwei; Liu, Yuxuan; Wu, Chunguo

Published in: Knowledge-Based Systems, 5 September 2024, Vol. 299

Academic article

Progressive reconstruction-decoupled face super-resolution framework with controllable knowledge guidance

Authors: Tang, Guozhi; Ge, Hongwei; Gu, Enxuan; Hou, Yaqing; Zhao, Mingde

Published in: Knowledge-Based Systems, 5 September 2024, Vol. 299

Report

MatchVIE: Exploiting Match Relevancy between Entities for Visual Information Extraction

Authors: Tang, Guozhi; Xie, Lele; Jin, Lianwen; Wang, Jiapeng; Chen, Jingdong; Xu, Zhen; Wang, Qianying; Wu, Yaqiang; Li, Hui

Visual Information Extraction (VIE) task aims to extract key information from multifarious document images (e.g., invoices and purchase receipts). Most previous methods treat the VIE task simply as a sequence labeling problem or classification proble

External link: http://arxiv.org/abs/2106.12940

Report

Tag, Copy or Predict: A Unified Weakly-Supervised Learning Framework for Visual Information Extraction using Sequences

Authors: Wang, Jiapeng; Wang, Tianwei; Tang, Guozhi; Jin, Lianwen; Ma, Weihong; Ding, Kai; Huang, Yichao

Visual information extraction (VIE) has attracted increasing attention in recent years. The existing methods usually first organized optical character recognition (OCR) results into plain texts and then utilized token-level entity annotations as supe

External link: http://arxiv.org/abs/2106.10681

Report

Towards Robust Visual Information Extraction in Real World: New Dataset and Novel Solution

Authors: Wang, Jiapeng; Liu, Chongyu; Jin, Lianwen; Tang, Guozhi; Zhang, Jiaxin; Zhang, Shuaitao; Wang, Qianying; Wu, Yaqiang; Cai, Mingxiang

Visual information extraction (VIE) has attracted considerable attention recently owing to its various advanced applications such as document understanding, automatic marking and intelligent education. Most existing works decoupled this problem into

External link: http://arxiv.org/abs/2102.06732