Showing 1 - 10 of 213 for search: '"Tang, Jingqun"'
Author:
Lu, Jinghui; Yu, Haiyang; Wang, Yanjie; Ye, Yongjie; Tang, Jingqun; Yang, Ziwei; Wu, Binghong; Liu, Qi; Feng, Hao; Wang, Han; Liu, Hao; Huang, Can
Recently, many studies have demonstrated that incorporating only OCR-derived text and spatial layouts with large language models (LLMs) can be highly effective for document understanding tasks. However, existing methods that integrate spatial …
External link:
http://arxiv.org/abs/2407.01976
Author:
Zhao, Weichao; Feng, Hao; Liu, Qi; Tang, Jingqun; Wei, Shu; Wu, Binghong; Liao, Lei; Ye, Yongjie; Liu, Hao; Li, Houqiang; Huang, Can
Tables contain factual and quantitative data accompanied by various structures and contents that pose challenges for machine comprehension. Previous methods generally design task-specific architectures and objectives for individual tasks, resulting in …
External link:
http://arxiv.org/abs/2406.01326
Author:
Tang, Jingqun; Liu, Qi; Ye, Yongjie; Lu, Jinghui; Wei, Shu; Lin, Chunhui; Li, Wanqing; Mahmood, Mohamad Fitri Faiz Bin; Feng, Hao; Zhao, Zhen; Wang, Yanjie; Liu, Yuliang; Liu, Hao; Bai, Xiang; Huang, Can
Text-Centric Visual Question Answering (TEC-VQA), in its proper format, not only facilitates human-machine interaction in text-centric visual environments but also serves as a de facto gold proxy for evaluating AI models in the domain of text-centric scene …
External link:
http://arxiv.org/abs/2405.11985
Author:
Tang, Jingqun; Lin, Chunhui; Zhao, Zhen; Wei, Shu; Wu, Binghong; Liu, Qi; Feng, Hao; Li, Yang; Wang, Siqi; Liao, Lei; Shi, Wei; Liu, Yuliang; Liu, Hao; Xie, Yuan; Bai, Xiang; Huang, Can
Text-centric visual question answering (VQA) has made great strides with the development of Multimodal Large Language Models (MLLMs), yet open-source models still fall short of leading models like GPT4V and Gemini, partly due to a lack of extensive …
External link:
http://arxiv.org/abs/2404.12803
Author:
Zhao, Zhen; Tang, Jingqun; Lin, Chunhui; Wu, Binghong; Huang, Can; Liu, Hao; Tan, Xin; Zhang, Zhizhong; Xie, Yuan
Scene text recognition (STR) in the wild frequently encounters challenges when coping with domain variations, font diversity, shape deformations, etc. A straightforward solution is to fine-tune the model for a specific scenario, but it …
External link:
http://arxiv.org/abs/2311.13120
In the era of Large Language Models (LLMs), tremendous strides have been made in the field of multimodal understanding. However, existing advanced algorithms are limited in effectively utilizing the immense representation capabilities and rich world knowledge …
External link:
http://arxiv.org/abs/2308.11592
Author:
Liu, Yuliang; Zhang, Jiaxin; Peng, Dezhi; Huang, Mingxin; Wang, Xinyu; Tang, Jingqun; Huang, Can; Lin, Dahua; Shen, Chunhua; Bai, Xiang; Jin, Lianwen
End-to-end scene text spotting has made significant progress due to the intrinsic synergy between text detection and recognition. Previous methods commonly regard manual annotations such as horizontal rectangles, rotated rectangles, quadrangles, and …
External link:
http://arxiv.org/abs/2301.01635
Text detection and recognition are essential components of a modern OCR system. Most OCR approaches attempt to obtain accurate bounding boxes of text at the detection stage, which are then used as input to the text recognition stage. We observe that when …
External link:
http://arxiv.org/abs/2207.11934
Author:
Tang, Jingqun; Zhang, Wenqing; Liu, Hongye; Yang, MingKun; Jiang, Bo; Hu, Guanglong; Bai, Xiang
Recently, transformer-based methods have achieved promising progress in object detection, as they can eliminate post-processing steps such as NMS and enrich deep representations. However, these methods cannot cope well with scene text due to its extreme …
External link:
http://arxiv.org/abs/2203.15221
Author:
Xie, Yaohuan; Zhang, Liyang; Wang, Lujuan; Chen, Bo; Guo, Xiaoting; Yang, Yanyi; Shi, Wenhua; Chen, Anqi; Yi, Junqi; Tang, Jingqun; Xiang, Juanjuan
Published in:
Cancer Letters, Vol. 582 (1 February 2024)