Showing 1 - 10 of 213 for search: '"Tang, Jingqun"'
Author:
Lu, Jinghui; Yu, Haiyang; Wang, Yanjie; Ye, Yongjie; Tang, Jingqun; Yang, Ziwei; Wu, Binghong; Liu, Qi; Feng, Hao; Wang, Han; Liu, Hao; Huang, Can
Recently, many studies have demonstrated that incorporating only OCR-derived text and spatial layouts with large language models (LLMs) can be highly effective for document understanding tasks. However, existing methods that integrate spatial …
External link:
http://arxiv.org/abs/2407.01976
Author:
Zhao, Weichao; Feng, Hao; Liu, Qi; Tang, Jingqun; Wei, Shu; Wu, Binghong; Liao, Lei; Ye, Yongjie; Liu, Hao; Li, Houqiang; Huang, Can
Tables contain factual and quantitative data accompanied by various structures and contents that pose challenges for machine comprehension. Previous methods generally design task-specific architectures and objectives for individual tasks, resulting in …
External link:
http://arxiv.org/abs/2406.01326
Author:
Tang, Jingqun; Liu, Qi; Ye, Yongjie; Lu, Jinghui; Wei, Shu; Lin, Chunhui; Li, Wanqing; Mahmood, Mohamad Fitri Faiz Bin; Feng, Hao; Zhao, Zhen; Wang, Yanjie; Liu, Yuliang; Liu, Hao; Bai, Xiang; Huang, Can
Text-Centric Visual Question Answering (TEC-VQA), in its proper format, not only facilitates human-machine interaction in text-centric visual environments but also serves as a de facto gold proxy for evaluating AI models in the domain of text-centric scene …
External link:
http://arxiv.org/abs/2405.11985
Author:
Tang, Jingqun; Lin, Chunhui; Zhao, Zhen; Wei, Shu; Wu, Binghong; Liu, Qi; Feng, Hao; Li, Yang; Wang, Siqi; Liao, Lei; Shi, Wei; Liu, Yuliang; Liu, Hao; Xie, Yuan; Bai, Xiang; Huang, Can
Text-centric visual question answering (VQA) has made great strides with the development of Multimodal Large Language Models (MLLMs), yet open-source models still fall short of leading models like GPT4V and Gemini, partly due to a lack of extensive …
External link:
http://arxiv.org/abs/2404.12803
Author:
Zhao, Zhen; Tang, Jingqun; Lin, Chunhui; Wu, Binghong; Huang, Can; Liu, Hao; Tan, Xin; Zhang, Zhizhong; Xie, Yuan
Scene text recognition (STR) in the wild frequently encounters challenges when coping with domain variations, font diversity, shape deformations, etc. A straightforward solution is to fine-tune the model for a specific scenario, but it …
External link:
http://arxiv.org/abs/2311.13120
In the era of Large Language Models (LLMs), tremendous strides have been made in the field of multimodal understanding. However, existing advanced algorithms are limited in effectively utilizing the immense representation capabilities and rich world knowledge …
External link:
http://arxiv.org/abs/2308.11592
Author:
Liu, Yuliang; Zhang, Jiaxin; Peng, Dezhi; Huang, Mingxin; Wang, Xinyu; Tang, Jingqun; Huang, Can; Lin, Dahua; Shen, Chunhua; Bai, Xiang; Jin, Lianwen
End-to-end scene text spotting has made significant progress due to the intrinsic synergy between text detection and recognition. Previous methods commonly regard manual annotations such as horizontal rectangles, rotated rectangles, quadrangles, and …
External link:
http://arxiv.org/abs/2301.01635
Text detection and recognition are essential components of a modern OCR system. Most OCR approaches attempt to obtain accurate bounding boxes of text at the detection stage, which are then used as input to the text recognition stage. We observe that when …
External link:
http://arxiv.org/abs/2207.11934
Author:
Tang, Jingqun; Zhang, Wenqing; Liu, Hongye; Yang, MingKun; Jiang, Bo; Hu, Guanglong; Bai, Xiang
Recently, transformer-based methods have achieved promising progress in object detection, as they can eliminate post-processing steps such as NMS and enrich deep representations. However, these methods cannot cope well with scene text due to its extreme …
External link:
http://arxiv.org/abs/2203.15221
Author:
Xie, Yaohuan; Zhang, Liyang; Wang, Lujuan; Chen, Bo; Guo, Xiaoting; Yang, Yanyi; Shi, Wenhua; Chen, Anqi; Yi, Junqi; Tang, Jingqun; Xiang, Juanjuan
Published in:
Cancer Letters, Vol. 582 (1 February 2024)