Showing 1 - 10 of 2,085
for search: '"Huang, Can"'
Author:
Lu, Jinghui, Yu, Haiyang, Wang, Yanjie, Ye, Yongjie, Tang, Jingqun, Yang, Ziwei, Wu, Binghong, Liu, Qi, Feng, Hao, Wang, Han, Liu, Hao, Huang, Can
Recently, many studies have demonstrated that exclusively incorporating OCR-derived text and spatial layouts with large language models (LLMs) can be highly effective for document understanding tasks. However, existing methods that integrate spatial …
External link:
http://arxiv.org/abs/2407.01976
Author:
Zhao, Weichao, Feng, Hao, Liu, Qi, Tang, Jingqun, Wei, Shu, Wu, Binghong, Liao, Lei, Ye, Yongjie, Liu, Hao, Li, Houqiang, Huang, Can
Tables contain factual and quantitative data accompanied by various structures and contents that pose challenges for machine comprehension. Previous methods generally design task-specific architectures and objectives for individual tasks, resulting in …
External link:
http://arxiv.org/abs/2406.01326
Author:
Tang, Jingqun, Liu, Qi, Ye, Yongjie, Lu, Jinghui, Wei, Shu, Lin, Chunhui, Li, Wanqing, Mahmood, Mohamad Fitri Faiz Bin, Feng, Hao, Zhao, Zhen, Wang, Yanjie, Liu, Yuliang, Liu, Hao, Bai, Xiang, Huang, Can
Text-Centric Visual Question Answering (TEC-VQA) in its proper format not only facilitates human-machine interaction in text-centric visual environments but also serves as a de facto gold proxy to evaluate AI models in the domain of text-centric …
External link:
http://arxiv.org/abs/2405.11985
Author:
Tang, Jingqun, Lin, Chunhui, Zhao, Zhen, Wei, Shu, Wu, Binghong, Liu, Qi, Feng, Hao, Li, Yang, Wang, Siqi, Liao, Lei, Shi, Wei, Liu, Yuliang, Liu, Hao, Xie, Yuan, Bai, Xiang, Huang, Can
Text-centric visual question answering (VQA) has made great strides with the development of Multimodal Large Language Models (MLLMs), yet open-source models still fall short of leading models like GPT4V and Gemini, partly due to a lack of extensive …
External link:
http://arxiv.org/abs/2404.12803
Author:
Ren, Tonghui, Fan, Yuankai, He, Zhenying, Huang, Ren, Dai, Jiaqi, Huang, Can, Jing, Yinan, Zhang, Kai, Yang, Yifan, Wang, X. Sean
Large Language Model (LLM) techniques play an increasingly important role in Natural Language to SQL (NL2SQL) translation. LLMs trained on extensive corpora have strong natural language understanding and basic SQL generation abilities without additional …
External link:
http://arxiv.org/abs/2403.20014
Multi-modal Large Language Models (MLLMs) have demonstrated their ability to perceive objects in still images, but their application in video-related tasks, such as object tracking, remains understudied. This lack of exploration is primarily due to the …
External link:
http://arxiv.org/abs/2403.16558
The Natural Language Interface to Databases (NLIDB) empowers non-technical users with database access through intuitive natural language (NL) interactions. Advanced approaches, utilizing neural sequence-to-sequence models or large-scale language models …
External link:
http://arxiv.org/abs/2402.17144
In this study, we aim to reduce generation latency for Named Entity Recognition (NER) with Large Language Models (LLMs). The main cause of high latency in LLMs is the sequential decoding process, which autoregressively generates all labels and …
External link:
http://arxiv.org/abs/2402.04838
Author:
Tang, Haijun, Huang, Can, Wang, Yuhan, Jiang, Xiong, Xiao, Shumin, Han, Jiecai, Song, Qinghai
Distant interactions at arbitrary locations and their dynamic control are fundamentally important for realizing large-scale photonic and quantum circuits. Conventional approaches suffer from short coupling distance, poor controllability, fixed locations …
External link:
http://arxiv.org/abs/2401.08177
Video Text Spotting (VTS) is a fundamental visual task that aims to predict the trajectories and content of texts in a video. Previous works usually conduct local associations and apply IoU-based distance and complex post-processing procedures to boost …
External link:
http://arxiv.org/abs/2401.03694