Showing 1 - 10 of 2,085
for search: '"Huang, Can"'
Author:
Lu, Jinghui, Yu, Haiyang, Wang, Yanjie, Ye, Yongjie, Tang, Jingqun, Yang, Ziwei, Wu, Binghong, Liu, Qi, Feng, Hao, Wang, Han, Liu, Hao, Huang, Can
Recently, many studies have demonstrated that exclusively incorporating OCR-derived text and spatial layouts with large language models (LLMs) can be highly effective for document understanding tasks. However, existing methods that integrate spatial …
External link:
http://arxiv.org/abs/2407.01976
Author:
Zhao, Weichao, Feng, Hao, Liu, Qi, Tang, Jingqun, Wei, Shu, Wu, Binghong, Liao, Lei, Ye, Yongjie, Liu, Hao, Li, Houqiang, Huang, Can
Tables contain factual and quantitative data accompanied by various structures and contents that pose challenges for machine comprehension. Previous methods generally design task-specific architectures and objectives for individual tasks, resulting in …
External link:
http://arxiv.org/abs/2406.01326
Author:
Tang, Jingqun, Liu, Qi, Ye, Yongjie, Lu, Jinghui, Wei, Shu, Lin, Chunhui, Li, Wanqing, Mahmood, Mohamad Fitri Faiz Bin, Feng, Hao, Zhao, Zhen, Wang, Yanjie, Liu, Yuliang, Liu, Hao, Bai, Xiang, Huang, Can
Text-Centric Visual Question Answering (TEC-VQA) in its proper format not only facilitates human-machine interaction in text-centric visual environments but also serves as a de facto gold proxy to evaluate AI models in the domain of text-centric …
External link:
http://arxiv.org/abs/2405.11985
Author:
Tang, Jingqun, Lin, Chunhui, Zhao, Zhen, Wei, Shu, Wu, Binghong, Liu, Qi, Feng, Hao, Li, Yang, Wang, Siqi, Liao, Lei, Shi, Wei, Liu, Yuliang, Liu, Hao, Xie, Yuan, Bai, Xiang, Huang, Can
Text-centric visual question answering (VQA) has made great strides with the development of Multimodal Large Language Models (MLLMs), yet open-source models still fall short of leading models like GPT4V and Gemini, partly due to a lack of extensive …
External link:
http://arxiv.org/abs/2404.12803
Author:
Ren, Tonghui, Fan, Yuankai, He, Zhenying, Huang, Ren, Dai, Jiaqi, Huang, Can, Jing, Yinan, Zhang, Kai, Yang, Yifan, Wang, X. Sean
Large Language Model (LLM) techniques play an increasingly important role in Natural Language to SQL (NL2SQL) translation. LLMs trained on extensive corpora have strong natural language understanding and basic SQL generation abilities without additional …
External link:
http://arxiv.org/abs/2403.20014
Multi-modal Large Language Models (MLLMs) have demonstrated their ability to perceive objects in still images, but their application in video-related tasks, such as object tracking, remains understudied. This lack of exploration is primarily due to the …
External link:
http://arxiv.org/abs/2403.16558
The Natural Language Interface to Databases (NLIDB) empowers non-technical users with database access through intuitive natural language (NL) interactions. Advanced approaches, utilizing neural sequence-to-sequence models or large-scale language models …
External link:
http://arxiv.org/abs/2402.17144
In this study, we aim to reduce generation latency for Named Entity Recognition (NER) with Large Language Models (LLMs). The main cause of high latency in LLMs is the sequential decoding process, which autoregressively generates all labels and …
External link:
http://arxiv.org/abs/2402.04838
Author:
Tang, Haijun, Huang, Can, Wang, Yuhan, Jiang, Xiong, Xiao, Shumin, Han, Jiecai, Song, Qinghai
Distant interactions at arbitrary locations and their dynamic control are fundamentally important for realizing large-scale photonic and quantum circuits. Conventional approaches suffer from short coupling distance, poor controllability, fixed locations …
External link:
http://arxiv.org/abs/2401.08177
Video Text Spotting (VTS) is a fundamental visual task that aims to predict the trajectories and content of texts in a video. Previous works usually conduct local associations and apply IoU-based distance and complex post-processing procedures to boost …
External link:
http://arxiv.org/abs/2401.03694