Showing 1 - 10 of 225 results for search: '"Meng, Xiaojun"'
Multimodal Large Language Models (MLLMs) have demonstrated great zero-shot performance on visual question answering (VQA). However, when it comes to knowledge-based VQA (KB-VQA), MLLMs may lack human commonsense or specialized domain knowledge to answer…
External link:
http://arxiv.org/abs/2409.07331
Video Question Answering (VideoQA) has emerged as a challenging frontier in the field of multimedia processing, requiring intricate interactions between visual and textual modalities. Simply uniformly sampling frames or indiscriminately aggregating frames…
External link:
http://arxiv.org/abs/2407.15047
Large language models (LLMs) have attracted great attention given their strong performance on a wide range of NLP tasks. In practice, users often expect generated texts to fall within a specific length range, making length-controlled generation an important…
External link:
http://arxiv.org/abs/2406.10278
Video-text Large Language Models (video-text LLMs) have shown remarkable performance in answering questions and holding conversations on simple videos. However, they perform almost the same as random on grounding text queries in long and complicated…
External link:
http://arxiv.org/abs/2403.10228
Unsupervised extractive summarization is an important technique in information extraction and retrieval. Compared with supervised methods, it does not require high-quality human-labelled summaries for training and thus can be easily applied to documents…
External link:
http://arxiv.org/abs/2312.06901
Large language models (LLMs) like ChatGPT and GPT-4 have attracted great attention given their surprising performance on a wide range of NLP tasks. Length-controlled generation of LLMs emerges as an important topic, which enables users to fully leverage…
External link:
http://arxiv.org/abs/2308.12030
This study proposes a multitask learning architecture for extractive summarization with coherence boosting. The architecture contains an extractive summarizer and a coherent discriminator module. The coherent discriminator is trained online on the sent…
External link:
http://arxiv.org/abs/2305.12851
Multimodal abstractive summarization for videos (MAS) requires generating a concise textual summary to describe the highlights of a video according to multimodal resources, in our case, the video content and its transcript. Inspired by the success of…
External link:
http://arxiv.org/abs/2305.04824
Author:
Bai, Haoli, Liu, Zhiguang, Meng, Xiaojun, Li, Wentao, Liu, Shuang, Xie, Nian, Zheng, Rongfu, Wang, Liangwei, Hou, Lu, Wei, Jiansheng, Jiang, Xin, Liu, Qun
Unsupervised pre-training on millions of digital-born or scanned documents has shown promising advances in visual document understanding (VDU). While various vision-language pre-training objectives are studied in existing solutions, the document text…
External link:
http://arxiv.org/abs/2212.09621
Recently, semantic parsing using hierarchical representations for dialog systems has captured substantial attention. Task-Oriented Parse (TOP), a tree representation with intents and slots as labels of nested tree nodes, has been proposed for parsing…
External link:
http://arxiv.org/abs/2211.14508