Search results - "Zhang, Meishan"

Report

Synergistic Dual Spatial-aware Generation of Image-to-Text and Text-to-Image

Authors: Zhao, Yu, Fei, Hao, Li, Xiangtai, Qin, Libo, Ji, Jiayi, Zhu, Hongyuan, Zhang, Meishan, Zhang, Min, Wei, Jianguo

In the visual spatial understanding (VSU) area, spatial image-to-text (SI2T) and spatial text-to-image (ST2I) are two fundamental tasks that appear in dual form. Existing methods for standalone SI2T or ST2I perform imperfectly in spatial understandin

External link: http://arxiv.org/abs/2410.15312

Report

Grammar Induction from Visual, Speech and Text

Authors: Zhao, Yu, Fei, Hao, Wu, Shengqiong, Zhang, Meishan, Zhang, Min, Chua, Tat-Seng

Grammar Induction could benefit from rich heterogeneous signals, such as text, vision, and acoustics. In the process, features from distinct modalities essentially serve complementary roles to each other. With such intuition, this work introduces a n

External link: http://arxiv.org/abs/2410.03739

Report

SpeechEE: A Novel Benchmark for Speech Event Extraction

Authors: Wang, Bin, Zhang, Meishan, Fei, Hao, Zhao, Yu, Li, Bobo, Wu, Shengqiong, Ji, Wei, Zhang, Min

Event extraction (EE) is a critical direction in the field of information extraction, laying an important foundation for the construction of structured knowledge bases. EE from text has received ample research and attention for years, yet there can b

External link: http://arxiv.org/abs/2408.09462

Report

An End-to-End Model for Photo-Sharing Multi-modal Dialogue Generation

Authors: Guo, Peiming, Liu, Sinuo, Zhang, Yanzhao, Long, Dingkun, Xie, Pengjun, Zhang, Meishan, Zhang, Min

Photo-Sharing Multi-modal dialogue generation requires a dialogue agent not only to generate text responses but also to share photos at the proper moment. Using image text caption as the bridge, a pipeline model integrates an image caption model, a t

External link: http://arxiv.org/abs/2408.08650

Report

mGTE: Generalized Long-Context Text Representation and Reranking Models for Multilingual Text Retrieval

Authors: Zhang, Xin, Zhang, Yanzhao, Long, Dingkun, Xie, Wen, Dai, Ziqi, Tang, Jialong, Lin, Huan, Yang, Baosong, Xie, Pengjun, Huang, Fei, Zhang, Meishan, Li, Wenjie, Zhang, Min

We present systematic efforts in building a long-context multilingual text representation model (TRM) and a reranker from scratch for text retrieval. We first introduce a text encoder (base size) enhanced with RoPE and unpadding, pre-trained in a native

External link: http://arxiv.org/abs/2407.19669

Report

Enhancing Video-Language Representations with Structural Spatio-Temporal Alignment

Authors: Fei, Hao, Wu, Shengqiong, Zhang, Meishan, Zhang, Min, Chua, Tat-Seng, Yan, Shuicheng

Published in: IEEE Transactions on Pattern Analysis and Machine Intelligence, 2024

While pre-training large-scale video-language models (VLMs) has shown remarkable potential for various downstream video-language tasks, existing VLMs can still suffer from certain commonly seen limitations, e.g., coarse-grained cross-modal aligning,

External link: http://arxiv.org/abs/2406.19255

Report

LLM-Driven Multimodal Opinion Expression Identification

Authors: Jia, Bonian, Chen, Huiyao, Sun, Yueheng, Zhang, Meishan, Zhang, Min

Published in: Proceedings of Interspeech 2024

Opinion Expression Identification (OEI) is essential in NLP for applications ranging from voice assistants to depression diagnosis. This study extends OEI to encompass multimodal inputs, underlining the significance of auditory cues in delivering emo

External link: http://arxiv.org/abs/2406.18088

Report

Retrieval-style In-Context Learning for Few-shot Hierarchical Text Classification

Authors: Chen, Huiyao, Zhao, Yu, Chen, Zulong, Wang, Mengjia, Li, Liangyue, Zhang, Meishan, Zhang, Min

Hierarchical text classification (HTC) is an important task with broad applications, while few-shot HTC has gained increasing interest recently. While in-context learning (ICL) with large language models (LLMs) has achieved significant success in few

External link: http://arxiv.org/abs/2406.17534

Report

AutoSurvey: Large Language Models Can Automatically Write Surveys

Authors: Wang, Yidong, Guo, Qi, Yao, Wenjin, Zhang, Hongbo, Zhang, Xin, Wu, Zhen, Zhang, Meishan, Dai, Xinyu, Zhang, Min, Wen, Qingsong, Ye, Wei, Zhang, Shikun, Zhang, Yue

This paper introduces AutoSurvey, a speedy and well-organized methodology for automating the creation of comprehensive literature surveys in rapidly evolving fields like artificial intelligence. Traditional survey paper creation faces challenges due

External link: http://arxiv.org/abs/2406.10252

Report

Recognizing Everything from All Modalities at Once: Grounded Multimodal Universal Information Extraction

Authors: Zhang, Meishan, Fei, Hao, Wang, Bin, Wu, Shengqiong, Cao, Yixin, Li, Fei, Zhang, Min

In the field of information extraction (IE), tasks across a wide range of modalities and their combinations have been traditionally studied in isolation, leaving a gap in deeply recognizing and analyzing cross-modal information. To address this, this

External link: http://arxiv.org/abs/2406.03701
