Showing 1 - 10 of 258 for search: '"Zhang, Meishan"'
Author:
Zhao, Yu, Fei, Hao, Li, Xiangtai, Qin, Libo, Ji, Jiayi, Zhu, Hongyuan, Zhang, Meishan, Zhang, Min, Wei, Jianguo
In the visual spatial understanding (VSU) area, spatial image-to-text (SI2T) and spatial text-to-image (ST2I) are two fundamental tasks that appear in dual form. Existing methods for standalone SI2T or ST2I perform imperfectly in spatial understandin
External link:
http://arxiv.org/abs/2410.15312
Grammar Induction could benefit from rich heterogeneous signals, such as text, vision, and acoustics. In the process, features from distinct modalities essentially serve complementary roles to each other. With such intuition, this work introduces a n
External link:
http://arxiv.org/abs/2410.03739
Event extraction (EE) is a critical direction in the field of information extraction, laying an important foundation for the construction of structured knowledge bases. EE from text has received ample research and attention for years, yet there can b
External link:
http://arxiv.org/abs/2408.09462
Author:
Guo, Peiming, Liu, Sinuo, Zhang, Yanzhao, Long, Dingkun, Xie, Pengjun, Zhang, Meishan, Zhang, Min
Photo-Sharing Multi-modal dialogue generation requires a dialogue agent not only to generate text responses but also to share photos at the proper moment. Using image text caption as the bridge, a pipeline model integrates an image caption model, a t
External link:
http://arxiv.org/abs/2408.08650
Author:
Zhang, Xin, Zhang, Yanzhao, Long, Dingkun, Xie, Wen, Dai, Ziqi, Tang, Jialong, Lin, Huan, Yang, Baosong, Xie, Pengjun, Huang, Fei, Zhang, Meishan, Li, Wenjie, Zhang, Min
We present systematic efforts in building a long-context multilingual text representation model (TRM) and reranker from scratch for text retrieval. We first introduce a text encoder (base size) enhanced with RoPE and unpadding, pre-trained in a native
External link:
http://arxiv.org/abs/2407.19669
Published in:
IEEE Transactions on Pattern Analysis and Machine Intelligence, 2024
While pre-training large-scale video-language models (VLMs) has shown remarkable potential for various downstream video-language tasks, existing VLMs can still suffer from certain commonly seen limitations, e.g., coarse-grained cross-modal aligning,
External link:
http://arxiv.org/abs/2406.19255
Published in:
Proceedings of Interspeech 2024
Opinion Expression Identification (OEI) is essential in NLP for applications ranging from voice assistants to depression diagnosis. This study extends OEI to encompass multimodal inputs, underlining the significance of auditory cues in delivering emo
External link:
http://arxiv.org/abs/2406.18088
Author:
Chen, Huiyao, Zhao, Yu, Chen, Zulong, Wang, Mengjia, Li, Liangyue, Zhang, Meishan, Zhang, Min
Hierarchical text classification (HTC) is an important task with broad applications, while few-shot HTC has gained increasing interest recently. While in-context learning (ICL) with large language models (LLMs) has achieved significant success in few
External link:
http://arxiv.org/abs/2406.17534
Author:
Wang, Yidong, Guo, Qi, Yao, Wenjin, Zhang, Hongbo, Zhang, Xin, Wu, Zhen, Zhang, Meishan, Dai, Xinyu, Zhang, Min, Wen, Qingsong, Ye, Wei, Zhang, Shikun, Zhang, Yue
This paper introduces AutoSurvey, a speedy and well-organized methodology for automating the creation of comprehensive literature surveys in rapidly evolving fields like artificial intelligence. Traditional survey paper creation faces challenges due
External link:
http://arxiv.org/abs/2406.10252
In the field of information extraction (IE), tasks across a wide range of modalities and their combinations have been traditionally studied in isolation, leaving a gap in deeply recognizing and analyzing cross-modal information. To address this, this
External link:
http://arxiv.org/abs/2406.03701