Showing 1 - 10 of 23 results for the search: '"Guo, Qingpei"'
Author:
Wu, Wei, Zheng, Kecheng, Ma, Shuailei, Lu, Fan, Guo, Yuxin, Zhang, Yifei, Chen, Wei, Guo, Qingpei, Shen, Yujun, Zha, Zheng-Jun
Understanding long text is in great demand in practice but beyond the reach of most language-image pre-training (LIP) models. In this work, we empirically confirm that the key reason causing such an issue is that the training images are usually pair…
External link:
http://arxiv.org/abs/2410.05249
Author:
Chen, Yuyan, Qian, Yiwen, Yan, Songzhou, Jia, Jiyuan, Li, Zhixu, Xiao, Yanghua, Li, Xiaobo, Yang, Ming, Guo, Qingpei
In the era of social media video platforms, popular ``hot-comments'' play a crucial role in attracting user impressions of short-form videos, making them vital for marketing and branding purposes. However, existing research predominantly focuses on ge…
External link:
http://arxiv.org/abs/2409.15196
Multi-modal Large Language Models (MLLMs) have advanced significantly, offering powerful vision-language understanding capabilities. However, these models often inherit severe social biases from their training datasets, leading to unfair predictions…
External link:
http://arxiv.org/abs/2408.06569
Author:
Jiang, Li, Wu, Yusen, Xiong, Junwu, Ruan, Jingqing, Ding, Yichuan, Guo, Qingpei, Wen, Zujie, Zhou, Jun, Deng, Xiaotie
Published in:
COLM 2024
Preference datasets are essential for incorporating human preferences into pre-trained language models, playing a key role in the success of Reinforcement Learning from Human Feedback. However, these datasets often demonstrate conflicting alignment o…
External link:
http://arxiv.org/abs/2405.11647
The user base of short video apps has experienced unprecedented growth in recent years, resulting in a significant demand for video content analysis. In particular, text-video retrieval, which aims to find the top matching videos given text descripti…
External link:
http://arxiv.org/abs/2404.14066
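The text-video retrieval task named in this abstract can be illustrated with a minimal embedding-similarity sketch. This is not the paper's model: the toy embeddings and the mean-pooling-over-frames baseline below are assumptions chosen only to show how a text query is matched against videos.

```python
import math

def cosine(a, b):
    """Cosine similarity between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

def mean_pool(frames):
    """Average per-frame embeddings into a single video embedding."""
    n = len(frames)
    return [sum(f[i] for f in frames) / n for i in range(len(frames[0]))]

def rank_videos(text_emb, videos, k=2):
    """Return the top-k (video_index, similarity) pairs for a text query."""
    sims = [(i, cosine(text_emb, mean_pool(frames)))
            for i, frames in enumerate(videos)]
    sims.sort(key=lambda p: -p[1])
    return sims[:k]

# Toy example: video 1's frame embeddings align with the query vector.
query = [1.0, 0.0, 0.0]
videos = [
    [[0.0, 1.0, 0.0], [0.0, 0.9, 0.1]],    # unrelated video
    [[1.0, 0.0, 0.0], [0.95, 0.05, 0.0]],  # matching video
    [[0.0, 0.0, 1.0]],                     # unrelated video
]
top = rank_videos(query, videos, k=1)      # video 1 ranks first
```

Real systems replace the toy vectors with learned text and frame encoders (e.g., CLIP-style), but the retrieval step remains a nearest-neighbor search in the shared embedding space.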
We present a Multi-Modal Recipe for Advancing Adaptation-based Pre-training towards effective and efficient zero-shot video-text retrieval, dubbed M2-RAAP. Upon popular image-text models like CLIP, most current adaptation-based video-text pre-trainin…
External link:
http://arxiv.org/abs/2401.17797
Author:
Dong, Xingning, Guo, Qingpei, Gan, Tian, Wang, Qing, Wu, Jianlong, Ren, Xiangyuan, Cheng, Yuan, Chu, Wei
We present a framework for learning cross-modal video representations by directly pre-training on raw data to facilitate various downstream video-text tasks. Our main contributions lie in the pre-training framework and proxy tasks. First, based on th…
External link:
http://arxiv.org/abs/2401.17773
Author:
Guo, Qingpei, Xu, Furong, Zhang, Hanxiao, Ren, Wang, Ma, Ziping, Ju, Lin, Wang, Jian, Chen, Jingdong, Yang, Ming
Vision-language foundation models like CLIP have revolutionized the field of artificial intelligence. Nevertheless, VLMs supporting multiple languages, e.g., both Chinese and English, have lagged due to the relative scarcity of large-scale pretr…
External link:
http://arxiv.org/abs/2401.15896
Author:
Yu, Xuzheng, Jiang, Chen, Zhang, Wei, Gan, Tian, Chao, Linlin, Zhao, Jianan, Cheng, Yuan, Guo, Qingpei, Chu, Wei
With the explosive growth of video data in real-world applications, a comprehensive representation of videos becomes increasingly important. In this paper, we address the problem of video scene recognition, whose goal is to learn a high-level video r…
External link:
http://arxiv.org/abs/2401.04354
Multimodal alignment between language and vision is a fundamental topic in current vision-language model research. Contrastive Captioners (CoCa), as a representative method, integrates Contrastive Language-Image Pretraining (CLIP) and Image Caption…
External link:
http://arxiv.org/abs/2401.02137
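The CLIP-style contrastive objective that CoCa combines with a captioning loss can be sketched in a few lines. This is a minimal pure-Python illustration of the symmetric InfoNCE loss over matched image/text pairs, not the paper's implementation; the embeddings and temperature value are placeholder assumptions.

```python
import math

def info_nce(image_embs, text_embs, temperature=0.07):
    """Symmetric contrastive (InfoNCE) loss: matched image/text pairs sit on
    the diagonal of the similarity matrix and act as positives; all other
    pairs in the batch are negatives."""
    def normalize(v):
        n = math.sqrt(sum(x * x for x in v))
        return [x / n for x in v]

    imgs = [normalize(v) for v in image_embs]
    txts = [normalize(v) for v in text_embs]
    n = len(imgs)
    # Cosine-similarity logits scaled by the temperature.
    logits = [[sum(a * b for a, b in zip(imgs[i], txts[j])) / temperature
               for j in range(n)] for i in range(n)]

    def ce_row(row, target):
        # Numerically stable cross-entropy for one row of logits.
        m = max(row)
        log_z = m + math.log(sum(math.exp(x - m) for x in row))
        return log_z - row[target]

    # Average image->text and text->image losses (the symmetric objective).
    i2t = sum(ce_row(logits[i], i) for i in range(n)) / n
    t2i = sum(ce_row([logits[i][j] for i in range(n)], j) for j in range(n)) / n
    return (i2t + t2i) / 2

# Aligned pairs yield a much lower loss than shuffled (misaligned) pairs.
aligned = info_nce([[1.0, 0.0], [0.0, 1.0]], [[1.0, 0.0], [0.0, 1.0]])
shuffled = info_nce([[1.0, 0.0], [0.0, 1.0]], [[0.0, 1.0], [1.0, 0.0]])
```

CoCa adds an autoregressive captioning head on top of this alignment objective, so the model is trained to both discriminate matched pairs and generate the paired text.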