Showing 1 - 10 of 23 results for the search: '"Guo, Qingpei"'
Author:
Wu, Wei, Zheng, Kecheng, Ma, Shuailei, Lu, Fan, Guo, Yuxin, Zhang, Yifei, Chen, Wei, Guo, Qingpei, Shen, Yujun, Zha, Zheng-Jun
Understanding long text is in great demand in practice but beyond the reach of most language-image pre-training (LIP) models. In this work, we empirically confirm that the key reason causing such an issue is that the training images are usually pair…
External link:
http://arxiv.org/abs/2410.05249
Author:
Chen, Yuyan, Qian, Yiwen, Yan, Songzhou, Jia, Jiyuan, Li, Zhixu, Xiao, Yanghua, Li, Xiaobo, Yang, Ming, Guo, Qingpei
In the era of social media video platforms, popular ``hot-comments'' play a crucial role in attracting user impressions of short-form videos, making them vital for marketing and branding purposes. However, existing research predominantly focuses on ge…
External link:
http://arxiv.org/abs/2409.15196
Multi-modal Large Language Models (MLLMs) have advanced significantly, offering powerful vision-language understanding capabilities. However, these models often inherit severe social biases from their training datasets, leading to unfair predictions…
External link:
http://arxiv.org/abs/2408.06569
Author:
Jiang, Li, Wu, Yusen, Xiong, Junwu, Ruan, Jingqing, Ding, Yichuan, Guo, Qingpei, Wen, Zujie, Zhou, Jun, Deng, Xiaotie
Published in:
COLM 2024
Preference datasets are essential for incorporating human preferences into pre-trained language models, playing a key role in the success of Reinforcement Learning from Human Feedback. However, these datasets often demonstrate conflicting alignment o…
External link:
http://arxiv.org/abs/2405.11647
The user base of short video apps has experienced unprecedented growth in recent years, resulting in a significant demand for video content analysis. In particular, text-video retrieval, which aims to find the top matching videos given text descripti…
External link:
http://arxiv.org/abs/2404.14066
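The text-video retrieval task named in this abstract can be illustrated with a minimal embedding-similarity sketch. This is not the paper's model: the toy embeddings and the mean-pooling-over-frames baseline below are assumptions chosen only to show how a text query is matched against videos.

```python
import math

def cosine(a, b):
    """Cosine similarity between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

def mean_pool(frames):
    """Average per-frame embeddings into a single video embedding."""
    n = len(frames)
    return [sum(f[i] for f in frames) / n for i in range(len(frames[0]))]

def rank_videos(text_emb, videos, k=2):
    """Return the top-k (video_index, similarity) pairs for a text query."""
    sims = [(i, cosine(text_emb, mean_pool(frames)))
            for i, frames in enumerate(videos)]
    sims.sort(key=lambda p: -p[1])
    return sims[:k]

# Toy example: video 1's frame embeddings align with the query vector.
query = [1.0, 0.0, 0.0]
videos = [
    [[0.0, 1.0, 0.0], [0.0, 0.9, 0.1]],    # unrelated video
    [[1.0, 0.0, 0.0], [0.95, 0.05, 0.0]],  # matching video
    [[0.0, 0.0, 1.0]],                     # unrelated video
]
top = rank_videos(query, videos, k=1)      # video 1 ranks first
```

Real systems replace the toy vectors with learned text and frame encoders (e.g., CLIP-style), but the retrieval step remains a nearest-neighbor search in the shared embedding space.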
We present a Multi-Modal Recipe for Advancing Adaptation-based Pre-training towards effective and efficient zero-shot video-text retrieval, dubbed M2-RAAP. Upon popular image-text models like CLIP, most current adaptation-based video-text pre-trainin…
External link:
http://arxiv.org/abs/2401.17797
Author:
Dong, Xingning, Guo, Qingpei, Gan, Tian, Wang, Qing, Wu, Jianlong, Ren, Xiangyuan, Cheng, Yuan, Chu, Wei
We present a framework for learning cross-modal video representations by directly pre-training on raw data to facilitate various downstream video-text tasks. Our main contributions lie in the pre-training framework and proxy tasks. First, based on th…
External link:
http://arxiv.org/abs/2401.17773
Author:
Guo, Qingpei, Xu, Furong, Zhang, Hanxiao, Ren, Wang, Ma, Ziping, Ju, Lin, Wang, Jian, Chen, Jingdong, Yang, Ming
Vision-language foundation models like CLIP have revolutionized the field of artificial intelligence. Nevertheless, VLMs supporting multiple languages, e.g., both Chinese and English, have lagged due to the relative scarcity of large-scale pretr…
External link:
http://arxiv.org/abs/2401.15896
Author:
Yu, Xuzheng, Jiang, Chen, Zhang, Wei, Gan, Tian, Chao, Linlin, Zhao, Jianan, Cheng, Yuan, Guo, Qingpei, Chu, Wei
With the explosive growth of video data in real-world applications, a comprehensive representation of videos becomes increasingly important. In this paper, we address the problem of video scene recognition, whose goal is to learn a high-level video r…
External link:
http://arxiv.org/abs/2401.04354
Multimodal alignment between language and vision is a fundamental topic in current vision-language model research. Contrastive Captioners (CoCa), as a representative method, integrates Contrastive Language-Image Pretraining (CLIP) and Image Caption…
External link:
http://arxiv.org/abs/2401.02137
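The CLIP-style contrastive objective that CoCa combines with a captioning loss can be sketched in a few lines. This is a minimal pure-Python illustration of the symmetric InfoNCE loss over matched image/text pairs, not the paper's implementation; the embeddings and temperature value are placeholder assumptions.

```python
import math

def info_nce(image_embs, text_embs, temperature=0.07):
    """Symmetric contrastive (InfoNCE) loss: matched image/text pairs sit on
    the diagonal of the similarity matrix and act as positives; all other
    pairs in the batch are negatives."""
    def normalize(v):
        n = math.sqrt(sum(x * x for x in v))
        return [x / n for x in v]

    imgs = [normalize(v) for v in image_embs]
    txts = [normalize(v) for v in text_embs]
    n = len(imgs)
    # Cosine-similarity logits scaled by the temperature.
    logits = [[sum(a * b for a, b in zip(imgs[i], txts[j])) / temperature
               for j in range(n)] for i in range(n)]

    def ce_row(row, target):
        # Numerically stable cross-entropy for one row of logits.
        m = max(row)
        log_z = m + math.log(sum(math.exp(x - m) for x in row))
        return log_z - row[target]

    # Average image->text and text->image losses (the symmetric objective).
    i2t = sum(ce_row(logits[i], i) for i in range(n)) / n
    t2i = sum(ce_row([logits[i][j] for i in range(n)], j) for j in range(n)) / n
    return (i2t + t2i) / 2

# Aligned pairs yield a much lower loss than shuffled (misaligned) pairs.
aligned = info_nce([[1.0, 0.0], [0.0, 1.0]], [[1.0, 0.0], [0.0, 1.0]])
shuffled = info_nce([[1.0, 0.0], [0.0, 1.0]], [[0.0, 1.0], [1.0, 0.0]])
```

CoCa adds an autoregressive captioning head on top of this alignment objective, so the model is trained to both discriminate matched pairs and generate the paired text.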