Showing 1–10 of 1,432 for search: '"GAN, TIAN"'
Knowledge distillation is a mainstream algorithm in model compression: it transfers knowledge from a larger model (the teacher) to a smaller model (the student) to improve the student's performance. Despite many efforts, existing methods mainly invest…
External link:
http://arxiv.org/abs/2410.14143
In recent years, mobile phone data has been widely used for human mobility analytics. Identifying individual activity locations is the fundamental step in mobile phone data processing. Current methods typically aggregate spatially adjacent location…
External link:
http://arxiv.org/abs/2410.13912
Multi-modal Large Language Models (MLLMs) have advanced significantly, offering powerful vision-language understanding capabilities. However, these models often inherit severe social biases from their training datasets, leading to unfair predictions…
External link:
http://arxiv.org/abs/2408.06569
The user base of short-video apps has experienced unprecedented growth in recent years, resulting in significant demand for video content analysis. In particular, text-video retrieval, which aims to find the top matching videos given text descriptions…
External link:
http://arxiv.org/abs/2404.14066
Author:
Dong, Xingning, Guo, Qingpei, Gan, Tian, Wang, Qing, Wu, Jianlong, Ren, Xiangyuan, Cheng, Yuan, Chu, Wei
We present a framework for learning cross-modal video representations by pre-training directly on raw data to facilitate various downstream video-text tasks. Our main contributions lie in the pre-training framework and proxy tasks. First, based on the…
External link:
http://arxiv.org/abs/2401.17773
Author:
Yu, Xuzheng, Jiang, Chen, Zhang, Wei, Gan, Tian, Chao, Linlin, Zhao, Jianan, Cheng, Yuan, Guo, Qingpei, Chu, Wei
With the explosive growth of video data in real-world applications, a comprehensive representation of videos becomes increasingly important. In this paper, we address the problem of video scene recognition, whose goal is to learn a high-level video representation…
External link:
http://arxiv.org/abs/2401.04354
Published in:
In International Conference on Multimedia. ACM, 557–566 (2023)
Recent advancements in video-language understanding have been built on the foundation of image-text models, yielding promising outcomes thanks to the shared knowledge between images and videos. However, video-language understanding presents un…
External link:
http://arxiv.org/abs/2312.00347
Motivated by the superior performance of image diffusion models, more and more researchers strive to extend these models to the text-based video editing task. Nevertheless, current video editing methods mainly suffer from a dilemma between high f…
External link:
http://arxiv.org/abs/2308.10648
This paper aims to tackle a novel task: Temporal Sentence Grounding in Streaming Videos (TSGSV). The goal of TSGSV is to evaluate the relevance between a video stream and a given sentence query. Unlike regular videos, streaming videos are acquired c…
External link:
http://arxiv.org/abs/2308.07102
The last decade has witnessed the proliferation of micro-videos on various user-generated content platforms. According to our statistics, around 85.7% of micro-videos lack annotation. In this paper, we focus on annotating micro-videos with tags. Existing…
External link:
http://arxiv.org/abs/2303.08318