Showing 1 - 10 of 1,271 for search: '"Huang, QingMing"'
Diffusion models (DMs) have demonstrated exceptional performance in text-to-image (T2I) tasks, leading to their widespread use. With the introduction of classifier-free guidance (CFG), the quality of images generated by DMs is improved. However, DMs
External link:
http://arxiv.org/abs/2412.16039
Real-world datasets often exhibit a long-tailed distribution, where the vast majority of classes, known as tail classes, have only a few samples. Traditional methods tend to overfit on these tail classes. Recently, a new approach called Imbalanced SAM (ImbSA
External link:
http://arxiv.org/abs/2412.13715
Video has emerged as a favored multimedia format on the internet. To better gain video contents, a new topic HIREST is presented, including video retrieval, moment retrieval, moment segmentation, and step-captioning. The pioneering work chooses the p
External link:
http://arxiv.org/abs/2412.13543
This paper addresses the challenge of Granularity Competition in fine-grained classification tasks, which arises due to the semantic gap between multi-granularity labels. Existing approaches typically develop independent hierarchy-aware models based
External link:
http://arxiv.org/abs/2412.12782
Author:
Cong, Gaoxiang, Pan, Jiadong, Li, Liang, Qi, Yuankai, Peng, Yuxin, Hengel, Anton van den, Yang, Jian, Huang, Qingming
Given a piece of text, a video clip, and a reference audio, the movie dubbing task aims to generate speech that aligns with the video while cloning the desired voice. The existing methods have two primary deficiencies: (1) They struggle to simultaneo
External link:
http://arxiv.org/abs/2412.08988
Author:
Zhang, Jinzhong, Cai, Xinhang, Huang, Qingming, Chen, Yihong, Yang, Zhangqiang, Sun, Zhiyuan, Yang, Ye
Pairing states are essential for understanding the underlying mechanisms of high-temperature superconductivity. Here the non-superconducting state in an optimally doped YBa2Cu3O7-δ film was driven out of equilibrium by an optical pump with low
External link:
http://arxiv.org/abs/2412.06569
Parameter-efficient fine-tuning (PEFT) is an effective method for adapting pre-trained vision models to downstream tasks by tuning a small subset of parameters. Among PEFT methods, sparse tuning achieves superior performance by only adjusting the wei
External link:
http://arxiv.org/abs/2411.01800
Message passing plays a vital role in graph neural networks (GNNs) for effective feature learning. However, the over-reliance on input topology diminishes the efficacy of message passing and restricts the ability of GNNs. Despite efforts to mitigate
External link:
http://arxiv.org/abs/2410.23686
Published in:
Transactions on Image Processing, vol. 33, pp. 3115-3129, 2024
Long-term Video Question Answering (VideoQA) is a challenging vision-and-language bridging task focusing on semantic understanding of untrimmed long-term videos and diverse free-form questions, simultaneously emphasizing comprehensive cross-modal rea
External link:
http://arxiv.org/abs/2410.09379
Published in:
IEEE Transactions on Circuits and Systems for Video Technology, 2024
Video Question Answering (VideoQA) represents a crucial intersection between video understanding and language processing, requiring both discriminative unimodal comprehension and sophisticated cross-modal interaction for accurate inference. Despite a
External link:
http://arxiv.org/abs/2410.09380