Showing 1 - 10 of 1,620 for search: '"Yang, Xun"'
Author:
Zhou, Sheng, Xiao, Junbin, Yang, Xun, Song, Peipei, Guo, Dan, Yao, Angela, Wang, Meng, Chua, Tat-Seng
Existing efforts in text-based video question answering (TextVideoQA) are criticized for their opaque decision-making and heavy reliance on scene-text recognition. In this paper, we propose to study Grounded TextVideoQA by forcing models to answer que
External link:
http://arxiv.org/abs/2409.14319
The domain generalization (DG) task aims to learn a robust model from source domains that can handle the out-of-distribution (OOD) issue. In order to improve the generalization ability of the model in unseen domains, increasing the diversity of trainin
External link:
http://arxiv.org/abs/2409.04699
Author:
Yin, Xiangchen, Di, Donglin, Fan, Lei, Li, Hao, Wei, Chen, Gou, Xiaofei, Song, Yang, Sun, Xiao, Yang, Xun
Recent methods using diffusion models have made significant progress in human image generation with various additional controls such as pose priors. However, existing approaches still struggle to generate high-quality images with consistent pose alig
External link:
http://arxiv.org/abs/2408.16540
Recent advancements in image synthesis, particularly with the advent of GAN and Diffusion models, have amplified public concerns regarding the dissemination of disinformation. To address such concerns, numerous AI-generated Image (AIGI) Detectors hav
External link:
http://arxiv.org/abs/2407.20836
Prompt learning represents a promising method for adapting pre-trained vision-language models (VLMs) to various downstream tasks by learning a set of text embeddings. One challenge inherent to these methods is the poor generalization performance due
External link:
http://arxiv.org/abs/2407.19674
Full surround monodepth (FSM) methods can learn from multiple camera views simultaneously in a self-supervised manner to predict the scale-aware depth, which is more practical for real-world applications in contrast to scale-ambiguous depth from a st
External link:
http://arxiv.org/abs/2407.10406
Skeletal motion plays a pivotal role in human activity recognition (HAR). Recently, attack methods have been proposed to identify the universal vulnerability of skeleton-based HAR (S-HAR). However, research into adversarial transferability on S-HAR
External link:
http://arxiv.org/abs/2407.08572
Despite significant strides in the field of 3D scene editing, current methods encounter substantial challenges, particularly in preserving 3D consistency in the multi-view editing process. To tackle this challenge, we propose a progressive 3D editing stra
External link:
http://arxiv.org/abs/2407.02034
Unsupervised domain adaptation (UDA) is a critical problem for transfer learning, which aims to transfer semantic information from a labeled source domain to an unlabeled target domain. Recent advancements in UDA models have demonstrated significant g
External link:
http://arxiv.org/abs/2405.17774
Knowledge tracing has been widely used in online learning systems to guide the students' future learning. However, most existing KT models primarily focus on extracting abundant information from the question sets and explore the relationships between
External link:
http://arxiv.org/abs/2405.16799