Zobrazeno 1 - 10
of 3 326
pro vyhledávání: '"Geng Xin"'
Source-Free Domain Generalization (SFDG) aims to develop a model that performs on unseen domains without relying on any source domains. However, the implementation remains constrained due to the unavailability of training data. Research on SFDG focus
Externí odkaz:
http://arxiv.org/abs/2501.01109
Text-to-motion generation is essential for advancing the creative industry but often presents challenges in producing consistent, realistic motions. To address this, we focus on fine-tuning text-to-motion models to consistently favor high-quality, hu
Externí odkaz:
http://arxiv.org/abs/2412.05095
Autor:
Zhang, Miaosen, Dai, Qi, Yang, Yifan, Bao, Jianmin, Chen, Dongdong, Qiu, Kai, Luo, Chong, Geng, Xin, Guo, Baining
LMMs have shown impressive visual understanding capabilities, with the potential to be applied in agents, which demand strong reasoning and planning abilities. Nevertheless, existing benchmarks mostly assess their reasoning abilities in language part
Externí odkaz:
http://arxiv.org/abs/2412.04531
Video anomaly detection is an essential yet challenging open-set task in computer vision, often addressed by leveraging reconstruction as a proxy task. However, existing reconstruction-based methods encounter challenges in two main aspects: (1) limit
Externí odkaz:
http://arxiv.org/abs/2412.03044
``Creative'' remains an inherently abstract concept for both humans and diffusion models. While text-to-image (T2I) diffusion models can easily generate out-of-domain concepts like ``a blue banana'', they struggle with generating combinatorial object
Externí odkaz:
http://arxiv.org/abs/2410.24160
Instance-dependent Partial Label Learning (ID-PLL) aims to learn a multi-class predictive model given training instances annotated with candidate labels related to features, among which correct labels are hidden fixed but unknown. The previous works
Externí odkaz:
http://arxiv.org/abs/2410.20797
Label Distribution Learning (LDL) is a novel machine learning paradigm that addresses the problem of label ambiguity and has found widespread applications. Obtaining complete label distributions in real-world scenarios is challenging, which has led t
Externí odkaz:
http://arxiv.org/abs/2410.13579
Recent advancements in pre-trained vision-language models like CLIP have shown promise in person re-identification (ReID) applications. However, their performance in generalizable person re-identification tasks remains suboptimal. The large-scale and
Externí odkaz:
http://arxiv.org/abs/2410.11255
Large language models have achieved remarkable capabilities, but aligning their outputs with human values and preferences remains a significant challenge. Existing alignment methods primarily focus on positive examples while overlooking the importanc
Externí odkaz:
http://arxiv.org/abs/2410.12194
Diffusion models often face slow convergence, and existing efficient training techniques, such as Parameter-Efficient Fine-Tuning (PEFT), are primarily designed for fine-tuning pre-trained models. However, these methods are limited in adapting models
Externí odkaz:
http://arxiv.org/abs/2409.19289