Showing 1 - 10 of 87 for search: '"Fan, Hehe"'
Zero-shot learning (ZSL) aims to recognize unseen classes by transferring semantic knowledge from seen classes to unseen ones, guided by semantic information. To this end, existing works have demonstrated remarkable performance by utilizing global vi
External link:
http://arxiv.org/abs/2408.14868
In this paper, we briefly introduce the solution developed by our team, HFUT-VUT, for the Micro-gesture Classification track of the MiGA challenge at IJCAI 2024. The task of micro-gesture classification involves recognizing the category of a
External link:
http://arxiv.org/abs/2408.03097
This paper presents Invariant Score Distillation (ISD), a novel method for high-fidelity text-to-3D generation. ISD aims to tackle the over-saturation and over-smoothing problems in Score Distillation Sampling (SDS). In this paper, SDS is decoupled i
External link:
http://arxiv.org/abs/2407.09822
To bridge the gap between vision and language modalities, Multimodal Large Language Models (MLLMs) usually learn an adapter that converts visual inputs to understandable tokens for Large Language Models (LLMs). However, most adapters generate consist
External link:
http://arxiv.org/abs/2405.15684
Recent advancements in image understanding have benefited from the extensive use of web image-text pairs. However, video understanding remains a challenge despite the availability of substantial web video-text data. This difficulty primarily arises f
External link:
http://arxiv.org/abs/2405.13911
Uncovering What, Why and How: A Comprehensive Benchmark for Causation Understanding of Video Anomaly
Author:
Du, Hang, Zhang, Sicheng, Xie, Binzhu, Nan, Guoshun, Zhang, Jiayang, Xu, Junrui, Liu, Hangyu, Leng, Sicong, Liu, Jiangming, Fan, Hehe, Huang, Dajiu, Feng, Jing, Chen, Linli, Zhang, Can, Li, Xuhuan, Zhang, Hao, Chen, Jianhang, Cui, Qimei, Tao, Xiaofeng
Video anomaly understanding (VAU) aims to automatically comprehend unusual occurrences in videos, thereby enabling various applications such as traffic surveillance and industrial manufacturing. While existing VAU benchmarks primarily concentrate on
External link:
http://arxiv.org/abs/2405.00181
Protein representation learning is a challenging task that aims to capture the structure and function of proteins from their amino acid sequences. Previous methods largely ignored the fact that not all amino acids are equally important for protein fo
External link:
http://arxiv.org/abs/2404.00254
Current diffusion-based video editing primarily focuses on local editing (e.g., object/background editing) or global style editing by utilizing various dense correspondences. However, these methods often fail to accurately edit the foregroun
External link:
http://arxiv.org/abs/2403.16111
Protein research is crucial in various fundamental disciplines, but understanding the intricate structure-function relationships of proteins remains challenging. Recent Large Language Models (LLMs) have made significant strides in comprehending task-specific k
External link:
http://arxiv.org/abs/2402.09649
Creating digital avatars from textual prompts has long been a desirable yet challenging task. Despite the promising outcomes obtained through 2D diffusion priors in recent works, current methods face challenges in achieving high-quality and animated
External link:
http://arxiv.org/abs/2402.06149