Search results

Report

ZeroMamba: Exploring Visual State Space Model for Zero-Shot Learning

Authors: Hou, Wenjin; Fu, Dingjie; Li, Kun; Chen, Shiming; Fan, Hehe; Yang, Yi

Zero-shot learning (ZSL) aims to recognize unseen classes by transferring semantic knowledge from seen classes to unseen ones, guided by semantic information. To this end, existing works have demonstrated remarkable performance by utilizing global vi

External link: http://arxiv.org/abs/2408.14868

Report

Prototype Learning for Micro-gesture Classification

Authors: Chen, Guoliang; Wang, Fei; Li, Kun; Wu, Zhiliang; Fan, Hehe; Yang, Yi; Wang, Meng; Guo, Dan

In this paper, we briefly introduce the solution developed by our team, HFUT-VUT, for the track of Micro-gesture Classification in the MiGA challenge at IJCAI 2024. The micro-gesture classification task involves recognizing the category of a

External link: http://arxiv.org/abs/2408.03097

Report

VividDreamer: Invariant Score Distillation For Hyper-Realistic Text-to-3D Generation

Authors: Zhuo, Wenjie; Ma, Fan; Fan, Hehe; Yang, Yi

This paper presents Invariant Score Distillation (ISD), a novel method for high-fidelity text-to-3D generation. ISD aims to tackle the over-saturation and over-smoothing problems in Score Distillation Sampling (SDS). In this paper, SDS is decoupled i

External link: http://arxiv.org/abs/2407.09822

Report

Prompt-Aware Adapter: Towards Learning Adaptive Visual Tokens for Multimodal Large Language Models

Authors: Zhang, Yue; Fan, Hehe; Yang, Yi

To bridge the gap between vision and language modalities, Multimodal Large Language Models (MLLMs) usually learn an adapter that converts visual inputs to understandable tokens for Large Language Models (LLMs). However, most adapters generate consist

External link: http://arxiv.org/abs/2405.15684

Report

TOPA: Extend Large Language Models for Video Understanding via Text-Only Pre-Alignment

Authors: Li, Wei; Fan, Hehe; Wong, Yongkang; Kankanhalli, Mohan; Yang, Yi

Recent advancements in image understanding have benefited from the extensive use of web image-text pairs. However, video understanding remains a challenge despite the availability of substantial web video-text data. This difficulty primarily arises f

External link: http://arxiv.org/abs/2405.13911

Report

Uncovering What, Why and How: A Comprehensive Benchmark for Causation Understanding of Video Anomaly

Authors: Du, Hang; Zhang, Sicheng; Xie, Binzhu; Nan, Guoshun; Zhang, Jiayang; Xu, Junrui; Liu, Hangyu; Leng, Sicong; Liu, Jiangming; Fan, Hehe; Huang, Dajiu; Feng, Jing; Chen, Linli; Zhang, Can; Li, Xuhuan; Zhang, Hao; Chen, Jianhang; Cui, Qimei; Tao, Xiaofeng

Video anomaly understanding (VAU) aims to automatically comprehend unusual occurrences in videos, thereby enabling various applications such as traffic surveillance and industrial manufacturing. While existing VAU benchmarks primarily concentrate on

External link: http://arxiv.org/abs/2405.00181

Report

Clustering for Protein Representation Learning

Authors: Quan, Ruijie; Wang, Wenguan; Ma, Fan; Fan, Hehe; Yang, Yi

Protein representation learning is a challenging task that aims to capture the structure and function of proteins from their amino acid sequences. Previous methods largely ignored the fact that not all amino acids are equally important for protein fo

External link: http://arxiv.org/abs/2404.00254

Report

EVA: Zero-shot Accurate Attributes and Multi-Object Video Editing

Authors: Yang, Xiangpeng; Zhu, Linchao; Fan, Hehe; Yang, Yi

Current diffusion-based video editing primarily focuses on local editing (e.g., object/background editing) or global style editing by utilizing various dense correspondences. However, these methods often fail to accurately edit the foregroun

External link: http://arxiv.org/abs/2403.16111

Report

ProtChatGPT: Towards Understanding Proteins with Large Language Models

Authors: Wang, Chao; Fan, Hehe; Quan, Ruijie; Yang, Yi

Protein research is crucial in various fundamental disciplines, but understanding their intricate structure-function relationships remains challenging. Recent Large Language Models (LLMs) have made significant strides in comprehending task-specific k

External link: http://arxiv.org/abs/2402.09649

Report

HeadStudio: Text to Animatable Head Avatars with 3D Gaussian Splatting

Authors: Zhou, Zhenglin; Ma, Fan; Fan, Hehe; Yang, Yi

Creating digital avatars from textual prompts has long been a desirable yet challenging task. Despite the promising outcomes obtained through 2D diffusion priors in recent works, current methods face challenges in achieving high-quality and animated

External link: http://arxiv.org/abs/2402.06149
