Showing 1 - 10 of 53,330 results for search: '"Yuan Li"'
Author:
Pang, Yatian, Zhu, Bin, Lin, Bin, Zheng, Mingzhe, Tay, Francis E. H., Lim, Ser-Nam, Yang, Harry, Yuan, Li
In this work, we present DreamDance, a novel method for animating human images using only skeleton pose sequences as conditional inputs. Existing approaches struggle with generating coherent, high-quality content in an efficient and user-friendly manner…
External link:
http://arxiv.org/abs/2412.00397
Author:
Lin, Bin, Ge, Yunyang, Cheng, Xinhua, Li, Zongjian, Zhu, Bin, Wang, Shaodong, He, Xianyi, Ye, Yang, Yuan, Shenghai, Chen, Liuhan, Jia, Tanghui, Zhang, Junwu, Tang, Zhenyu, Pang, Yatian, She, Bin, Yan, Cen, Hu, Zhiheng, Dong, Xiaoyi, Chen, Lin, Pan, Zhang, Zhou, Xing, Dong, Shaoling, Tian, Yonghong, Yuan, Li
We introduce Open-Sora Plan, an open-source project that aims to contribute a large generation model for generating desired high-resolution videos with long durations based on various user inputs. Our project comprises multiple components for the entire…
External link:
http://arxiv.org/abs/2412.00131
Video Variational Autoencoder (VAE) encodes videos into a low-dimensional latent space, becoming a key component of most Latent Video Diffusion Models (LVDMs) to reduce model training costs. However, as the resolution and duration of generated videos…
External link:
http://arxiv.org/abs/2411.17459
Author:
Yuan, Shenghai, Huang, Jinfa, He, Xianyi, Ge, Yunyuan, Shi, Yujun, Chen, Liuhan, Luo, Jiebo, Yuan, Li
Identity-preserving text-to-video (IPT2V) generation aims to create high-fidelity videos with consistent human identity. It is an important task in video generation but remains an open problem for generative models. This paper pushes the technical frontier…
External link:
http://arxiv.org/abs/2411.17440
Author:
Yan, Zhiyuan, Wang, Jiangming, Wang, Zhendong, Jin, Peng, Zhang, Ke-Yue, Chen, Shen, Yao, Taiping, Ding, Shouhong, Wu, Baoyuan, Yuan, Li
Existing AI-generated image (AIGI) detection methods often suffer from limited generalization performance. In this paper, we identify a crucial yet previously overlooked asymmetry phenomenon in AIGI detection: during training, models tend to quickly…
External link:
http://arxiv.org/abs/2411.15633
Large language models have demonstrated substantial advancements in reasoning capabilities, particularly through inference-time scaling, as illustrated by models such as OpenAI's o1. However, current Vision-Language Models (VLMs) often struggle to perform…
External link:
http://arxiv.org/abs/2411.10440
Continual learning methods based on pre-trained models (PTMs) have recently gained attention; they adapt to successive downstream tasks without catastrophic forgetting. These methods typically refrain from updating the pre-trained parameters and instead…
External link:
http://arxiv.org/abs/2411.02813
Spiking Neural Networks (SNNs) have attracted considerable attention due to their biologically inspired, event-driven nature, making them highly suitable for neuromorphic hardware. Time-to-First-Spike (TTFS) coding, where neurons fire only once during…
External link:
http://arxiv.org/abs/2410.23619
Spiking Neural Networks (SNNs) are considered as a potential candidate for the next generation of artificial intelligence with appealing characteristics such as sparse computation and inherent temporal dynamics. By adopting architectures of Artificial…
External link:
http://arxiv.org/abs/2410.18580
Few-Shot Joint Multimodal Entity-Relation Extraction via Knowledge-Enhanced Cross-modal Prompt Model
Joint Multimodal Entity-Relation Extraction (JMERE) is a challenging task that aims to extract entities and their relations from text-image pairs in social media posts. Existing methods for JMERE require large amounts of labeled data. However, gathering…
External link:
http://arxiv.org/abs/2410.14225