Showing 1 - 10 of 53,330 results for search: '"Yuan Li"'
Author:
Pang, Yatian, Zhu, Bin, Lin, Bin, Zheng, Mingzhe, Tay, Francis E. H., Lim, Ser-Nam, Yang, Harry, Yuan, Li
In this work, we present DreamDance, a novel method for animating human images using only skeleton pose sequences as conditional inputs. Existing approaches struggle with generating coherent, high-quality content in an efficient and user-friendly manner…
External link:
http://arxiv.org/abs/2412.00397
Author:
Lin, Bin, Ge, Yunyang, Cheng, Xinhua, Li, Zongjian, Zhu, Bin, Wang, Shaodong, He, Xianyi, Ye, Yang, Yuan, Shenghai, Chen, Liuhan, Jia, Tanghui, Zhang, Junwu, Tang, Zhenyu, Pang, Yatian, She, Bin, Yan, Cen, Hu, Zhiheng, Dong, Xiaoyi, Chen, Lin, Pan, Zhang, Zhou, Xing, Dong, Shaoling, Tian, Yonghong, Yuan, Li
We introduce Open-Sora Plan, an open-source project that aims to contribute a large generation model for generating desired high-resolution videos with long durations based on various user inputs. Our project comprises multiple components for the entire…
External link:
http://arxiv.org/abs/2412.00131
Video Variational Autoencoder (VAE) encodes videos into a low-dimensional latent space, becoming a key component of most Latent Video Diffusion Models (LVDMs) to reduce model training costs. However, as the resolution and duration of generated videos…
External link:
http://arxiv.org/abs/2411.17459
Author:
Yuan, Shenghai, Huang, Jinfa, He, Xianyi, Ge, Yunyuan, Shi, Yujun, Chen, Liuhan, Luo, Jiebo, Yuan, Li
Identity-preserving text-to-video (IPT2V) generation aims to create high-fidelity videos with consistent human identity. It is an important task in video generation but remains an open problem for generative models. This paper pushes the technical frontier…
External link:
http://arxiv.org/abs/2411.17440
Author:
Yan, Zhiyuan, Wang, Jiangming, Wang, Zhendong, Jin, Peng, Zhang, Ke-Yue, Chen, Shen, Yao, Taiping, Ding, Shouhong, Wu, Baoyuan, Yuan, Li
Existing AI-generated image (AIGI) detection methods often suffer from limited generalization performance. In this paper, we identify a crucial yet previously overlooked asymmetry phenomenon in AIGI detection: during training, models tend to quickly…
External link:
http://arxiv.org/abs/2411.15633
Large language models have demonstrated substantial advancements in reasoning capabilities, particularly through inference-time scaling, as illustrated by models such as OpenAI's o1. However, current Vision-Language Models (VLMs) often struggle to perform…
External link:
http://arxiv.org/abs/2411.10440
Continual learning methods based on pre-trained models (PTMs) have recently gained attention; they adapt to successive downstream tasks without catastrophic forgetting. These methods typically refrain from updating the pre-trained parameters and instead…
External link:
http://arxiv.org/abs/2411.02813
Spiking Neural Networks (SNNs) have attracted considerable attention due to their biologically inspired, event-driven nature, making them highly suitable for neuromorphic hardware. Time-to-First-Spike (TTFS) coding, where neurons fire only once during…
External link:
http://arxiv.org/abs/2410.23619
Spiking Neural Networks (SNNs) are considered as a potential candidate for the next generation of artificial intelligence with appealing characteristics such as sparse computation and inherent temporal dynamics. By adopting architectures of Artificial…
External link:
http://arxiv.org/abs/2410.18580
Few-Shot Joint Multimodal Entity-Relation Extraction via Knowledge-Enhanced Cross-modal Prompt Model
Joint Multimodal Entity-Relation Extraction (JMERE) is a challenging task that aims to extract entities and their relations from text-image pairs in social media posts. Existing methods for JMERE require large amounts of labeled data. However, gathering…
External link:
http://arxiv.org/abs/2410.14225