Showing 1 - 10 of 14,195 results for the search: '"CHANG, D"'
Human image animation aims to generate a human motion video from the inputs of a reference human image and a target motion video. Current diffusion-based image animation systems exhibit high precision in transferring human identity into targeted motion…
External link:
http://arxiv.org/abs/2410.24037
Recent work in offline reinforcement learning (RL) has demonstrated the effectiveness of formulating decision-making as return-conditioned supervised learning. Notably, the decision transformer (DT) architecture has shown promise across various domains…
External link:
http://arxiv.org/abs/2410.03408
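As a hedged illustration of the return-conditioned supervised learning idea mentioned above (the general recipe behind decision-transformer-style methods, not the cited paper's actual model), the sketch below trains a small policy to predict logged actions conditioned on a return-to-go. All names and the toy data are hypothetical.

```python
# Hedged sketch of return-conditioned supervised learning; NOT the cited paper's model.
import torch
import torch.nn as nn

class ReturnConditionedPolicy(nn.Module):
    """Predicts an action from a (state, return-to-go) pair."""
    def __init__(self, state_dim: int, action_dim: int, hidden: int = 128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim + 1, hidden), nn.ReLU(),
            nn.Linear(hidden, action_dim),
        )

    def forward(self, state: torch.Tensor, rtg: torch.Tensor) -> torch.Tensor:
        # The scalar return-to-go is appended as an extra input feature.
        return self.net(torch.cat([state, rtg], dim=-1))

# Toy offline batch: states, logged actions, and returns-to-go from trajectories.
states, actions, rtgs = torch.randn(256, 4), torch.randn(256, 2), torch.randn(256, 1)

policy = ReturnConditionedPolicy(state_dim=4, action_dim=2)
opt = torch.optim.Adam(policy.parameters(), lr=1e-3)

for _ in range(100):
    loss = nn.functional.mse_loss(policy(states, rtgs), actions)  # supervised action regression
    opt.zero_grad()
    loss.backward()
    opt.step()

# At test time, conditioning on a high target return-to-go asks the policy to
# imitate the behaviour associated with high returns in the offline data.
```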
Recent studies reveal that well-performing reinforcement learning (RL) agents in training often lack resilience against adversarial perturbations during deployment. This highlights the importance of building a robust agent before deploying it in the…
External link:
http://arxiv.org/abs/2410.03376
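To make the threat model concrete, here is a hedged sketch of an FGSM-style perturbation applied to an agent's observation, the kind of attack such robustness work defends against; it is not the cited paper's method, and the policy interface and epsilon are placeholders.

```python
# Hedged sketch of an FGSM-style observation attack; illustrates the threat model
# only and is NOT the cited paper's approach.
import torch
import torch.nn as nn

def perturb_observation(policy: nn.Module, obs: torch.Tensor, eps: float = 0.01) -> torch.Tensor:
    """obs: (batch, obs_dim). Returns observations nudged against the agent's chosen actions."""
    obs = obs.clone().detach().requires_grad_(True)
    logits = policy(obs)                       # assumed: policy maps observations to action logits
    action = logits.argmax(dim=-1)             # the action the agent would have taken
    loss = nn.functional.cross_entropy(logits, action)
    loss.backward()
    # Step in the direction that increases the loss of the chosen action.
    return (obs + eps * obs.grad.sign()).detach()
```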
We introduce MDSGen, a novel framework for vision-guided open-domain sound generation optimized for model parameter size, memory consumption, and inference speed. This framework incorporates two key innovations: (1) a redundant video feature removal…
External link:
http://arxiv.org/abs/2410.02130
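As a loose, generic illustration of what "redundant video feature removal" could mean in practice, the sketch below drops temporally redundant per-frame features by cosine similarity. This is only one plausible reading and is not claimed to be MDSGen's actual module; the threshold is a made-up hyperparameter.

```python
# Hedged, generic sketch of pruning temporally redundant video features;
# NOT claimed to be MDSGen's actual mechanism.
import torch
import torch.nn.functional as F

def drop_redundant_frames(feats: torch.Tensor, threshold: float = 0.95) -> torch.Tensor:
    """feats: (T, D) per-frame features; keep frames that differ enough from the last kept one."""
    kept = [feats[0]]
    for t in range(1, feats.shape[0]):
        sim = F.cosine_similarity(feats[t], kept[-1], dim=0)
        if sim < threshold:        # keep only frames that add new information
            kept.append(feats[t])
    return torch.stack(kept)
```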
Text-based diffusion video editing systems have been successful in performing edits with high fidelity and textual alignment. However, this success is limited to rigid-type editing such as style transfer and object overlay, while preserving the original…
External link:
http://arxiv.org/abs/2409.13037
Open-vocabulary 3D instance segmentation transcends traditional closed-vocabulary methods by enabling the identification of both previously seen and unseen objects in real-world scenarios. It leverages a dual-modality approach, utilizing both 3D point clouds…
External link:
http://arxiv.org/abs/2408.08591
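A hedged sketch of the open-vocabulary classification step such dual-modality methods typically rely on: scoring per-instance visual features against text embeddings of arbitrary class names. The features below are random placeholders standing in for real CLIP-style embeddings; this is not the cited paper's pipeline.

```python
# Hedged sketch of open-vocabulary instance labelling via text-embedding matching;
# placeholder features only, NOT the cited paper's pipeline.
import torch
import torch.nn.functional as F

num_instances, embed_dim = 5, 512
# Stand-in for pooled 2D/3D features per predicted instance mask.
instance_feats = F.normalize(torch.randn(num_instances, embed_dim), dim=-1)

class_names = ["chair", "sofa", "coffee machine"]   # open vocabulary: any text prompt works
# Stand-in for the output of a CLIP-style text encoder over the class names.
text_feats = F.normalize(torch.randn(len(class_names), embed_dim), dim=-1)

# Cosine-similarity logits; each 3D instance gets the best-matching prompt as its label.
logits = instance_feats @ text_feats.T
labels = [class_names[i] for i in logits.argmax(dim=-1).tolist()]
print(labels)
```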
Author:
Yoon, Hee Suk, Yoon, Eunseop, Tee, Joshua Tian Jin, Zhang, Kang, Heo, Yu-Jung, Chang, Du-Seong, Yoo, Chang D.
Multimodal Dialogue Response Generation (MDRG) is a recently proposed task where the model needs to generate responses in texts, images, or a blend of both based on the dialogue context. Due to the lack of a large-scale dataset specifically for this…
External link:
http://arxiv.org/abs/2408.05926
Test-Time Adaptation (TTA) has emerged as a crucial solution to the domain shift challenge, wherein the target environment diverges from the original training environment. A prime exemplification is TTA for Automatic Speech Recognition (ASR)…
External link:
http://arxiv.org/abs/2408.05769
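For intuition, here is a hedged sketch of one common test-time adaptation recipe: minimizing the entropy of the model's own predictions on unlabeled test data. It is shown on a generic classifier rather than a real ASR system and is not the cited paper's algorithm.

```python
# Hedged sketch of entropy-minimization test-time adaptation on a generic
# classifier; NOT the cited paper's algorithm.
import torch
import torch.nn as nn

def tta_entropy_step(model: nn.Module, batch: torch.Tensor,
                     optimizer: torch.optim.Optimizer) -> torch.Tensor:
    """One adaptation step on an unlabeled test batch."""
    probs = torch.softmax(model(batch), dim=-1)
    entropy = -(probs * probs.clamp_min(1e-8).log()).sum(dim=-1).mean()
    optimizer.zero_grad()
    entropy.backward()
    optimizer.step()
    return entropy.detach()

# In practice only a small subset of parameters (e.g. normalization layers) is
# usually adapted, to avoid drifting too far from the source-trained model.
```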
Reinforcement Learning (RL) agents demonstrating proficiency in a training environment exhibit vulnerability to adversarial perturbations in input observations during deployment. This underscores the importance of building a robust agent before its…
External link:
http://arxiv.org/abs/2408.00023
Current image editing methods primarily utilize DDIM Inversion, employing a two-branch diffusion approach to preserve the attributes and layout of the original image. However, these methods encounter challenges with non-rigid edits, which involve altering…
External link:
http://arxiv.org/abs/2407.17850
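Since the snippet mentions DDIM Inversion, here is a hedged sketch of a single DDIM inversion step, the deterministic mapping of a latent from timestep t to t+1 so the image can later be reconstructed (and edited) by sampling back down. The noise prediction and alpha-bar schedule are placeholders, not the cited paper's components.

```python
# Hedged sketch of one DDIM inversion step; schedule values and the noise
# estimate `eps` are assumed inputs, NOT the cited paper's components.
import torch

def ddim_inversion_step(x_t: torch.Tensor, eps: torch.Tensor,
                        alpha_bar_t: float, alpha_bar_next: float) -> torch.Tensor:
    """x_t: current latent; eps: noise predicted by the diffusion model at step t."""
    # Clean latent implied by the current noise estimate.
    x0_pred = (x_t - (1 - alpha_bar_t) ** 0.5 * eps) / alpha_bar_t ** 0.5
    # Deterministically re-noise it to the next (noisier) timestep.
    return alpha_bar_next ** 0.5 * x0_pred + (1 - alpha_bar_next) ** 0.5 * eps

# Two-branch editing methods typically run one branch that reconstructs the
# original image from these inverted latents and one branch that follows the
# edited prompt, sharing information between them to preserve layout.
```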