Showing 1 - 10 of 245 for search: '"Yoo, Chang D"'
Author:
Kang, Haeyong, Yoo, Chang D.
Inspired by the Well-initialized Lottery Ticket Hypothesis (WLTH), which provides suboptimal fine-tuning solutions, we propose a novel fully fine-tuned continual learning (CL) method referred to as Soft-TransFormers (Soft-TF). Soft-TF sequentially learns…
External link:
http://arxiv.org/abs/2411.16073
Author:
Tee, Joshua Tian Jin, Zhang, Kang, Yoon, Hee Suk, Gowda, Dhananjaya Nagaraja, Kim, Chanwoo, Yoo, Chang D.
Diffusion models have recently emerged as a potent tool in generative modeling. However, their inherent iterative nature often results in sluggish image generation due to the requirement for multiple model evaluations. Recent progress has unveiled the…
External link:
http://arxiv.org/abs/2411.08378
Human image animation aims to generate a human motion video from the inputs of a reference human image and a target motion video. Current diffusion-based image animation systems exhibit high precision in transferring human identity into targeted motion…
External link:
http://arxiv.org/abs/2410.24037
Recent work in offline reinforcement learning (RL) has demonstrated the effectiveness of formulating decision-making as return-conditioned supervised learning. Notably, the decision transformer (DT) architecture has shown promise across various domains…
External link:
http://arxiv.org/abs/2410.03408
Recent studies reveal that well-performing reinforcement learning (RL) agents in training often lack resilience against adversarial perturbations during deployment. This highlights the importance of building a robust agent before deploying it in the…
External link:
http://arxiv.org/abs/2410.03376
We introduce MDSGen, a novel framework for vision-guided open-domain sound generation optimized for model parameter size, memory consumption, and inference speed. This framework incorporates two key innovations: (1) a redundant video feature removal…
External link:
http://arxiv.org/abs/2410.02130
Text-based diffusion video editing systems have been successful in performing edits with high fidelity and textual alignment. However, this success is limited to rigid-type editing such as style transfer and object overlay, while preserving the original…
External link:
http://arxiv.org/abs/2409.13037
Open-vocabulary 3D instance segmentation transcends traditional closed-vocabulary methods by enabling the identification of both previously seen and unseen objects in real-world scenarios. It leverages a dual-modality approach, utilizing both 3D point…
External link:
http://arxiv.org/abs/2408.08591
Author:
Yoon, Hee Suk, Yoon, Eunseop, Tee, Joshua Tian Jin, Zhang, Kang, Heo, Yu-Jung, Chang, Du-Seong, Yoo, Chang D.
Multimodal Dialogue Response Generation (MDRG) is a recently proposed task where the model needs to generate responses in texts, images, or a blend of both based on the dialogue context. Due to the lack of a large-scale dataset specifically for this…
External link:
http://arxiv.org/abs/2408.05926
Test-Time Adaptation (TTA) has emerged as a crucial solution to the domain shift challenge, wherein the target environment diverges from the original training environment. A prime exemplification is TTA for Automatic Speech Recognition (ASR), which…
External link:
http://arxiv.org/abs/2408.05769