Showing 1 - 10 of 14,118 for search: '"Chang, D"'
Text-based diffusion video editing systems have been successful in performing edits with high fidelity and textual alignment. However, this success is limited to rigid-type editing such as style transfer and object overlay, while preserving the original …
External link:
http://arxiv.org/abs/2409.13037
Open-vocabulary 3D instance segmentation transcends traditional closed-vocabulary methods by enabling the identification of both previously seen and unseen objects in real-world scenarios. It leverages a dual-modality approach, utilizing both 3D point …
External link:
http://arxiv.org/abs/2408.08591
Author:
Yoon, Hee Suk, Yoon, Eunseop, Tee, Joshua Tian Jin, Zhang, Kang, Heo, Yu-Jung, Chang, Du-Seong, Yoo, Chang D.
Multimodal Dialogue Response Generation (MDRG) is a recently proposed task where the model needs to generate responses in texts, images, or a blend of both based on the dialogue context. Due to the lack of a large-scale dataset specifically for this …
External link:
http://arxiv.org/abs/2408.05926
Test-Time Adaptation (TTA) has emerged as a crucial solution to the domain shift challenge, wherein the target environment diverges from the original training environment. A prime exemplification is TTA for Automatic Speech Recognition (ASR), which …
External link:
http://arxiv.org/abs/2408.05769
Reinforcement Learning (RL) agents demonstrating proficiency in a training environment exhibit vulnerability to adversarial perturbations in input observations during deployment. This underscores the importance of building a robust agent before its …
External link:
http://arxiv.org/abs/2408.00023
Current image editing methods primarily utilize DDIM Inversion, employing a two-branch diffusion approach to preserve the attributes and layout of the original image. However, these methods encounter challenges with non-rigid edits, which involve …
External link:
http://arxiv.org/abs/2407.17850
Author:
Yoon, Eunseop, Yoon, Hee Suk, Eom, SooHwan, Han, Gunsoo, Nam, Daniel Wontae, Jo, Daejin, On, Kyoung-Woon, Hasegawa-Johnson, Mark A., Kim, Sungwoong, Yoo, Chang D.
Reinforcement Learning from Human Feedback (RLHF) leverages human preference data to train language models to align more closely with human essence. These human preference data, however, are labeled at the sequence level, creating a mismatch between …
External link:
http://arxiv.org/abs/2407.16574
Author:
Ni, Junrui, Wang, Liming, Zhang, Yang, Qian, Kaizhi, Gao, Heting, Hasegawa-Johnson, Mark, Yoo, Chang D.
Recent advancements in supervised automatic speech recognition (ASR) have achieved remarkable performance, largely due to the growing availability of large transcribed speech corpora. However, most languages lack sufficient paired speech and text data …
External link:
http://arxiv.org/abs/2406.08380
In video editing, the hallmark of a quality edit lies in its consistent and unobtrusive adjustment. Modification, when integrated, must be smooth and subtle, preserving the natural flow and aligning seamlessly with the original vision. Therefore, our …
External link:
http://arxiv.org/abs/2406.06044
Published in:
International Conference on Pattern Recognition and Artificial Intelligence (ICPRAI) 2024
Offline reinforcement learning (RL) addresses the challenge of expensive and high-risk data exploration inherent in RL by pre-training policies on vast amounts of offline data, enabling direct deployment or fine-tuning in real-world environments. However, …
External link:
http://arxiv.org/abs/2405.11206