Showing 1 - 10 of 14,118 for search: '"Chang, D"'
Text-based diffusion video editing systems have been successful in performing edits with high fidelity and textual alignment. However, this success is limited to rigid-type editing such as style transfer and object overlay, while preserving the original …
External link:
http://arxiv.org/abs/2409.13037
Open-vocabulary 3D instance segmentation transcends traditional closed-vocabulary methods by enabling the identification of both previously seen and unseen objects in real-world scenarios. It leverages a dual-modality approach, utilizing both 3D point …
External link:
http://arxiv.org/abs/2408.08591
Author:
Yoon, Hee Suk, Yoon, Eunseop, Tee, Joshua Tian Jin, Zhang, Kang, Heo, Yu-Jung, Chang, Du-Seong, Yoo, Chang D.
Multimodal Dialogue Response Generation (MDRG) is a recently proposed task where the model needs to generate responses in texts, images, or a blend of both based on the dialogue context. Due to the lack of a large-scale dataset specifically for this …
External link:
http://arxiv.org/abs/2408.05926
Test-Time Adaptation (TTA) has emerged as a crucial solution to the domain shift challenge, wherein the target environment diverges from the original training environment. A prime exemplification is TTA for Automatic Speech Recognition (ASR), which …
External link:
http://arxiv.org/abs/2408.05769
Reinforcement Learning (RL) agents demonstrating proficiency in a training environment exhibit vulnerability to adversarial perturbations in input observations during deployment. This underscores the importance of building a robust agent before its …
External link:
http://arxiv.org/abs/2408.00023
Current image editing methods primarily utilize DDIM Inversion, employing a two-branch diffusion approach to preserve the attributes and layout of the original image. However, these methods encounter challenges with non-rigid edits, which involve …
External link:
http://arxiv.org/abs/2407.17850
Author:
Yoon, Eunseop, Yoon, Hee Suk, Eom, SooHwan, Han, Gunsoo, Nam, Daniel Wontae, Jo, Daejin, On, Kyoung-Woon, Hasegawa-Johnson, Mark A., Kim, Sungwoong, Yoo, Chang D.
Reinforcement Learning from Human Feedback (RLHF) leverages human preference data to train language models to align more closely with human essence. These human preference data, however, are labeled at the sequence level, creating a mismatch between …
External link:
http://arxiv.org/abs/2407.16574
Author:
Ni, Junrui, Wang, Liming, Zhang, Yang, Qian, Kaizhi, Gao, Heting, Hasegawa-Johnson, Mark, Yoo, Chang D.
Recent advancements in supervised automatic speech recognition (ASR) have achieved remarkable performance, largely due to the growing availability of large transcribed speech corpora. However, most languages lack sufficient paired speech and text data …
External link:
http://arxiv.org/abs/2406.08380
In video editing, the hallmark of a quality edit lies in its consistent and unobtrusive adjustment. Modification, when integrated, must be smooth and subtle, preserving the natural flow and aligning seamlessly with the original vision. Therefore, our …
External link:
http://arxiv.org/abs/2406.06044
Published in:
International Conference on Pattern Recognition and Artificial Intelligence (ICPRAI) 2024
Offline reinforcement learning (RL) addresses the challenge of expensive and high-risk data exploration inherent in RL by pre-training policies on vast amounts of offline data, enabling direct deployment or fine-tuning in real-world environments. However, …
External link:
http://arxiv.org/abs/2405.11206