Showing 1 - 10 of 119 for search: '"Lee, JunHyeok"'
Author:
Lee, Junhyeok, Oh, Yujin, Lee, Dahyoun, Joh, Hyon Keun, Sohn, Chul-Ho, Baik, Sung Hyun, Jung, Cheol Kyu, Park, Jung Hyun, Choi, Kyu Sung, Kim, Byung-Hoon, Ye, Jong Chul
Acute ischemic stroke (AIS) requires time-critical management, with hours of delayed intervention leading to irreversible disability of the patient. Since diffusion-weighted imaging (DWI) using magnetic resonance imaging (MRI) plays a crucial ro
External link:
http://arxiv.org/abs/2411.15490
Author:
Lee, Junhyeok, Kim, Hyeongju
Monotonic alignment search (MAS), introduced by Glow-TTS, is one of the most popular algorithms in TTS for estimating unknown alignments between text and speech. Since this algorithm needs to search for the most probable alignment with dynamic programmin
External link:
http://arxiv.org/abs/2409.07704
Text-to-Speech (TTS) models have advanced significantly, aiming to accurately replicate human speech's diversity, including unique speaker identities and linguistic nuances. Despite these advancements, achieving an optimal balance between speaker-fid
External link:
http://arxiv.org/abs/2408.14423
Non-autoregressive GAN-based neural vocoders are widely used due to their fast inference speed and high perceptual quality. However, they often suffer from audible artifacts such as tonal artifacts in their generated results. Therefore, we propose Je
External link:
http://arxiv.org/abs/2406.06111
Frequency dynamic convolution (FDY conv) has shown state-of-the-art performance in sound event detection (SED) using frequency-adaptive kernels obtained by a frequency-varying combination of basis kernels. However, FDY conv lacks an explicit mean t
External link:
http://arxiv.org/abs/2406.05341
We address the challenge of multi-agent cooperation, where agents achieve a common goal by cooperating with decentralized agents under complex partial observations. Existing cooperative agent systems often struggle with efficiently processing continu
External link:
http://arxiv.org/abs/2405.16751
We propose LatentSwap, a simple face swapping framework generating a face swap latent code of a given generator. Utilizing randomly sampled latent codes, our framework is light and does not require datasets besides employing the pre-trained models, w
External link:
http://arxiv.org/abs/2402.18351
The goal of DCASE 2023 Challenge Task 7 is to generate various sound clips for Foley sound synthesis (FSS) by a "category-to-sound" approach. "Category" is expressed by a single index, while the corresponding "sound" covers diverse and different sound examp
External link:
http://arxiv.org/abs/2306.05004
Previous pitch-controllable text-to-speech (TTS) models rely on directly modeling fundamental frequency, leading to low variance in synthesized speech. To address this issue, we propose PITS, an end-to-end pitch-controllable TTS model that utilizes v
External link:
http://arxiv.org/abs/2302.12391
Preference-based reinforcement learning (PbRL) is an approach that enables RL agents to learn from preference, which is particularly useful when formulating a reward function is challenging. Existing PbRL methods generally involve a two-step procedur
External link:
http://arxiv.org/abs/2301.12842