Showing 1 - 10 of 39 results for search: '"Cho, Jaewoong"'
Author:
Cho, In-Young, Cho, Jaewoong
We propose a simple yet effective neural network-based framework for global illumination rendering. Recently, rendering techniques that learn neural radiance caches by minimizing the difference (i.e., residual) between the left and right sides of the rendering equation …
External link:
http://arxiv.org/abs/2410.10149
Author:
Kim, Jaeyeon, Kwon, Sehyun, Choi, Joo Young, Park, Jongho, Cho, Jaewoong, Lee, Jason D., Ryu, Ernest K.
In-context learning (ICL) describes a language model's ability to generate outputs based on a set of input demonstrations and a subsequent query. To understand this remarkable capability, researchers have studied simplified, stylized models. These studies …
External link:
http://arxiv.org/abs/2410.05448
Author:
Moon, Taehong, Choi, Moonseok, Yun, EungGu, Yoon, Jongmin, Lee, Gayoung, Cho, Jaewoong, Lee, Juho
Diffusion models have shown remarkable performance in generation problems over various domains including images, videos, text, and audio. A practical bottleneck of diffusion models is their sampling speed, due to the repeated evaluation of score estimation networks …
External link:
http://arxiv.org/abs/2408.05927
Large-scale diffusion models have shown outstanding generative abilities across multiple modalities including images, videos, and audio. However, text-to-speech (TTS) systems typically involve domain-specific modeling factors (e.g., phonemes and phoneme-level durations) …
External link:
http://arxiv.org/abs/2406.11427
Current state-of-the-art diffusion models employ U-Net architectures containing convolutional and (qkv) self-attention layers. The U-Net processes images while being conditioned on the time embedding input for each sampling step and the class or caption …
External link:
http://arxiv.org/abs/2405.03958
With the emergence of neural audio codecs, which encode multiple streams of discrete tokens from audio, large language models have recently gained attention as a promising approach for zero-shot Text-to-Speech (TTS) synthesis. Despite the ongoing rush …
External link:
http://arxiv.org/abs/2404.02781
Author:
Park, Jongho, Park, Jaeseung, Xiong, Zheyang, Lee, Nayoung, Cho, Jaewoong, Oymak, Samet, Lee, Kangwook, Papailiopoulos, Dimitris
State-space models (SSMs), such as Mamba (Gu & Dao, 2023), have been proposed as alternatives to Transformer networks in language modeling, by incorporating gating, convolutions, and input-dependent token selection to mitigate the quadratic cost of multi-head attention …
External link:
http://arxiv.org/abs/2402.04248
Recent advancements in large language models (LLMs) have remarkably enhanced performance on a variety of tasks in multiple languages. However, tokenizers in LLMs trained primarily on English-centric corpora often overly fragment a text into characters …
External link:
http://arxiv.org/abs/2401.10660
Author:
Park, Inkyu, Cho, Jaewoong
Speech-driven 3D facial animation is challenging due to the scarcity of large-scale visual-audio datasets despite extensive research. Most prior works, typically focused on learning regression models on a small dataset using the method of least squares …
External link:
http://arxiv.org/abs/2401.08655
Classical clustering methods do not provide users with direct control of the clustering results, and the clustering results may not be consistent with the relevant criterion that a user has in mind. In this work, we present a new methodology for performing …
External link:
http://arxiv.org/abs/2310.18297