Showing 1 - 10 of 6,424 for search: '"Kim, Byeong"'
This paper proposes a novel user-defined keyword spotting framework that accurately detects audio keywords based on text enrollment. Since audio data possesses additional acoustic information compared to text, there are discrepancies between these tw
External link:
http://arxiv.org/abs/2408.03593
Author:
Jang, Youngjoon, Kim, Ji-Hoon, Ahn, Junseok, Kwak, Doyeop, Yang, Hong-Sun, Ju, Yoon-Cheol, Kim, Il-Hwan, Kim, Byeong-Yeol, Chung, Joon Son
The goal of this work is to simultaneously generate natural talking faces and speech outputs from text. We achieve this by integrating Talking Face Generation (TFG) and Text-to-Speech (TTS) systems into a unified framework. We address the main challe
External link:
http://arxiv.org/abs/2405.10272
Author:
Park, Nam-Jin, Kwon, Seong-Ho, Bae, Yoo-Bin, Kim, Byeong-Yeon, Moore, Kevin L., Ahn, Hyo-Sung
This paper presents new results and reinterpretation of existing conditions for strong structural controllability in a structured network determined by the zero/non-zero patterns of edges. For diffusively-coupled networks with self-loops, we first es
External link:
http://arxiv.org/abs/2405.05557
We propose a novel speech separation model designed to separate mixtures with an unknown number of speakers. The proposed model stacks 1) a dual-path processing block that can model spectro-temporal patterns, 2) a transformer decoder-based attractor
External link:
http://arxiv.org/abs/2401.12473
Published in:
South African Journal of Industrial Engineering, Vol 29, Iss 2, Pp 43-51 (2018)
Electricity scheduling for households based on real-time pricing (RTP) allows flexible and efficient consumption planning. However, this creates errors in predicted costs. Therefore, this study used a genetic algorithm (GA) to reduce the error in pred
External link:
https://doaj.org/article/51f25517dcd84ae2991b6b10e505feb3
Author:
Wang, Zhong-Qiu, Cornell, Samuele, Choi, Shukjae, Lee, Younglo, Kim, Byeong-Yeol, Watanabe, Shinji
We propose FSB-LSTM, a novel long short-term memory (LSTM) based architecture that integrates full- and sub-band (FSB) modeling, for single- and multi-channel speech enhancement in the short-time Fourier transform (STFT) domain. The model maintains a
External link:
http://arxiv.org/abs/2304.08707
Author:
Jang, Youngjoon, Rho, Kyeongha, Woo, Jong-Bin, Lee, Hyeongkeun, Park, Jihwan, Lim, Youshin, Kim, Byeong-Yeol, Chung, Joon Son
The goal of this paper is to synthesise talking faces with controllable facial motions. To achieve this goal, we propose two key ideas. The first is to establish a canonical space where every face has the same motion patterns but different identities
External link:
http://arxiv.org/abs/2304.03275
Language identification (LID) recognizes the language of a spoken utterance automatically. According to recent studies, LID models trained with an automatic speech recognition (ASR) task perform better than those trained with a LID task only. However
External link:
http://arxiv.org/abs/2303.16511
While recent text-to-speech (TTS) systems have made remarkable strides toward human-level quality, the performance of cross-lingual TTS lags behind that of intra-lingual TTS. This gap is mainly rooted in the speaker-language entanglement problem in
External link:
http://arxiv.org/abs/2302.14370
Author:
Wang, Zhong-Qiu, Cornell, Samuele, Choi, Shukjae, Lee, Younglo, Kim, Byeong-Yeol, Watanabe, Shinji
We propose TF-GridNet for speech separation. The model is a novel deep neural network (DNN) integrating full- and sub-band modeling in the time-frequency (T-F) domain. It stacks several blocks, each consisting of an intra-frame full-band module, a su
External link:
http://arxiv.org/abs/2211.12433