Search results

Report

Bridging the Gap between Audio and Text using Parallel-attention for User-defined Keyword Spotting

Authors: Kim, Youkyum; Jung, Jaemin; Park, Jihwan; Kim, Byeong-Yeol; Chung, Joon Son

This paper proposes a novel user-defined keyword spotting framework that accurately detects audio keywords based on text enrollment. Since audio data possesses additional acoustic information compared to text, there are discrepancies between these tw

External link: http://arxiv.org/abs/2408.03593

Report

Faces that Speak: Jointly Synthesising Talking Face and Speech from Text

Authors: Jang, Youngjoon; Kim, Ji-Hoon; Ahn, Junseok; Kwak, Doyeop; Yang, Hong-Sun; Ju, Yoon-Cheol; Kim, Il-Hwan; Kim, Byeong-Yeol; Chung, Joon Son

The goal of this work is to simultaneously generate natural talking faces and speech outputs from text. We achieve this by integrating Talking Face Generation (TFG) and Text-to-Speech (TTS) systems into a unified framework. We address the main challe

External link: http://arxiv.org/abs/2405.10272

Report

Composition Rules for Strong Structural Controllability and Minimum Input Problem in Diffusively-Coupled Networks

Authors: Park, Nam-Jin; Kwon, Seong-Ho; Bae, Yoo-Bin; Kim, Byeong-Yeon; Moore, Kevin L.; Ahn, Hyo-Sung

This paper presents new results and reinterpretation of existing conditions for strong structural controllability in a structured network determined by the zero/non-zero patterns of edges. For diffusively-coupled networks with self-loops, we first es

External link: http://arxiv.org/abs/2405.05557

Report

Boosting Unknown-number Speaker Separation with Transformer Decoder-based Attractor

Authors: Lee, Younglo; Choi, Shukjae; Kim, Byeong-Yeol; Wang, Zhong-Qiu; Watanabe, Shinji

We propose a novel speech separation model designed to separate mixtures with an unknown number of speakers. The proposed model stacks 1) a dual-path processing block that can model spectro-temporal patterns, 2) a transformer decoder-based attractor

External link: http://arxiv.org/abs/2401.12473

Academic article

AN INTELLIGENT RTP-BASED HOUSEHOLD ELECTRICITY SCHEDULING BY A GENETIC ALGORITHM IN SMART GRID

Authors: Kim, Byeong-Yeon; Seok, Hyesung; Kang, Y.

Published in: South African Journal of Industrial Engineering, Vol 29, Iss 2, Pp 43-51 (2018)

Electricity scheduling for households based on real-time pricing (RTP) allows flexible and efficient consumption planning. However, this creates errors in predicted costs. Therefore this study used a genetic algorithm (GA) to reduce the error in pred

External link: https://doaj.org/article/51f25517dcd84ae2991b6b10e505feb3

Report

Neural Speech Enhancement with Very Low Algorithmic Latency and Complexity via Integrated Full- and Sub-Band Modeling

Authors: Wang, Zhong-Qiu; Cornell, Samuele; Choi, Shukjae; Lee, Younglo; Kim, Byeong-Yeol; Watanabe, Shinji

We propose FSB-LSTM, a novel long short-term memory (LSTM) based architecture that integrates full- and sub-band (FSB) modeling, for single- and multi-channel speech enhancement in the short-time Fourier transform (STFT) domain. The model maintains a

External link: http://arxiv.org/abs/2304.08707

Report

That's What I Said: Fully-Controllable Talking Face Generation

Authors: Jang, Youngjoon; Rho, Kyeongha; Woo, Jong-Bin; Lee, Hyeongkeun; Park, Jihwan; Lim, Youshin; Kim, Byeong-Yeol; Chung, Joon Son

The goal of this paper is to synthesise talking faces with controllable facial motions. To achieve this goal, we propose two key ideas. The first is to establish a canonical space where every face has the same motion patterns but different identities

External link: http://arxiv.org/abs/2304.03275

Report

Joint unsupervised and supervised learning for context-aware language identification

Authors: Park, Jinseok; Kim, Hyung Yong; Park, Jihwan; Kim, Byeong-Yeol; Choi, Shukjae; Lim, Yunkyu

Language identification (LID) recognizes the language of a spoken utterance automatically. According to recent studies, LID models trained with an automatic speech recognition (ASR) task perform better than those trained with a LID task only. However

External link: http://arxiv.org/abs/2303.16511

Report

CrossSpeech: Speaker-independent Acoustic Representation for Cross-lingual Speech Synthesis

Authors: Kim, Ji-Hoon; Yang, Hong-Sun; Ju, Yoon-Cheol; Kim, Il-Hwan; Kim, Byeong-Yeol

While recent text-to-speech (TTS) systems have made remarkable strides toward human-level quality, the performance of cross-lingual TTS lags behind that of intra-lingual TTS. This gap is mainly rooted from the speaker-language entanglement problem in

External link: http://arxiv.org/abs/2302.14370

Report

TF-GridNet: Integrating Full- and Sub-Band Modeling for Speech Separation

Authors: Wang, Zhong-Qiu; Cornell, Samuele; Choi, Shukjae; Lee, Younglo; Kim, Byeong-Yeol; Watanabe, Shinji

We propose TF-GridNet for speech separation. The model is a novel deep neural network (DNN) integrating full- and sub-band modeling in the time-frequency (T-F) domain. It stacks several blocks, each consisting of an intra-frame full-band module, a su

External link: http://arxiv.org/abs/2211.12433
