Výsledky vyhledávání - "Mun, Seongkyu"

Report

VoxSim: A perceptual voice similarity dataset

Autor: Ahn, Junseok, Kim, Youkyum, Choi, Yeunju, Kwak, Doyeop, Kim, Ji-Hoon, Mun, Seongkyu, Chung, Joon Son

This paper introduces VoxSim, a dataset of perceptual voice similarity ratings. Recent efforts to automate the assessment of speech synthesis technologies have primarily focused on predicting mean opinion score of naturalness, leaving speaker voice s

Externí odkaz: http://arxiv.org/abs/2407.18505

Zobrazit plný text záznamu

Report

Latent Filling: Latent Space Data Augmentation for Zero-shot Speech Synthesis

Autor: Bae, Jae-Sung, Lee, Joun Yeop, Lee, Ji-Hyun, Mun, Seongkyu, Kang, Taehwa, Cho, Hoon-Young, Kim, Chanwoo

Previous works in zero-shot text-to-speech (ZS-TTS) have attempted to enhance its systems by enlarging the training data through crowd-sourcing or augmenting existing speech data. However, the use of low-quality data has led to a decline in the overa

Externí odkaz: http://arxiv.org/abs/2310.03538

Zobrazit plný text záznamu

Report

An Empirical Study on L2 Accents of Cross-lingual Text-to-Speech Systems via Vowel Space

Autor: Lee, Jihwan, Bae, Jae-Sung, Mun, Seongkyu, Choi, Heejin, Lee, Joun Yeop, Cho, Hoon-Young, Kim, Chanwoo

With the recent developments in cross-lingual Text-to-Speech (TTS) systems, L2 (second-language, or foreign) accent problems arise. Moreover, running a subjective evaluation for such cross-lingual TTS systems is troublesome. The vowel space analysis,

Externí odkaz: http://arxiv.org/abs/2211.03078

Zobrazit plný text záznamu

Report

Into-TTS : Intonation Template Based Prosody Control System

Autor: Lee, Jihwan, Lee, Joun Yeop, Choi, Heejin, Mun, Seongkyu, Park, Sangjun, Bae, Jae-Sung, Kim, Chanwoo

Intonations play an important role in delivering the intention of a speaker. However, current end-to-end TTS systems often fail to model proper intonations. To alleviate this problem, we propose a novel, intuitive method to synthesize speech in diffe

Externí odkaz: http://arxiv.org/abs/2204.01271

Zobrazit plný text záznamu

Report

Streaming end-to-end speech recognition with jointly trained neural feature enhancement

Autor: Kim, Chanwoo, Garg, Abhinav, Gowda, Dhananjaya, Mun, Seongkyu, Han, Changwoo

In this paper, we present a streaming end-to-end speech recognition model based on Monotonic Chunkwise Attention (MoCha) jointly trained with enhancement layers. Even though the MoCha attention enables streaming speech recognition with recognition ac

Externí odkaz: http://arxiv.org/abs/2105.01254

Zobrazit plný text záznamu

Report

Overcoming label noise in audio event detection using sequential labeling

Autor: Kim, Jae-Bin, Mun, Seongkyu, Oh, Myungwoo, Choe, Soyeon, Lee, Yong-Hyeok, Park, Hyung-Min

This paper addresses the noisy label issue in audio event detection (AED) by refining strong labels as sequential labels with inaccurate timestamps removed. In AED, strong labels contain the occurrence of a specific event and its timestamps correspon

Externí odkaz: http://arxiv.org/abs/2007.05191

Zobrazit plný text záznamu

Report

Metric Learning for Keyword Spotting

Autor: Huh, Jaesung, Lee, Minjae, Heo, Heesoo, Mun, Seongkyu, Chung, Joon Son

The goal of this work is to train effective representations for keyword spotting via metric learning. Most existing works address keyword spotting as a closed-set classification problem, where both target and non-target keywords are predefined. There

Externí odkaz: http://arxiv.org/abs/2005.08776

Zobrazit plný text záznamu

Report

In defence of metric learning for speaker recognition

Autor: Chung, Joon Son, Huh, Jaesung, Mun, Seongkyu, Lee, Minjae, Heo, Hee Soo, Choe, Soyeon, Ham, Chiheon, Jung, Sunghwan, Lee, Bong-Jin, Han, Icksang

The objective of this paper is 'open-set' speaker recognition of unseen speakers, where ideal embeddings should be able to condense information into a compact utterance-level representation that has small intra-speaker and large inter-speaker distanc

Externí odkaz: http://arxiv.org/abs/2003.11982

Zobrazit plný text záznamu

Report

The sound of my voice: speaker representation loss for target voice separation

Autor: Mun, Seongkyu, Choe, Soyeon, Huh, Jaesung, Chung, Joon Son

Content and style representations have been widely studied in the field of style transfer. In this paper, we propose a new loss function using speaker content representation for audio source separation, and we call it speaker representation loss. The

Externí odkaz: http://arxiv.org/abs/1911.02411

Zobrazit plný text záznamu

Report

Delving into VoxCeleb: environment invariant speaker recognition

Autor: Chung, Joon Son, Huh, Jaesung, Mun, Seongkyu

Research in speaker recognition has recently seen significant progress due to the application of neural network models and the availability of new large-scale datasets. There has been a plethora of work in search for more powerful architectures or lo

Externí odkaz: http://arxiv.org/abs/1910.11238

Zobrazit plný text záznamu

Vyhledávací nástroje:

Upřesnit hledání