Showing 1 - 10 of 40 for search: '"Chung, Soo Whan"'
This paper introduces a novel task in generative speech processing, Acoustic Scene Transfer (AST), which aims to transfer acoustic scenes of speech signals to diverse environments. AST promises an immersive experience in speech perception by adapting
External link:
http://arxiv.org/abs/2406.12688
We introduce Multi-level feature Fusion-based Periodicity Analysis Model (MF-PAM), a novel deep learning-based pitch estimation model that accurately estimates pitch trajectory in noisy and reverberant acoustic environments. Our model leverages the p
External link:
http://arxiv.org/abs/2306.09640
This paper introduces an end-to-end neural speech restoration model, HD-DEMUCS, demonstrating efficacy across multiple distortion environments. Unlike conventional approaches that employ cascading frameworks to remove undesirable noise first and then
External link:
http://arxiv.org/abs/2306.01411
Enhancing speech quality is an indispensable yet difficult task as it is often complicated by a range of degradation factors. In addition to additive noise, reverberation, clipping, and speech attenuation can all adversely affect speech quality. Spee
External link:
http://arxiv.org/abs/2305.18739
Authors:
Kwon, Yoohwan, Chung, Soo-Whan
Multi-lingual speech recognition aims to distinguish linguistic expressions in different languages and integrate acoustic processing simultaneously. In contrast, current multi-lingual speech recognition research follows a language-aware paradigm, mai
External link:
http://arxiv.org/abs/2302.13750
The goal of this work is zero-shot text-to-speech synthesis, with speaking styles and voices learnt from facial characteristics. Inspired by the natural fact that people can imagine the voice of someone when they look at his or her face, we introduce
External link:
http://arxiv.org/abs/2302.13700
We propose DiffSep, a new single channel source separation method based on score-matching of a stochastic differential equation (SDE). We craft a tailored continuous time diffusion-mixing process starting from the separated sources and converging to
External link:
http://arxiv.org/abs/2210.17327
In this paper, we propose a novel end-to-end user-defined keyword spotting method that utilizes linguistically corresponding patterns between speech and text sequences. Unlike previous approaches requiring speech keyword enrollment, our method compar
External link:
http://arxiv.org/abs/2206.15400
Authors:
Shim, Hye-jin, Tak, Hemlata, Liu, Xuechen, Heo, Hee-Soo, Jung, Jee-weon, Chung, Joon Son, Chung, Soo-Whan, Yu, Ha-Jin, Lee, Bong-Jin, Todisco, Massimiliano, Delgado, Héctor, Lee, Kong Aik, Sahidullah, Md, Kinnunen, Tomi, Evans, Nicholas
Deep learning has brought impressive progress in the study of both automatic speaker verification (ASV) and spoofing countermeasures (CM). Although solutions are mutually dependent, they have typically evolved as standalone sub-systems whereby CM sol
External link:
http://arxiv.org/abs/2204.09976
Authors:
Jung, Jee-weon, Tak, Hemlata, Shim, Hye-jin, Heo, Hee-Soo, Lee, Bong-Jin, Chung, Soo-Whan, Yu, Ha-Jin, Evans, Nicholas, Kinnunen, Tomi
The first spoofing-aware speaker verification (SASV) challenge aims to integrate research efforts in speaker verification and anti-spoofing. We extend the speaker verification scenario by introducing spoofed trials to the usual set of target and impo
External link:
http://arxiv.org/abs/2203.14732