Showing 1 - 9 of 9 for search: '"Choi, Yeunju"'
Author:
Ahn, Junseok, Kim, Youkyum, Choi, Yeunju, Kwak, Doyeop, Kim, Ji-Hoon, Mun, Seongkyu, Chung, Joon Son
This paper introduces VoxSim, a dataset of perceptual voice similarity ratings. Recent efforts to automate the assessment of speech synthesis technologies have primarily focused on predicting mean opinion score of naturalness, leaving speaker voice s
External link:
http://arxiv.org/abs/2407.18505
Published in:
IEEE Access, vol. 10, pp. 52621 - 52629, 2022
Although recent neural text-to-speech (TTS) systems have achieved high-quality speech synthesis, there are cases where a TTS system generates low-quality speech, mainly caused by limited training data or information loss during knowledge distillation
External link:
http://arxiv.org/abs/2011.01174
Published in:
IEEE Access, vol. 8, pp. 175448-175466, 2020
Speaker verification (SV) has recently attracted considerable research interest due to the growing popularity of virtual assistants. At the same time, there is an increasing requirement for an SV system: it should be robust to short speech segments,
External link:
http://arxiv.org/abs/2010.02477
Published in:
Proc. Interspeech 2020, pp. 1743-1747
While deep learning has made impressive progress in speech synthesis and voice conversion, the assessment of the synthesized speech is still carried out by human participants. Several recent papers have proposed deep-learning-based assessment models
External link:
http://arxiv.org/abs/2008.03710
Several studies have proposed deep-learning-based models to predict the mean opinion score (MOS) of synthesized speech, showing the possibility of replacing human raters. However, inter- and intra-rater variability in MOSs makes it hard to ensure the
External link:
http://arxiv.org/abs/2007.08267
Recent work utilizing phonetic posteriorgrams (PPGs) for non-parallel voice conversion has significantly increased the usability of voice conversion, since the source and target DBs are no longer required to have matching contents. In this approach, t
External link:
http://arxiv.org/abs/2006.06937
Published in:
Proc. Interspeech 2020, pp. 1501-1505
Currently, the most widely used approach for speaker verification is deep speaker embedding learning. In this approach, we obtain a speaker embedding vector by pooling single-scale features that are extracted from the last layer of a speaker feat
External link:
http://arxiv.org/abs/2004.03194
Published in:
Proc. ASRU 2019, pp. 365-372
Voice activity detection (VAD), which classifies frames as speech or non-speech, is an important module in many speech applications including speaker verification. In this paper, we propose a novel method, called self-adaptive soft VAD, to incorporat
External link:
http://arxiv.org/abs/1909.11886
Published in:
Proc. Interspeech 2019, pp. 4030-4034
In this paper, we propose a new pooling method called spatial pyramid encoding (SPE) to generate speaker embeddings for text-independent speaker verification. We first partition the output feature maps from a deep residual network (ResNet) into incre
External link:
http://arxiv.org/abs/1906.08333