Showing 1 - 10 of 1,121 for search: '"Huang, Sung"'
This paper proposes a generative pretraining foundation model for high-quality speech restoration tasks. By directly operating on complex-valued short-time Fourier transform coefficients, our model does not rely on any vocoders for time-domain signal
External link:
http://arxiv.org/abs/2409.16117
This paper presents an effective transfer learning framework for language adaptation in text-to-speech systems, with a focus on achieving language adaptation using minimal labeled and unlabeled data. While many works focus on reducing the usage of la
External link:
http://arxiv.org/abs/2402.01692
Personalized TTS is an exciting and highly desired application that allows users to train their TTS voice using only a few recordings. However, TTS training typically requires many hours of recording and a large model, making it unsuitable for deploy
External link:
http://arxiv.org/abs/2303.11816
Author:
Liu, Da-rong, Hsu, Po-chun, Chen, Yi-chen, Huang, Sung-feng, Chuang, Shun-po, Wu, Da-yi, Lee, Hung-yi
ASR has been shown to achieve great performance recently. However, most of them rely on massive paired data, which is not feasible for low-resource languages worldwide. This paper investigates how to learn directly from unpaired phone sequences and s
External link:
http://arxiv.org/abs/2207.14568
This paper studies a transferable phoneme embedding framework that aims to deal with the cross-lingual text-to-speech (TTS) problem under the few-shot setting. Transfer learning is a common approach when it comes to few-shot learning since training f
External link:
http://arxiv.org/abs/2206.15427
Author:
Huang, Sung-Ling
A new tuning scheme for linearly tunable high-Q filters is proposed. The tuning method is based on using the phase information for both frequency and Q factor tuning. There is no need to find out the relationship between a filter's passband magnitude
External link:
http://hdl.handle.net/1969.1/35
Published in:
IEEE/ACM Transactions on Audio, Speech, and Language Processing, vol. 30, pp. 1558-1571, 2022
Personalizing a speech synthesis system is a highly desired application, where the system can generate speech with the user's voice with rare enrolled recordings. There are two main approaches to build such a system in recent works: speaker adaptatio
External link:
http://arxiv.org/abs/2111.04040
Author:
Chen, Ling, Huang, Sung-Hao, Wang, Tzu-Hsiang, Tseng, Vincent S., Tsao, Hsuan-Ming, Tang, Gau-Jun
Published in:
Computer Methods and Programs in Biomedicine, vol. 252, July 2024
Author:
Chen, Yi-Chen, Chi, Po-Han, Yang, Shu-wen, Chang, Kai-Wei, Lin, Jheng-hao, Huang, Sung-Feng, Liu, Da-Rong, Liu, Chi-Liang, Lee, Cheng-Kuang, Lee, Hung-yi
There is a wide variety of speech processing tasks ranging from extracting content information from speech signals to generating speech signals. For different tasks, model networks are usually designed and tuned separately. If a universal model can p
External link:
http://arxiv.org/abs/2105.03070
Mandarin-English code-switching (CS) is frequently used among East and Southeast Asian people. However, the intra-sentence language switching of the two very different languages makes recognizing CS speech challenging. Meanwhile, the recent successfu
External link:
http://arxiv.org/abs/2104.02258