Search results

Report

Generative Speech Foundation Model Pretraining for High-Quality Speech Extraction and Restoration

Authors: Ku, Pin-Jui; Liu, Alexander H.; Korostik, Roman; Huang, Sung-Feng; Fu, Szu-Wei; Jukić, Ante

This paper proposes a generative pretraining foundation model for high-quality speech restoration tasks. By directly operating on complex-valued short-time Fourier transform coefficients, our model does not rely on any vocoders for time-domain signal

External link: http://arxiv.org/abs/2409.16117


Report

Maximizing Data Efficiency for Cross-Lingual TTS Adaptation by Self-Supervised Representation Mixing and Embedding Initialization

Authors: Huang, Wei-Ping; Huang, Sung-Feng; Lee, Hung-yi

This paper presents an effective transfer learning framework for language adaptation in text-to-speech systems, with a focus on achieving language adaptation using minimal labeled and unlabeled data. While many works focus on reducing the usage of la

External link: http://arxiv.org/abs/2402.01692


Report

Personalized Lightweight Text-to-Speech: Voice Cloning with Adaptive Structured Pruning

Authors: Huang, Sung-Feng; Chen, Chia-ping; Chen, Zhi-Sheng; Tsai, Yu-Pao; Lee, Hung-yi

Personalized TTS is an exciting and highly desired application that allows users to train their TTS voice using only a few recordings. However, TTS training typically requires many hours of recording and a large model, making it unsuitable for deploy

External link: http://arxiv.org/abs/2303.11816


Report

Learning Phone Recognition from Unpaired Audio and Phone Sequences Based on Generative Adversarial Network

Authors: Liu, Da-rong; Hsu, Po-chun; Chen, Yi-chen; Huang, Sung-feng; Chuang, Shun-po; Wu, Da-yi; Lee, Hung-yi

ASR has been shown to achieve great performance recently. However, most of them rely on massive paired data, which is not feasible for low-resource languages worldwide. This paper investigates how to learn directly from unpaired phone sequences and s

External link: http://arxiv.org/abs/2207.14568


Report

Few-Shot Cross-Lingual TTS Using Transferable Phoneme Embedding

Authors: Huang, Wei-Ping; Chen, Po-Chun; Huang, Sung-Feng; Lee, Hung-yi

This paper studies a transferable phoneme embedding framework that aims to deal with the cross-lingual text-to-speech (TTS) problem under the few-shot setting. Transfer learning is a common approach when it comes to few-shot learning since training f

External link: http://arxiv.org/abs/2206.15427


Book

Automatic tuning for linearly tunable filter

Author: Huang, Sung-Ling

A new tuning scheme for linearly tunable high-Q filters is proposed. The tuning method is based on using the phase information for both frequency and Q factor tuning. There is no need to find out the relationship between a filter's passband magnitude

External link: http://hdl.handle.net/1969.1/35


Report

Meta-TTS: Meta-Learning for Few-Shot Speaker Adaptive Text-to-Speech

Authors: Huang, Sung-Feng; Lin, Chyi-Jiunn; Liu, Da-Rong; Chen, Yi-Chen; Lee, Hung-yi

Published in: IEEE/ACM Transactions on Audio, Speech, and Language Processing, vol. 30, pp. 1558-1571, 2022

Personalizing a speech synthesis system is a highly desired application, where the system can generate speech with the user's voice with rare enrolled recordings. There are two main approaches to build such a system in recent works: speaker adaptatio

External link: http://arxiv.org/abs/2111.04040


Academic article

Automatic 3D left atrial strain extraction framework on cardiac computed tomography

Authors: Chen, Ling; Huang, Sung-Hao; Wang, Tzu-Hsiang; Tseng, Vincent S.; Tsao, Hsuan-Ming; Tang, Gau-Jun

Published in: Computer Methods and Programs in Biomedicine, July 2024, 252


Report

SpeechNet: A Universal Modularized Model for Speech Processing Tasks

Authors: Chen, Yi-Chen; Chi, Po-Han; Yang, Shu-wen; Chang, Kai-Wei; Lin, Jheng-hao; Huang, Sung-Feng; Liu, Da-Rong; Liu, Chi-Liang; Lee, Cheng-Kuang; Lee, Hung-yi

There is a wide variety of speech processing tasks ranging from extracting content information from speech signals to generating speech signals. For different tasks, model networks are usually designed and tuned separately. If a universal model can p

External link: http://arxiv.org/abs/2105.03070


Report

Non-autoregressive Mandarin-English Code-switching Speech Recognition

Authors: Chuang, Shun-Po; Chang, Heng-Jui; Huang, Sung-Feng; Lee, Hung-yi

Mandarin-English code-switching (CS) is frequently used among East and Southeast Asian people. However, the intra-sentence language switching of the two very different languages makes recognizing CS speech challenging. Meanwhile, the recent successfu

External link: http://arxiv.org/abs/2104.02258

