Search results

Report

Get Large Language Models Ready to Speak: A Late-fusion Approach for Speech Generation

Authors: Shen, Maohao; Zhang, Shun; Wu, Jilong; Xiu, Zhiping; AlBadawy, Ehab; Lu, Yiting; Seltzer, Mike; He, Qing

Large language models (LLMs) have revolutionized natural language processing (NLP) with impressive performance across various text-based tasks. However, the extension of text-dominant LLMs to speech generation tasks remains under-explored. In th

External link: http://arxiv.org/abs/2410.20336

Report

Self-Supervised Representations for Singing Voice Conversion

Authors: Jayashankar, Tejas; Wu, Jilong; Sari, Leda; Kant, David; Manohar, Vimal; He, Qing

A singing voice conversion model converts a song in the voice of an arbitrary source singer to the voice of a target singer. Recently, methods that leverage self-supervised audio representations such as HuBERT and Wav2Vec 2.0 have helped further the

External link: http://arxiv.org/abs/2303.12197

Report

Synthetic Cross-accent Data Augmentation for Automatic Speech Recognition

Authors: Klumpp, Philipp; Chitkara, Pooja; Sarı, Leda; Serai, Prashant; Wu, Jilong; Veliche, Irina-Elena; Huang, Rongqing; He, Qing

Awareness of biased ASR datasets or models has increased notably in recent years. Even for English, despite a vast amount of available training data, systems perform worse for non-native speakers. In this work, we improve an accent-conversion mo

External link: http://arxiv.org/abs/2303.00802

Report

Voice-preserving Zero-shot Multiple Accent Conversion

Authors: Jin, Mumin; Serai, Prashant; Wu, Jilong; Tjandra, Andros; Manohar, Vimal; He, Qing

Most people who have tried to learn a foreign language would have experienced difficulties understanding or speaking with a native speaker's accent. For native speakers, understanding or speaking a new accent is likewise a difficult task. An accent c

External link: http://arxiv.org/abs/2211.13282

Report

Towards zero-shot Text-based voice editing using acoustic context conditioning, utterance embeddings, and reference encoders

Authors: Fong, Jason; Wang, Yun; Agrawal, Prabhav; Manohar, Vimal; Wu, Jilong; Köhler, Thilo; He, Qing

Text-based voice editing (TBVE) uses synthetic output from text-to-speech (TTS) systems to replace words in an original recording. Recent work has used neural models to produce edited speech that is similar to the original speech in terms of clarity,

External link: http://arxiv.org/abs/2210.16045

Academic article

A mixed gas concentration regression prediction method based on RESHA-ALW

Authors: Wu, Jilong; Zhao, Wenlong; Wu, Fan; Yan, Jia; Feng, Peter; Cui, Hao; Duan, Shukai; Peng, Xiaoyan

Published in: Sensors and Actuators: B. Chemical, 1 November 2024, 418

Academic article

Study on the rapid decompression rate of PA6 liner material of type IV on-board hydrogen storage cylinders

Authors: Li, Xiang; Xiao, Zhenghao; Wu, Jilong; Zeng, Li; Li, Jiepu; Liu, Yitao; Shi, Jun

Published in: International Journal of Hydrogen Energy, 28 October 2024, 88:209-227

Report

VocBench: A Neural Vocoder Benchmark for Speech Synthesis

Authors: AlBadawy, Ehab A.; Gibiansky, Andrew; He, Qing; Wu, Jilong; Chang, Ming-Ching; Lyu, Siwei

Neural vocoders, which convert spectral representations of an audio signal into waveforms, are a commonly used component in speech synthesis pipelines. They synthesize waveforms from low-dimensional representations, such as Mel-

External link: http://arxiv.org/abs/2112.03099

Academic article

Electro-induced two-way shape memory thermoplastic polyamide elastomer/carbon nanotubes composites

Authors: Lu, Yiwei; Wu, Yiman; Wu, Jilong; Yang, Pengfei; Zhang, Yuancheng; Zhao, Wei; Zhang, Xiaomeng; Cui, Zhe; Fu, Peng; Pang, Xinchang; Liu, Minying

Published in: Journal of Materials Research and Technology, March-April 2024, 29:2062-2071

Report

Multi-rate attention architecture for fast streamable Text-to-speech spectrum modeling

Authors: He, Qing; Xiu, Zhiping; Koehler, Thilo; Wu, Jilong

Typical high quality text-to-speech (TTS) systems today use a two-stage architecture, with a spectrum model stage that generates spectral frames and a vocoder stage that generates the actual audio. High-quality spectrum models usually incorporate the

External link: http://arxiv.org/abs/2104.00705
