Showing 1 - 10 of 53 for search: '"Wu, JiLong"'
Author:
Shen, Maohao, Zhang, Shun, Wu, Jilong, Xiu, Zhiping, AlBadawy, Ehab, Lu, Yiting, Seltzer, Mike, He, Qing
Large language models (LLMs) have revolutionized natural language processing (NLP) with impressive performance across various text-based tasks. However, the extension of text-dominant LLMs to speech generation tasks remains under-explored. In th
External link:
http://arxiv.org/abs/2410.20336
A singing voice conversion model converts a song in the voice of an arbitrary source singer to the voice of a target singer. Recently, methods that leverage self-supervised audio representations such as HuBERT and Wav2Vec 2.0 have helped further the
External link:
http://arxiv.org/abs/2303.12197
Author:
Klumpp, Philipp, Chitkara, Pooja, Sarı, Leda, Serai, Prashant, Wu, Jilong, Veliche, Irina-Elena, Huang, Rongqing, He, Qing
Awareness of biased ASR datasets and models has increased notably in recent years. Even for English, despite a vast amount of available training data, systems perform worse for non-native speakers. In this work, we improve an accent-conversion mo
External link:
http://arxiv.org/abs/2303.00802
Most people who have tried to learn a foreign language would have experienced difficulties understanding or speaking with a native speaker's accent. For native speakers, understanding or speaking a new accent is likewise a difficult task. An accent c
External link:
http://arxiv.org/abs/2211.13282
Author:
Fong, Jason, Wang, Yun, Agrawal, Prabhav, Manohar, Vimal, Wu, Jilong, Köhler, Thilo, He, Qing
Text-based voice editing (TBVE) uses synthetic output from text-to-speech (TTS) systems to replace words in an original recording. Recent work has used neural models to produce edited speech that is similar to the original speech in terms of clarity,
External link:
http://arxiv.org/abs/2210.16045
Author:
Wu, Jilong, Zhao, Wenlong, Wu, Fan, Yan, Jia, Feng, Peter, Cui, Hao, Duan, Shukai, Peng, Xiaoyan
Published in:
In Sensors and Actuators: B. Chemical 1 November 2024 418
Neural vocoders, which convert spectral representations of an audio signal into waveforms, are a common component of speech synthesis pipelines. They focus on synthesizing waveforms from low-dimensional representations, such as Mel-
External link:
http://arxiv.org/abs/2112.03099
Typical high-quality text-to-speech (TTS) systems today use a two-stage architecture, with a spectrum model stage that generates spectral frames and a vocoder stage that generates the actual audio. High-quality spectrum models usually incorporate the
External link:
http://arxiv.org/abs/2104.00705
Author:
Lu, Yiwei, Wu, Yiman, Wu, Jilong, Yang, Pengfei, Zhang, Yuancheng, Zhao, Wei, Zhang, Xiaomeng, Cui, Zhe, Fu, Peng, Pang, Xinchang, Liu, Minying
Published in:
In Journal of Materials Research and Technology March-April 2024 29:2062-2071
Fine-grained image classification is a challenging problem because discriminative features are difficult to find. There are basically two ways to handle this. One is to use an attention-based method to focus on informative areas, wh
External link:
http://arxiv.org/abs/2001.02219