Showing 1 - 10 of 276 results for search: '"Wang, Longbiao"'
Authors:
Wang, Tianrui, Li, Jin, Ma, Ziyang, Cao, Rui, Chen, Xie, Wang, Longbiao, Ge, Meng, Wang, Xiaobao, Wang, Yuguang, Dang, Jianwu, Tashi, Nyima
Self-supervised learning (SSL) has garnered significant attention in speech processing, excelling in linguistic tasks such as speech recognition. However, jointly improving the performance of pre-trained models on various downstream tasks, each requi
External link:
http://arxiv.org/abs/2409.00387
Authors:
Qiang, Chunyu, Geng, Wang, Zhao, Yi, Fu, Ruibo, Wang, Tao, Gong, Cheng, Wang, Tianrui, Liu, Qiuyu, Yi, Jiangyan, Wen, Zhengqi, Zhang, Chen, Che, Hao, Wang, Longbiao, Dang, Jianwu, Tao, Jianhua
Deep learning has brought significant improvements to the field of cross-modal representation learning. For tasks such as text-to-speech (TTS), voice conversion (VC), and automatic speech recognition (ASR), a cross-modal fine-grained (frame-level) se
External link:
http://arxiv.org/abs/2408.05758
Accurately finding the wrong words in the automatic speech recognition (ASR) hypothesis and recovering them in a well-founded way is the goal of speech error correction. In this paper, we propose a non-autoregressive speech error correction method. A Confiden
External link:
http://arxiv.org/abs/2407.12817
Authors:
Gong, Cheng, Cooper, Erica, Wang, Xin, Qiang, Chunyu, Geng, Mengzhe, Wells, Dan, Wang, Longbiao, Dang, Jianwu, Tessier, Marc, Pine, Aidan, Richmond, Korin, Yamagishi, Junichi
Self-supervised learning (SSL) representations from massively multilingual models offer a promising solution for low-resource language speech tasks. Despite advancements, language adaptation in TTS systems remains an open problem. This paper explores
External link:
http://arxiv.org/abs/2406.08911
Emotion Recognition in Conversations (ERC) is a popular task in natural language processing, which aims to recognize the emotional state of the speaker in conversations. While current research primarily emphasizes contextual modeling, there exists a
External link:
http://arxiv.org/abs/2407.00743
Authors:
Wang, He, Guo, Pengcheng, Li, Yue, Zhang, Ao, Sun, Jiayao, Xie, Lei, Chen, Wei, Zhou, Pan, Bu, Hui, Xu, Xin, Zhang, Binbin, Chen, Zhuo, Wu, Jian, Wang, Longbiao, Chng, Eng Siong, Li, Sun
To promote speech processing and recognition research in driving scenarios, we build on the success of the Intelligent Cockpit Speech Recognition Challenge (ICSRC) held at ISCSLP 2022 and launch the ICASSP 2024 In-Car Multi-Channel Automatic Speech R
External link:
http://arxiv.org/abs/2401.03473
Authors:
Gong, Cheng, Wang, Xin, Cooper, Erica, Wells, Dan, Wang, Longbiao, Dang, Jianwu, Richmond, Korin, Yamagishi, Junichi
Neural text-to-speech (TTS) has achieved human-like synthetic speech for single-speaker, single-language synthesis. Multilingual TTS systems are limited to resource-rich languages due to the lack of large paired text and studio-quality audio data. TT
External link:
http://arxiv.org/abs/2312.14398
Speech emotion recognition (SER) performance deteriorates significantly in the presence of noise, making it challenging to achieve competitive performance in noisy conditions. To this end, we propose a multi-level knowledge distillation (MLKD) method
External link:
http://arxiv.org/abs/2312.13556
Supervised speech enhancement has gained significantly from recent advancements in neural networks, especially due to their ability to non-linearly fit the diverse representations of target speech, such as waveform or spectrum. However, these direct-
External link:
http://arxiv.org/abs/2312.11201
Text-to-speech (TTS) methods have shown promising results in voice cloning, but they require a large number of labeled text-speech pairs. Minimally-supervised speech synthesis decouples TTS by combining two types of discrete speech representations (se
External link:
http://arxiv.org/abs/2309.15512