Showing 1 - 10
of 5,652
for search: '"QIN, Yong"'
Author:
Zhou, Jiaming, Wang, Shiyao, Zhao, Shiwan, He, Jiabei, Sun, Haoqin, Wang, Hui, Liu, Cheng, Kong, Aobo, Guo, Yujie, Qin, Yong
Automatic speech recognition (ASR) systems have advanced significantly with models like Whisper, Conformer, and self-supervised frameworks such as Wav2vec 2.0 and HuBERT. However, developing robust ASR models for young children's speech remains challenging…
External link:
http://arxiv.org/abs/2409.18584
As text-based speech editing becomes increasingly prevalent, the demand for unrestricted free-text editing continues to grow. However, existing speech editing techniques encounter significant challenges, particularly in maintaining intelligibility and…
External link:
http://arxiv.org/abs/2409.12992
Diffusion-based text-to-audio (TTA) generation has made substantial progress, leveraging latent diffusion models (LDMs) to produce high-quality, diverse, and instruction-relevant audio. However, beyond generation, the task of audio editing remains equally…
External link:
http://arxiv.org/abs/2409.12466
Author:
Zhou, Jiaming, Zhao, Shiwan, He, Jiabei, Wang, Hui, Zeng, Wenjia, Chen, Yong, Sun, Haoqin, Kong, Aobo, Qin, Yong
State-of-the-art models like OpenAI's Whisper exhibit strong performance in multilingual automatic speech recognition (ASR), but they still face challenges in accurately recognizing diverse subdialects. In this paper, we propose M2R-whisper, a novel…
External link:
http://arxiv.org/abs/2409.11889
Author:
Xue, Hongfei, Gong, Rong, Shao, Mingchen, Xu, Xin, Wang, Lezhi, Xie, Lei, Bu, Hui, Zhou, Jiaming, Qin, Yong, Du, Jun, Li, Ming, Zhang, Binbin, Jia, Bin
The StutteringSpeech Challenge focuses on advancing speech technologies for people who stutter, specifically targeting Stuttering Event Detection (SED) and Automatic Speech Recognition (ASR) in Mandarin. The challenge comprises three tracks: (1) SED,…
External link:
http://arxiv.org/abs/2409.05430
For the SLT 2024 Low-Resource Dysarthria Wake-Up Word Spotting (LRDWWS) Challenge, we introduce the PB-LRDWWS system. This system combines a dysarthric speech content feature extractor for prototype construction with a prototype-based classification…
External link:
http://arxiv.org/abs/2409.04799
Author:
Wang, Hui, Zhao, Shiwan, Zhou, Jiaming, Zheng, Xiguang, Sun, Haoqin, Wang, Xuechen, Qin, Yong
Mean Opinion Score (MOS) prediction has made significant progress in specific domains. However, the unstable performance of MOS prediction models across diverse samples presents ongoing challenges in the practical application of these systems. In this…
External link:
http://arxiv.org/abs/2408.12829
Recognizing emotions from speech is a daunting task due to the subtlety and ambiguity of expressions. Traditional speech emotion recognition (SER) systems, which typically rely on a singular, precise emotion label, struggle with this complexity. …
External link:
http://arxiv.org/abs/2408.00325
Published in:
INTERSPEECH 2024
Dysarthric speech recognition (DSR) presents a formidable challenge due to inherent inter-speaker variability, leading to severe performance degradation when applying DSR models to new dysarthric speakers. Traditional speaker adaptation methodologies…
External link:
http://arxiv.org/abs/2407.18461
Author:
Sun, Haoqin, Zhao, Shiwan, Li, Shaokai, Kong, Xiangyu, Wang, Xuechen, Kong, Aobo, Zhou, Jiaming, Chen, Yong, Zeng, Wenjia, Qin, Yong
Multimodal emotion recognition systems rely heavily on the full availability of modalities, suffering significant performance declines when modal data is incomplete. To tackle this issue, we present the Cross-Modal Alignment, Reconstruction, and Refinement…
External link:
http://arxiv.org/abs/2407.09029