Showing 1 - 10 of 72 for search: '"Li, Guinan"'
Author:
Geng, Mengzhe, Xie, Xurong, Deng, Jiajun, Jin, Zengrui, Li, Guinan, Wang, Tianzi, Hu, Shujie, Li, Zhaoqing, Meng, Helen, Liu, Xunying
The application of data-intensive automatic speech recognition (ASR) technologies to dysarthric and elderly adult speech is confronted by their mismatch against healthy and nonaged voices, data scarcity and large speaker-level variability. To this en
External link:
http://arxiv.org/abs/2407.06310
Author:
Hu, Shujie, Xie, Xurong, Geng, Mengzhe, Jin, Zengrui, Deng, Jiajun, Li, Guinan, Wang, Yi, Cui, Mingyu, Wang, Tianzi, Meng, Helen, Liu, Xunying
Self-supervised learning (SSL) based speech foundation models have been applied to a wide range of ASR tasks. However, their application to dysarthric and elderly speech via data-intensive parameter fine-tuning is confronted by in-domain data scarcit
External link:
http://arxiv.org/abs/2407.13782
Author:
Li, Guinan, Deng, Jiajun, Chen, Youjun, Geng, Mengzhe, Hu, Shujie, Li, Zhe, Jin, Zengrui, Wang, Tianzi, Xie, Xurong, Meng, Helen, Liu, Xunying
This paper proposes joint speaker feature learning methods for zero-shot adaptation of audio-visual multichannel speech separation and recognition systems. xVector and ECAPA-TDNN speaker encoders are connected using purpose-built fusion blocks and ti
External link:
http://arxiv.org/abs/2406.10152
Author:
Wang, Tianzi, Xie, Xurong, Li, Zhaoqing, Hu, Shoukang, Jin, Zengrui, Deng, Jiajun, Cui, Mingyu, Hu, Shujie, Geng, Mengzhe, Li, Guinan, Meng, Helen, Liu, Xunying
This paper proposes a novel non-autoregressive (NAR) block-based Attention Mask Decoder (AMD) that flexibly balances performance-efficiency trade-offs for Conformer ASR systems. AMD performs parallel NAR inference within contiguous blocks of output l
External link:
http://arxiv.org/abs/2406.10034
Author:
Wang, Huimeng, Jin, Zengrui, Geng, Mengzhe, Hu, Shujie, Li, Guinan, Wang, Tianzi, Xu, Haoning, Liu, Xunying
Automatic recognition of dysarthric speech remains a highly challenging task to date. Neuro-motor conditions and co-occurring physical disabilities create difficulty in large-scale data collection for ASR system development. Adapting SSL pre-trained
External link:
http://arxiv.org/abs/2401.00662
Author:
Jin, Zengrui, Xie, Xurong, Wang, Tianzi, Geng, Mengzhe, Deng, Jiajun, Li, Guinan, Hu, Shujie, Liu, Xunying
Automatic recognition of disordered speech remains a highly challenging task to date due to data scarcity. This paper presents a reinforcement learning (RL) based on-the-fly data augmentation approach for training state-of-the-art PyChain TDNN and en
External link:
http://arxiv.org/abs/2312.08641
Author:
Li, Guinan, Deng, Jiajun, Geng, Mengzhe, Jin, Zengrui, Wang, Tianzi, Hu, Shujie, Cui, Mingyu, Meng, Helen, Liu, Xunying
Accurate recognition of cocktail party speech containing overlapping speakers, noise and reverberation remains a highly challenging task to date. Motivated by the invariance of visual modality to acoustic signal corruption, an audio-visual multi-chan
External link:
http://arxiv.org/abs/2307.02909
Author:
Deng, Jiajun, Li, Guinan, Xie, Xurong, Jin, Zengrui, Cui, Mingyu, Wang, Tianzi, Hu, Shujie, Geng, Mengzhe, Liu, Xunying
Rich sources of variability in natural speech present significant challenges to current data intensive speech recognition technologies. To model both speaker and environment level diversity, this paper proposes a novel Bayesian factorised speaker-env
External link:
http://arxiv.org/abs/2306.14608
Author:
Geng, Mengzhe, Jin, Zengrui, Wang, Tianzi, Hu, Shujie, Deng, Jiajun, Cui, Mingyu, Li, Guinan, Yu, Jianwei, Xie, Xurong, Liu, Xunying
A key challenge in dysarthric speech recognition is the speaker-level diversity attributed to both speaker-identity associated factors such as gender, and speech impairment severity. Most prior researches on addressing this issue focused on using spe
External link:
http://arxiv.org/abs/2305.10659
Author:
Deng, Jiajun, Xie, Xurong, Wang, Tianzi, Cui, Mingyu, Xue, Boyang, Jin, Zengrui, Li, Guinan, Hu, Shujie, Liu, Xunying
Speaker adaptation techniques provide a powerful solution to customise automatic speech recognition (ASR) systems for individual users. Practical application of unsupervised model-based speaker adaptation techniques to data intensive end-to-end ASR s
External link:
http://arxiv.org/abs/2302.07521