Search results

Report

Homogeneous Speaker Features for On-the-Fly Dysarthric and Elderly Speaker Adaptation

Authors: Geng, Mengzhe, Xie, Xurong, Deng, Jiajun, Jin, Zengrui, Li, Guinan, Wang, Tianzi, Hu, Shujie, Li, Zhaoqing, Meng, Helen, Liu, Xunying

The application of data-intensive automatic speech recognition (ASR) technologies to dysarthric and elderly adult speech is confronted by their mismatch against healthy and nonaged voices, data scarcity and large speaker-level variability. To this en

External link: http://arxiv.org/abs/2407.06310

Report

Self-supervised ASR Models and Features For Dysarthric and Elderly Speech Recognition

Authors: Hu, Shujie, Xie, Xurong, Geng, Mengzhe, Jin, Zengrui, Deng, Jiajun, Li, Guinan, Wang, Yi, Cui, Mingyu, Wang, Tianzi, Meng, Helen, Liu, Xunying

Self-supervised learning (SSL) based speech foundation models have been applied to a wide range of ASR tasks. However, their application to dysarthric and elderly speech via data-intensive parameter fine-tuning is confronted by in-domain data scarcit

External link: http://arxiv.org/abs/2407.13782

Report

Joint Speaker Features Learning for Audio-visual Multichannel Speech Separation and Recognition

Authors: Li, Guinan, Deng, Jiajun, Chen, Youjun, Geng, Mengzhe, Hu, Shujie, Li, Zhe, Jin, Zengrui, Wang, Tianzi, Xie, Xurong, Meng, Helen, Liu, Xunying

This paper proposes joint speaker feature learning methods for zero-shot adaptation of audio-visual multichannel speech separation and recognition systems. xVector and ECAPA-TDNN speaker encoders are connected using purpose-built fusion blocks and ti

External link: http://arxiv.org/abs/2406.10152

Report

Towards Effective and Efficient Non-autoregressive Decoding Using Block-based Attention Mask

Authors: Wang, Tianzi, Xie, Xurong, Li, Zhaoqing, Hu, Shoukang, Jin, Zengrui, Deng, Jiajun, Cui, Mingyu, Hu, Shujie, Geng, Mengzhe, Li, Guinan, Meng, Helen, Liu, Xunying

This paper proposes a novel non-autoregressive (NAR) block-based Attention Mask Decoder (AMD) that flexibly balances performance-efficiency trade-offs for Conformer ASR systems. AMD performs parallel NAR inference within contiguous blocks of output l

External link: http://arxiv.org/abs/2406.10034

Report

Perceiver-Prompt: Flexible Speaker Adaptation in Whisper for Chinese Disordered Speech Recognition

Authors: Jiang, Yicong, Wang, Tianzi, Xie, Xurong, Liu, Juan, Sun, Wei, Yan, Nan, Chen, Hui, Wang, Lan, Liu, Xunying, Tian, Feng

Disordered speech recognition has profound implications for improving the quality of life for individuals afflicted with, for example, dysarthria. Dysarthric speech recognition encounters challenges including limited data, substantial dissimilarities bet

External link: http://arxiv.org/abs/2406.09873

Report

Towards Automatic Data Augmentation for Disordered Speech Recognition

Authors: Jin, Zengrui, Xie, Xurong, Wang, Tianzi, Geng, Mengzhe, Deng, Jiajun, Li, Guinan, Hu, Shujie, Liu, Xunying

Automatic recognition of disordered speech remains a highly challenging task to date due to data scarcity. This paper presents a reinforcement learning (RL) based on-the-fly data augmentation approach for training state-of-the-art PyChain TDNN and en

External link: http://arxiv.org/abs/2312.08641

Report

Factorised Speaker-environment Adaptive Training of Conformer Speech Recognition Systems

Authors: Deng, Jiajun, Li, Guinan, Xie, Xurong, Jin, Zengrui, Cui, Mingyu, Wang, Tianzi, Hu, Shujie, Geng, Mengzhe, Liu, Xunying

Rich sources of variability in natural speech present significant challenges to current data intensive speech recognition technologies. To model both speaker and environment level diversity, this paper proposes a novel Bayesian factorised speaker-env

External link: http://arxiv.org/abs/2306.14608

Report

Use of Speech Impairment Severity for Dysarthric Speech Recognition

Authors: Geng, Mengzhe, Jin, Zengrui, Wang, Tianzi, Hu, Shujie, Deng, Jiajun, Cui, Mingyu, Li, Guinan, Yu, Jianwei, Xie, Xurong, Liu, Xunying

A key challenge in dysarthric speech recognition is the speaker-level diversity attributed to both speaker-identity associated factors such as gender, and speech impairment severity. Most prior research on addressing this issue focused on using spe

External link: http://arxiv.org/abs/2305.10659

Report

Exploring Self-supervised Pre-trained ASR Models For Dysarthric and Elderly Speech Recognition

Authors: Hu, Shujie, Xie, Xurong, Jin, Zengrui, Geng, Mengzhe, Wang, Yi, Cui, Mingyu, Deng, Jiajun, Liu, Xunying, Meng, Helen

Automatic recognition of disordered and elderly speech remains a highly challenging task to date due to the difficulty in collecting such data in large quantities. This paper explores a series of approaches to integrate domain adapted SSL pre-trained

External link: http://arxiv.org/abs/2302.14564

Report

Confidence Score Based Speaker Adaptation of Conformer Speech Recognition Systems

Authors: Deng, Jiajun, Xie, Xurong, Wang, Tianzi, Cui, Mingyu, Xue, Boyang, Jin, Zengrui, Li, Guinan, Hu, Shujie, Liu, Xunying

Speaker adaptation techniques provide a powerful solution to customise automatic speech recognition (ASR) systems for individual users. Practical application of unsupervised model-based speaker adaptation techniques to data intensive end-to-end ASR s

External link: http://arxiv.org/abs/2302.07521
