Search results

Report

Homogeneous Speaker Features for On-the-Fly Dysarthric and Elderly Speaker Adaptation

Authors: Geng, Mengzhe; Xie, Xurong; Deng, Jiajun; Jin, Zengrui; Li, Guinan; Wang, Tianzi; Hu, Shujie; Li, Zhaoqing; Meng, Helen; Liu, Xunying

The application of data-intensive automatic speech recognition (ASR) technologies to dysarthric and elderly adult speech is confronted by their mismatch against healthy and nonaged voices, data scarcity and large speaker-level variability. To this en

External link: http://arxiv.org/abs/2407.06310


Report

Self-supervised ASR Models and Features For Dysarthric and Elderly Speech Recognition

Authors: Hu, Shujie; Xie, Xurong; Geng, Mengzhe; Jin, Zengrui; Deng, Jiajun; Li, Guinan; Wang, Yi; Cui, Mingyu; Wang, Tianzi; Meng, Helen; Liu, Xunying

Self-supervised learning (SSL) based speech foundation models have been applied to a wide range of ASR tasks. However, their application to dysarthric and elderly speech via data-intensive parameter fine-tuning is confronted by in-domain data scarcit

External link: http://arxiv.org/abs/2407.13782


Report

Joint Speaker Features Learning for Audio-visual Multichannel Speech Separation and Recognition

Authors: Li, Guinan; Deng, Jiajun; Chen, Youjun; Geng, Mengzhe; Hu, Shujie; Li, Zhe; Jin, Zengrui; Wang, Tianzi; Xie, Xurong; Meng, Helen; Liu, Xunying

This paper proposes joint speaker feature learning methods for zero-shot adaptation of audio-visual multichannel speech separation and recognition systems. xVector and ECAPA-TDNN speaker encoders are connected using purpose-built fusion blocks and ti

External link: http://arxiv.org/abs/2406.10152


Report

Towards Effective and Efficient Non-autoregressive Decoding Using Block-based Attention Mask

Authors: Wang, Tianzi; Xie, Xurong; Li, Zhaoqing; Hu, Shoukang; Jin, Zengrui; Deng, Jiajun; Cui, Mingyu; Hu, Shujie; Geng, Mengzhe; Li, Guinan; Meng, Helen; Liu, Xunying

This paper proposes a novel non-autoregressive (NAR) block-based Attention Mask Decoder (AMD) that flexibly balances performance-efficiency trade-offs for Conformer ASR systems. AMD performs parallel NAR inference within contiguous blocks of output l

External link: http://arxiv.org/abs/2406.10034


Report

Enhancing Pre-trained ASR System Fine-tuning for Dysarthric Speech Recognition using Adversarial Data Augmentation

Authors: Wang, Huimeng; Jin, Zengrui; Geng, Mengzhe; Hu, Shujie; Li, Guinan; Wang, Tianzi; Xu, Haoning; Liu, Xunying

Automatic recognition of dysarthric speech remains a highly challenging task to date. Neuro-motor conditions and co-occurring physical disabilities create difficulty in large-scale data collection for ASR system development. Adapting SSL pre-trained

External link: http://arxiv.org/abs/2401.00662


Report

Towards Automatic Data Augmentation for Disordered Speech Recognition

Authors: Jin, Zengrui; Xie, Xurong; Wang, Tianzi; Geng, Mengzhe; Deng, Jiajun; Li, Guinan; Hu, Shujie; Liu, Xunying

Automatic recognition of disordered speech remains a highly challenging task to date due to data scarcity. This paper presents a reinforcement learning (RL) based on-the-fly data augmentation approach for training state-of-the-art PyChain TDNN and en

External link: http://arxiv.org/abs/2312.08641


Report

Audio-visual End-to-end Multi-channel Speech Separation, Dereverberation and Recognition

Authors: Li, Guinan; Deng, Jiajun; Geng, Mengzhe; Jin, Zengrui; Wang, Tianzi; Hu, Shujie; Cui, Mingyu; Meng, Helen; Liu, Xunying

Accurate recognition of cocktail party speech containing overlapping speakers, noise and reverberation remains a highly challenging task to date. Motivated by the invariance of visual modality to acoustic signal corruption, an audio-visual multi-chan

External link: http://arxiv.org/abs/2307.02909


Report

Factorised Speaker-environment Adaptive Training of Conformer Speech Recognition Systems

Authors: Deng, Jiajun; Li, Guinan; Xie, Xurong; Jin, Zengrui; Cui, Mingyu; Wang, Tianzi; Hu, Shujie; Geng, Mengzhe; Liu, Xunying

Rich sources of variability in natural speech present significant challenges to current data intensive speech recognition technologies. To model both speaker and environment level diversity, this paper proposes a novel Bayesian factorised speaker-env

External link: http://arxiv.org/abs/2306.14608


Report

Use of Speech Impairment Severity for Dysarthric Speech Recognition

Authors: Geng, Mengzhe; Jin, Zengrui; Wang, Tianzi; Hu, Shujie; Deng, Jiajun; Cui, Mingyu; Li, Guinan; Yu, Jianwei; Xie, Xurong; Liu, Xunying

A key challenge in dysarthric speech recognition is the speaker-level diversity attributed to both speaker-identity associated factors such as gender, and speech impairment severity. Most prior research on addressing this issue focused on using spe

External link: http://arxiv.org/abs/2305.10659


Report

Confidence Score Based Speaker Adaptation of Conformer Speech Recognition Systems

Authors: Deng, Jiajun; Xie, Xurong; Wang, Tianzi; Cui, Mingyu; Xue, Boyang; Jin, Zengrui; Li, Guinan; Hu, Shujie; Liu, Xunying

Speaker adaptation techniques provide a powerful solution to customise automatic speech recognition (ASR) systems for individual users. Practical application of unsupervised model-based speaker adaptation techniques to data intensive end-to-end ASR s

External link: http://arxiv.org/abs/2302.07521

