Showing 1 - 10 of 164 for search: '"Liu, Xunying"'
Author:
Geng, Mengzhe, Xie, Xurong, Deng, Jiajun, Jin, Zengrui, Li, Guinan, Wang, Tianzi, Hu, Shujie, Li, Zhaoqing, Meng, Helen, Liu, Xunying
The application of data-intensive automatic speech recognition (ASR) technologies to dysarthric and elderly adult speech is confronted by their mismatch against healthy and nonaged voices, data scarcity and large speaker-level variability. To this en
External link:
http://arxiv.org/abs/2407.06310
Author:
Yang, Yifan, Song, Zheshu, Zhuo, Jianheng, Cui, Mingyu, Li, Jinpeng, Yang, Bo, Du, Yexing, Ma, Ziyang, Liu, Xunying, Wang, Ziyuan, Li, Ke, Fan, Shuai, Yu, Kai, Zhang, Wei-Qiang, Chen, Guoguo, Chen, Xie
The evolution of speech technology has been spurred by the rapid increase in dataset sizes. Traditional speech models generally depend on a large amount of labeled training data, which is scarce for low-resource languages. This paper presents GigaSpe
External link:
http://arxiv.org/abs/2406.11546
Author:
Li, Zhaoqing, Xu, Haoning, Wang, Tianzi, Hu, Shoukang, Jin, Zengrui, Hu, Shujie, Deng, Jiajun, Cui, Mingyu, Geng, Mengzhe, Liu, Xunying
We propose a novel one-pass multiple ASR systems joint compression and quantization approach using an all-in-one neural model. A single compression cycle allows multiple nested systems with varying Encoder depths, widths, and quantization precision s
External link:
http://arxiv.org/abs/2406.10160
Author:
Li, Guinan, Deng, Jiajun, Chen, Youjun, Geng, Mengzhe, Hu, Shujie, Li, Zhe, Jin, Zengrui, Wang, Tianzi, Xie, Xurong, Meng, Helen, Liu, Xunying
This paper proposes joint speaker feature learning methods for zero-shot adaptation of audio-visual multichannel speech separation and recognition systems. xVector and ECAPA-TDNN speaker encoders are connected using purpose-built fusion blocks and ti
External link:
http://arxiv.org/abs/2406.10152
Author:
Wang, Tianzi, Xie, Xurong, Li, Zhaoqing, Hu, Shoukang, Jin, Zengrui, Deng, Jiajun, Cui, Mingyu, Hu, Shujie, Geng, Mengzhe, Li, Guinan, Meng, Helen, Liu, Xunying
This paper proposes a novel non-autoregressive (NAR) block-based Attention Mask Decoder (AMD) that flexibly balances performance-efficiency trade-offs for Conformer ASR systems. AMD performs parallel NAR inference within contiguous blocks of output l
External link:
http://arxiv.org/abs/2406.10034
Author:
Jiang, Yicong, Wang, Tianzi, Xie, Xurong, Liu, Juan, Sun, Wei, Yan, Nan, Chen, Hui, Wang, Lan, Liu, Xunying, Tian, Feng
Disordered speech recognition has profound implications for improving the quality of life for individuals afflicted with, for example, dysarthria. Dysarthric speech recognition encounters challenges including limited data, substantial dissimilarities bet
External link:
http://arxiv.org/abs/2406.09873
Author:
Hu, Shujie, Zhou, Long, Liu, Shujie, Chen, Sanyuan, Hao, Hongkun, Pan, Jing, Liu, Xunying, Li, Jinyu, Sivasankaran, Sunit, Liu, Linquan, Wei, Furu
The recent advancements in large language models (LLMs) have revolutionized the field of natural language processing, progressively broadening their scope to multimodal perception and generation. However, effectively integrating listening capabilitie
External link:
http://arxiv.org/abs/2404.00656
Author:
Chen, Xueyuan, Wang, Yuejiao, Wu, Xixin, Wang, Disong, Wu, Zhiyong, Liu, Xunying, Meng, Helen
Dysarthric speech reconstruction (DSR) aims to transform dysarthric speech into normal speech by improving the intelligibility and naturalness. This is a challenging task especially for patients with severe dysarthria and speaking in complex, noisy a
External link:
http://arxiv.org/abs/2401.17796
End-to-end multi-talker speech recognition has garnered great interest as an effective approach to directly transcribe overlapped speech from multiple speakers. Current methods typically adopt either 1) single-input multiple-output (SIMO) models with
External link:
http://arxiv.org/abs/2401.04152
Author:
Wang, Huimeng, Jin, Zengrui, Geng, Mengzhe, Hu, Shujie, Li, Guinan, Wang, Tianzi, Xu, Haoning, Liu, Xunying
Automatic recognition of dysarthric speech remains a highly challenging task to date. Neuro-motor conditions and co-occurring physical disabilities create difficulty in large-scale data collection for ASR system development. Adapting SSL pre-trained
External link:
http://arxiv.org/abs/2401.00662