Showing 1 - 10 of 433 for search: '"Zheng, Thomas"'
Author:
Zhao, Yiyang, Wang, Shuai, Sun, Guangzhi, Chen, Zehua, Zhang, Chao, Xu, Mingxing, Zheng, Thomas Fang
In this paper, Whisper, a large-scale pre-trained model for automatic speech recognition, is proposed to apply to speaker verification. A partial multi-scale feature aggregation (PMFA) approach is proposed based on a subset of Whisper encoder blocks
External link:
http://arxiv.org/abs/2408.15585
Published in:
Interspeech 2024
Automatic Speaker Verification (ASV) suffers from performance degradation in noisy conditions. To address this issue, we propose a novel adversarial learning framework that incorporates noise-disentanglement to establish a noise-independent speaker i
External link:
http://arxiv.org/abs/2408.11562
End-to-end models have shown superior performance for automatic speech recognition (ASR). However, such models are often very large in size and thus challenging to deploy on resource-constrained edge devices. While quantisation can reduce model sizes
External link:
http://arxiv.org/abs/2408.03979
Mixture-of-experts (MoE) models have achieved excellent results in many tasks. However, conventional MoE models are often very large, making them challenging to deploy on resource-constrained edge devices. In this paper, we propose a novel speaker ad
External link:
http://arxiv.org/abs/2406.19706
Recent end-to-end automatic speech recognition (ASR) models have become increasingly larger, making them particularly challenging to be deployed on resource-constrained devices. Model quantisation is an effective solution that sometimes causes the wo
External link:
http://arxiv.org/abs/2309.09136
Author:
Urbanus, Malene L, Zheng, Thomas M, Khusnutdinova, Anna N, Banh, Doreen, Mount, Harley O'Connor, Gupta, Alind, Stogios, Peter J, Savchenko, Alexei, Isberg, Ralph R, Yakunin, Alexander F, Ensminger, Alexander W (alex.ensminger@utoronto.ca)
Published in:
G3: Genes | Genomes | Genetics. Sep 2024, Vol. 14, Issue 9, pp. 1-12.
The way that humans encode their emotion into speech signals is complex. For instance, an angry man may increase his pitch and speaking rate, and use impolite words. In this paper, we present a preliminary study on various emotional factors and inves
External link:
http://arxiv.org/abs/2111.12324
The choice of an optimal time-frequency resolution is usually a difficult but important step in tasks involving speech signal classification, e.g., speech anti-spoofing. The variations of the performance with different choices of time-frequency resolu
External link:
http://arxiv.org/abs/2110.05087
Author:
Zhang, Weiyi, Zhao, Shuning, Liu, Le, Li, Jianmin, Cheng, Xingliang, Zheng, Thomas Fang, Hu, Xiaolin
In authentication scenarios, applications of practical speaker verification systems usually require a person to read a dynamic authentication text. Previous studies played an audio adversarial example as a digital signal to perform physical attacks,
External link:
http://arxiv.org/abs/2105.09022