Showing 1 - 10 of 347 for search: '"ZHANG Wei-qiang"'
Machine anomalous sound detection (ASD) has emerged as one of the most promising applications in the Industrial Internet of Things (IIoT) due to its unprecedented efficacy in mitigating risks of malfunctions and promoting production efficiency. Previ
External link:
http://arxiv.org/abs/2408.14753
Whisper and other large-scale automatic speech recognition models have made significant progress in performance. However, their performance on many low-resource languages, such as Kazakh, is not satisfactory. It is worth researching how to utilize lo
External link:
http://arxiv.org/abs/2408.05554
Authors:
Jiang, Anbai, Han, Bing, Lv, Zhiqiang, Deng, Yufeng, Zhang, Wei-Qiang, Chen, Xie, Qian, Yanmin, Liu, Jia, Fan, Pingyi
Large pre-trained models have demonstrated dominant performances in multiple areas, where the consistency between pre-training and fine-tuning is the key to success. However, few works reported satisfactory results of pre-trained models for the machi
External link:
http://arxiv.org/abs/2406.11364
Authors:
Yang, Yifan, Song, Zheshu, Zhuo, Jianheng, Cui, Mingyu, Li, Jinpeng, Yang, Bo, Du, Yexing, Ma, Ziyang, Liu, Xunying, Wang, Ziyuan, Li, Ke, Fan, Shuai, Yu, Kai, Zhang, Wei-Qiang, Chen, Guoguo, Chen, Xie
The evolution of speech technology has been spurred by the rapid increase in dataset sizes. Traditional speech models generally depend on a large amount of labeled training data, which is scarce for low-resource languages. This paper presents GigaSpe
External link:
http://arxiv.org/abs/2406.11546
As a robust and large-scale multilingual speech recognition model, Whisper has demonstrated impressive results in many low-resource and out-of-distribution scenarios. However, its encoder-decoder structure hinders its application to streaming speech
External link:
http://arxiv.org/abs/2406.10052
In the wake of the surging tide of deep learning over the past decade, Automatic Speech Recognition (ASR) has garnered substantial attention, leading to the emergence of numerous publicly accessible ASR systems that are actively being integrated into
External link:
http://arxiv.org/abs/2403.08196
The detection of Alzheimer's disease (AD) from spontaneous speech has attracted increasing attention while the sparsity of training data remains an important issue. This paper handles the issue by knowledge transfer, specifically from both speech-gen
External link:
http://arxiv.org/abs/2310.04358
Self-supervised pre-trained models such as Wav2vec2, Hubert, and WavLM have been shown to significantly improve many speech tasks. However, their large memory and strong computational requirements hinder their industrial applicability. Structured pru
External link:
http://arxiv.org/abs/2306.01385
Multilingual self-supervised speech representation models have greatly enhanced the speech recognition performance for low-resource languages, and the compression of these huge models has also become a crucial prerequisite for their industrial applic
External link:
http://arxiv.org/abs/2306.01303
Published in:
IEEE/ACM TRANSACTIONS ON AUDIO, SPEECH, AND LANGUAGE PROCESSING, VOL. 31, 2023
The end-to-end speech translation (E2E-ST) model has gradually become a mainstream paradigm due to its low latency and less error propagation. However, it is non-trivial to train such a model well due to the task complexity and data scarcity. The spe
External link:
http://arxiv.org/abs/2304.10309