Showing 1 - 10 of 138 for search: '"Woodland, P. C."'
Published in:
Interspeech 2024
Speech-based automatic detection of Alzheimer's disease (AD) and depression has attracted increased attention. Confidence estimation is crucial for a trustworthy automatic diagnostic system which informs the clinician about the confidence of model p
External link:
http://arxiv.org/abs/2407.19984
This paper introduces a novel approach to speaker-attributed ASR transcription using a neural clustering method. With a parallel processing mechanism, diarisation and ASR can be applied simultaneously, helping to prevent the accumulation of errors fr
External link:
http://arxiv.org/abs/2407.02007
Author:
Deng, Keqi, Woodland, Philip C.
While the neural transducer is popular for online speech recognition, simultaneous speech translation (SST) requires both streaming and re-ordering capabilities. This paper presents the LS-Transducer-SST, a label-synchronous neural transducer for SST
External link:
http://arxiv.org/abs/2406.04541
Wav2Prompt is proposed which allows straightforward integration between spoken input and a text-based large language model (LLM). Wav2Prompt uses a simple training process with only the same data used to train an automatic speech recognition (ASR) mo
External link:
http://arxiv.org/abs/2406.00522
Author:
Chen, Mingjie, Zhang, Hezhao, Li, Yuanchao, Luo, Jiachen, Wu, Wen, Ma, Ziyang, Bell, Peter, Lai, Catherine, Reiss, Joshua, Wang, Lin, Woodland, Philip C., Chen, Xie, Phan, Huy, Hain, Thomas
Speech emotion recognition is a challenging classification task with natural emotional speech, especially when the distribution of emotion types is imbalanced in the training and test data. In this case, it is more difficult for a model to learn to s
External link:
http://arxiv.org/abs/2405.20064
Author:
Wu, Wen, Li, Bo, Zhang, Chao, Chiu, Chung-Cheng, Li, Qiujia, Bai, Junwen, Sainath, Tara N., Woodland, Philip C.
Published in:
Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers) 2024
The subjective perception of emotion leads to inconsistent labels from human annotators. Typically, utterances lacking majority-agreed labels are excluded when training an emotion classifier, which causes problems when encountering ambiguous emotional
External link:
http://arxiv.org/abs/2402.12862
Published in:
ICASSP 2024 - 2024 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Seoul, Republic of Korea, 2024, pp. 10986-10990
Foundation models have shown superior performance for speech emotion recognition (SER). However, given the limited data in emotion corpora, finetuning all parameters of large pre-trained models for SER can be both resource-intensive and susceptible t
External link:
http://arxiv.org/abs/2402.11747
Author:
Deng, Keqi, Woodland, Philip C.
Recently, connectionist temporal classification (CTC)-based end-to-end (E2E) automatic speech recognition (ASR) models have achieved impressive results, especially with the development of self-supervised learning. However, E2E ASR models trained on p
External link:
http://arxiv.org/abs/2312.09100
Author:
Deng, Keqi, Woodland, Philip C.
Although end-to-end (E2E) automatic speech recognition (ASR) has shown state-of-the-art recognition accuracy, it tends to be implicitly biased towards the training data distribution which can degrade generalisation. This paper proposes a label-synchr
External link:
http://arxiv.org/abs/2311.11353
Author:
Sun, Guangzhi, Feng, Shutong, Jiang, Dongcheng, Zhang, Chao, Gašić, Milica, Woodland, Philip C.
Recently, advancements in large language models (LLMs) have shown an unprecedented ability across various language tasks. This paper investigates the potential application of LLMs to slot filling with noisy ASR transcriptions, via both in-context lea
External link:
http://arxiv.org/abs/2311.07418