Showing 1 - 10 of 2,336 for search: '"P, Woodland"'
Large Language Models (LLMs) are increasingly used to assess NLP tasks due to their ability to generate human-like judgments. Single LLMs were used initially; however, recent work suggests that using multiple LLMs as judges yields improved performance. An
External link:
http://arxiv.org/abs/2410.10215
With the advances in deep learning, the performance of end-to-end (E2E) single-task models for speech and audio processing has been constantly improving. However, it is still challenging to build a general-purpose model with high performance on multi
External link:
http://arxiv.org/abs/2409.17010
Author:
Woodland, McKell, Patel, Nihil, Castelo, Austin, Taie, Mais Al, Eltaher, Mohamed, Yung, Joshua P., Netherton, Tucker J., Calderone, Tiffany L., Sanchez, Jessica I., Cleere, Darrel W., Elsaiey, Ahmed, Gupta, Nakul, Victor, David, Beretta, Laura, Patel, Ankit B., Brock, Kristy K.
Published in:
Machine Learning for Biomedical Imaging 2 (2024) 2006
Clinically deployed deep learning-based segmentation models are known to fail on data outside of their training distributions. While clinicians review the segmentations, these models tend to perform well in most instances, which could exacerbate auto
External link:
http://arxiv.org/abs/2408.02761
Published in:
Interspeech 2024
Speech-based automatic detection of Alzheimer's disease (AD) and depression has attracted increased attention. Confidence estimation is crucial for a trustworthy automatic diagnostic system which informs the clinician about the confidence of model p
External link:
http://arxiv.org/abs/2407.19984
This paper introduces a novel approach to speaker-attributed ASR transcription using a neural clustering method. With a parallel processing mechanism, diarisation and ASR can be applied simultaneously, helping to prevent the accumulation of errors fr
External link:
http://arxiv.org/abs/2407.02007
Approximate Natural Gradient Descent (NGD) methods are an important family of optimisers for deep learning models, which use approximate Fisher information matrices to pre-condition gradients during training. The empirical Fisher (EF) method approxim
External link:
http://arxiv.org/abs/2406.06420
Author:
Deng, Keqi, Woodland, Philip C.
While the neural transducer is popular for online speech recognition, simultaneous speech translation (SST) requires both streaming and re-ordering capabilities. This paper presents the LS-Transducer-SST, a label-synchronous neural transducer for SST
External link:
http://arxiv.org/abs/2406.04541
Wav2Prompt is proposed which allows straightforward integration between spoken input and a text-based large language model (LLM). Wav2Prompt uses a simple training process with only the same data used to train an automatic speech recognition (ASR) mo
External link:
http://arxiv.org/abs/2406.00522
Author:
Chen, Mingjie, Zhang, Hezhao, Li, Yuanchao, Luo, Jiachen, Wu, Wen, Ma, Ziyang, Bell, Peter, Lai, Catherine, Reiss, Joshua, Wang, Lin, Woodland, Philip C., Chen, Xie, Phan, Huy, Hain, Thomas
Speech emotion recognition is a challenging classification task with natural emotional speech, especially when the distribution of emotion types is imbalanced in the training and test data. In this case, it is more difficult for a model to learn to s
External link:
http://arxiv.org/abs/2405.20064
Author:
Sun, Guangzhi, Manakul, Potsawee, Liusie, Adian, Pipatanakul, Kunat, Zhang, Chao, Woodland, Phil, Gales, Mark
Multimodal foundation models are prone to hallucination, generating outputs that either contradict the input or are not grounded by factual information. Given the diversity in architectures, training data and instruction tuning techniques, there can
External link:
http://arxiv.org/abs/2405.13684