Showing 1 - 10 of 98
for search: '"Grósz, Tamás"'
Large pre-trained models are essential in paralinguistic systems, demonstrating effectiveness in tasks like emotion recognition and stuttering detection. In this paper, we employ large pre-trained models for the ACM Multimedia Computational Paralingu
External link:
http://arxiv.org/abs/2310.10179
Traditional topic identification solutions from audio rely on an automatic speech recognition system (ASR) to produce transcripts used as input to a text-based model. These approaches work well in high-resource scenarios, where there are sufficient d
External link:
http://arxiv.org/abs/2307.11450
The events of recent years have highlighted the importance of telemedicine solutions which could potentially allow remote treatment and diagnosis. Relatedly, Computational Paralinguistics, a unique subfield of Speech Processing, aims to extract infor
External link:
http://arxiv.org/abs/2210.15978
It is common knowledge that the quantity and quality of the training data play a significant role in the creation of a good machine learning model. In this paper, we take it one step further and demonstrate that the way the training examples are arra
External link:
http://arxiv.org/abs/2208.05782
Authors:
Moisio, Anssi, Porjazovski, Dejan, Rouhe, Aku, Getman, Yaroslav, Virkkunen, Anja, Grósz, Tamás, Lindén, Krister, Kurimo, Mikko
The Donate Speech campaign has so far succeeded in gathering approximately 3600 hours of ordinary, colloquial Finnish speech into the Lahjoita puhetta (Donate Speech) corpus. The corpus includes over twenty thousand speakers from all the regions of F
External link:
http://arxiv.org/abs/2203.12906
Published in:
Speech Communication, September 2024, 163
This paper describes AaltoASR's speech recognition system for the INTERSPEECH 2020 shared task on Automatic Speech Recognition (ASR) for non-native children's speech. The task is to recognize non-native speech from children of various age groups give
External link:
http://arxiv.org/abs/2008.12914
End-to-end neural network models (E2E) have shown significant performance benefits on different INTERSPEECH ComParE tasks. Prior work has applied either a single instance of an E2E model for a task or the same E2E architecture for different tasks. Ho
External link:
http://arxiv.org/abs/2008.02689
Authors:
Csapó, Tamás Gábor, Al-Radhi, Mohammed Salah, Németh, Géza, Gosztolya, Gábor, Grósz, Tamás, Tóth, László, Markó, Alexandra
Recently it was shown that within the Silent Speech Interface (SSI) field, the prediction of F0 is possible from Ultrasound Tongue Images (UTI) as the articulatory input, using Deep Neural Networks for articulatory-to-acoustic mapping. Moreover, text
External link:
http://arxiv.org/abs/1906.09885
Authors:
Gosztolya, Gábor, Pintér, Ádám, Tóth, László, Grósz, Tamás, Markó, Alexandra, Csapó, Tamás Gábor
When using ultrasound video as input, Deep Neural Network-based Silent Speech Interfaces usually rely on the whole image to estimate the spectral parameters required for the speech synthesis step. Although this approach is quite straightforward, and
External link:
http://arxiv.org/abs/1904.05259