Showing 1 - 10 of 828 for search: '"Eyben, P."'
Increasingly frequent publications in the literature report voice quality differences between depressed patients and controls. Here, we examine the possibility of using voice analysis as an early warning signal for the development of emotion disturba
External link:
http://arxiv.org/abs/2411.11541
Authors:
Kounadis-Bastian, Dionyssos; Schrüfer, Oliver; Derington, Anna; Wierstorf, Hagen; Eyben, Florian; Burkhardt, Felix; Schuller, Björn
Speech Emotion Recognition (SER) needs high computational resources to overcome the challenge of substantial annotator disagreement. Today SER is shifting towards dimensional annotations of arousal, dominance, and valence (A/D/V). Universal metrics a
External link:
http://arxiv.org/abs/2408.13920
Uncertainty Quantification (UQ) is an important building block for the reliable use of neural networks in real-world scenarios, as it can be a useful tool in identifying faulty predictions. Speech emotion recognition (SER) models can suffer from part
External link:
http://arxiv.org/abs/2407.01143
Authors:
Derington, Anna; Wierstorf, Hagen; Özkil, Ali; Eyben, Florian; Burkhardt, Felix; Schuller, Björn W.
Machine learning models for speech emotion recognition (SER) can be trained for different tasks and are usually evaluated on the basis of a few available datasets per task. Tasks could include arousal, valence, dominance, emotional categories, or ton
External link:
http://arxiv.org/abs/2312.06270
We introduce two rule-based models to modify the prosody of speech synthesis in order to modulate the emotion to be expressed. The prosody modulation is based on speech synthesis markup language (SSML) and can be used with any commercial speech synth
External link:
http://arxiv.org/abs/2307.02132
We report on the curation of several publicly available datasets for age and gender prediction. Furthermore, we present experiments to predict age and gender with models based on a pre-trained wav2vec 2.0. Depending on the dataset, we achieve an MAE
External link:
http://arxiv.org/abs/2306.16962
Driven by the need for larger and more diverse datasets to pre-train and fine-tune increasingly complex machine learning models, the number of datasets is rapidly growing. audb is an open-source Python library that supports versioning and documentati
External link:
http://arxiv.org/abs/2303.00645
Authors:
Triantafyllopoulos, Andreas; Wagner, Johannes; Wierstorf, Hagen; Schmitt, Maximilian; Reichel, Uwe; Eyben, Florian; Burkhardt, Felix; Schuller, Björn W.
Published in:
Proc. Interspeech 2022, 146-150
Large, pre-trained neural networks consisting of self-attention layers (transformers) have recently achieved state-of-the-art results on several speech emotion recognition (SER) datasets. These models are typically pre-trained in self-supervised mann
External link:
http://arxiv.org/abs/2204.00400
Authors:
Wagner, Johannes; Triantafyllopoulos, Andreas; Wierstorf, Hagen; Schmitt, Maximilian; Burkhardt, Felix; Eyben, Florian; Schuller, Björn W.
Published in:
IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 45, no. 9, pp. 10745-10759, 1 Sept. 2023
Recent advances in transformer-based architectures that are pre-trained in a self-supervised manner have shown great promise in several machine learning tasks. In the audio domain, such architectures have also been successfully utilised in the field o
External link:
http://arxiv.org/abs/2203.07378
Authors:
Triantafyllopoulos, Andreas; Reichel, Uwe; Liu, Shuo; Huber, Stephan; Eyben, Florian; Schuller, Björn W.
Published in:
Frontiers in Computer Science, Volume 5, 2023
In this contribution, we investigate the effectiveness of deep fusion of text and audio features for categorical and dimensional speech emotion recognition (SER). We propose a novel, multistage fusion method where the two information streams are inte
External link:
http://arxiv.org/abs/2110.06650