Showing 1 - 10 of 404 results for search: '"Busso, Carlos"'
Speech emotion recognition (SER) systems often struggle in real-world environments, where ambient noise severely degrades their performance. This paper explores a novel approach that exploits prior knowledge of testing environments to maximize SER pe…
External link:
http://arxiv.org/abs/2407.17716
Cross-lingual speech emotion recognition (SER) is important for a wide range of everyday applications. While recent SER research relies heavily on large pretrained models for emotion training, existing studies often concentrate solely on the final tr…
External link:
http://arxiv.org/abs/2407.04966
In speech synthesis, modeling the rich emotions and prosodic variations present in the human voice is crucial to synthesizing natural speech. Although speaker embeddings have been widely used in personalized speech synthesis as conditioning inputs, they ar…
External link:
http://arxiv.org/abs/2407.04291
Authors:
Salman, Ali N., Du, Zongyang, Chandra, Shreeram Suresh, Ulgen, Ismail Rasim, Busso, Carlos, Sisman, Berrak
Voice conversion (VC) research traditionally depends on scripted or acted speech, which lacks the natural spontaneity of real-life conversations. While natural speech data is limited for VC, our study focuses on filling in this gap. We introduce a no…
External link:
http://arxiv.org/abs/2406.04494
Authors:
Rajapakshe, Thejan, Rana, Rajib, Khalifa, Sara, Sisman, Berrak, Schuller, Bjorn W., Busso, Carlos
Speech Emotion Recognition (SER) is crucial for enabling computers to understand the emotions conveyed in human communication. With recent advancements in Deep Learning (DL), the performance of SER models has significantly improved. However, designin…
External link:
http://arxiv.org/abs/2403.14083
Speaker embeddings carry valuable emotion-related information, which makes them a promising resource for enhancing speech emotion recognition (SER), especially with limited labeled data. Traditionally, it has been assumed that emotion information is…
External link:
http://arxiv.org/abs/2401.11017
Most current audio-visual emotion recognition models lack the flexibility needed for deployment in practical applications. We envision a multimodal system that works even when only one modality is available and can be implemented interchangeably for…
External link:
http://arxiv.org/abs/2305.07216
Emotional voice conversion (EVC) traditionally targets the transformation of spoken utterances from one emotional state to another, with previous research mainly focusing on discrete emotion categories. This paper departs from the norm by introducing…
External link:
http://arxiv.org/abs/2210.13756
Anomaly driving detection is an important problem in advanced driver assistance systems (ADAS). It is important to identify potential hazard scenarios as early as possible to avoid potential accidents. This study proposes an unsupervised method to qu…
External link:
http://arxiv.org/abs/2203.08289
Authors:
Sridhar, Kusha, Busso, Carlos
Published in:
IEEE Transactions on Affective Computing, vol. 13, no. 4, pp. 1959-1972, October-December 2022
The prediction of valence from speech is an important but challenging problem. The externalization of valence in speech has speaker-dependent cues, which contribute to performances that are often significantly lower than the prediction of other emot…
External link:
http://arxiv.org/abs/2201.07876