Zobrazeno 1 - 10
of 537
pro vyhledávání: '"LAI, CATHERINE"'
The lack of labeled data is a common challenge in speech classification tasks, particularly those requiring extensive subjective assessment, such as cognitive state classification. In this work, we propose a Semi-Supervised Learning (SSL) framework,
Externí odkaz:
http://arxiv.org/abs/2409.16937
Annotating and recognizing speech emotion using prompt engineering has recently emerged with the advancement of Large Language Models (LLMs), yet its efficacy and reliability remain questionable. In this paper, we conduct a systematic study on this t
Externí odkaz:
http://arxiv.org/abs/2409.15551
Autor:
Yang, Chao-Han Huck, Park, Taejin, Gong, Yuan, Li, Yuanchao, Chen, Zhehuai, Lin, Yen-Ting, Chen, Chen, Hu, Yuchen, Dhawan, Kunal, Żelasko, Piotr, Zhang, Chao, Chen, Yun-Nung, Tsao, Yu, Balam, Jagadeesh, Ginsburg, Boris, Siniscalchi, Sabato Marco, Chng, Eng Siong, Bell, Peter, Lai, Catherine, Watanabe, Shinji, Stolcke, Andreas
Given recent advances in generative AI technology, a key question is how large language models (LLMs) can enhance acoustic modeling tasks using text decoding results from a frozen, pretrained automatic speech recognition (ASR) model. To explore new c
Externí odkaz:
http://arxiv.org/abs/2409.09785
Text data is commonly utilized as a primary input to enhance Speech Emotion Recognition (SER) performance and reliability. However, the reliance on human-transcribed text in most studies impedes the development of practical SER systems, creating a ga
Externí odkaz:
http://arxiv.org/abs/2406.08353
Autor:
Chen, Mingjie, Zhang, Hezhao, Li, Yuanchao, Luo, Jiachen, Wu, Wen, Ma, Ziyang, Bell, Peter, Lai, Catherine, Reiss, Joshua, Wang, Lin, Woodland, Philip C., Chen, Xie, Phan, Huy, Hain, Thomas
Speech emotion recognition is a challenging classification task with natural emotional speech, especially when the distribution of emotion types is imbalanced in the training and test data. In this case, it is more difficult for a model to learn to s
Externí odkaz:
http://arxiv.org/abs/2405.20064
ASR remains unsatisfactory in scenarios where the speaking style diverges from that used to train ASR systems, resulting in erroneous transcripts. To address this, ASR Error Correction (AEC), a post-ASR processing approach, is required. In this work,
Externí odkaz:
http://arxiv.org/abs/2405.16677
The efficacy of self-supervised speech models has been validated, yet the optimal utilization of their representations remains challenging across diverse tasks. In this study, we delve into Acoustic Word Embeddings (AWEs), a fixed-length feature deri
Externí odkaz:
http://arxiv.org/abs/2402.02617
Speech is a fundamental means of communication that can be seen to provide two channels for transmitting information: the lexical channel of which words are said, and the non-lexical channel of how they are spoken. Both channels shape listener expect
Externí odkaz:
http://arxiv.org/abs/2307.03534
Holistic perception of affective attributes is an important human perceptual ability. However, this ability is far from being realized in current affective computing, as not all of the attributes are well studied and their interrelationships are poor
Externí odkaz:
http://arxiv.org/abs/2305.16076
In Speech Emotion Recognition (SER), textual data is often used alongside audio signals to address their inherent variability. However, the reliance on human annotated text in most research hinders the development of practical SER systems. To overcom
Externí odkaz:
http://arxiv.org/abs/2305.16065