Showing 1 - 10 of 1,454 results for search: '"Gong, Yuan"'
Neural Audio Codecs, initially designed as a compression technique, have gained more attention recently for speech generation. Codec models represent each audio frame as a sequence of tokens, i.e., discrete embeddings. The discrete and low-frequency
External link:
http://arxiv.org/abs/2410.22448
Recent advancements in Large Language Models (LLMs) have demonstrated great success in many Natural Language Processing (NLP) tasks. In addition to their cognitive intelligence, exploring their capabilities in emotional intelligence is also crucial,
External link:
http://arxiv.org/abs/2409.18339
Annotating and recognizing speech emotion using prompt engineering has recently emerged with the advancement of Large Language Models (LLMs), yet its efficacy and reliability remain questionable. In this paper, we conduct a systematic study on this t
External link:
http://arxiv.org/abs/2409.15551
Authors:
Yang, Chao-Han Huck, Park, Taejin, Gong, Yuan, Li, Yuanchao, Chen, Zhehuai, Lin, Yen-Ting, Chen, Chen, Hu, Yuchen, Dhawan, Kunal, Żelasko, Piotr, Zhang, Chao, Chen, Yun-Nung, Tsao, Yu, Balam, Jagadeesh, Ginsburg, Boris, Siniscalchi, Sabato Marco, Chng, Eng Siong, Bell, Peter, Lai, Catherine, Watanabe, Shinji, Stolcke, Andreas
Given recent advances in generative AI technology, a key question is how large language models (LLMs) can enhance acoustic modeling tasks using text decoding results from a frozen, pretrained automatic speech recognition (ASR) model. To explore new c
External link:
http://arxiv.org/abs/2409.09785
Authors:
Bhati, Saurabhchand, Gong, Yuan, Karlinsky, Leonid, Kuehne, Hilde, Feris, Rogerio, Glass, James
State-space models (SSMs) have emerged as an alternative to Transformers for audio modeling due to their high computational efficiency with long inputs. While recent efforts on Audio SSMs have reported encouraging results, two main limitations remain
External link:
http://arxiv.org/abs/2407.04082
Authors:
Wang, Liming, Gong, Yuan, Dawalatabad, Nauman, Vilela, Marco, Placek, Katerina, Tracey, Brian, Gong, Yishu, Premasiri, Alan, Vieira, Fernando, Glass, James
Automatic prediction of amyotrophic lateral sclerosis (ALS) disease progression provides a more efficient and objective alternative than manual approaches. We propose ALS longitudinal speech transformer (ALST), a neural network-based automatic predic
External link:
http://arxiv.org/abs/2406.18625
Authors:
Rouditchenko, Andrew, Gong, Yuan, Thomas, Samuel, Karlinsky, Leonid, Kuehne, Hilde, Feris, Rogerio, Glass, James
Audio-Visual Speech Recognition (AVSR) uses lip-based video to improve performance in noise. Since videos are harder to obtain than audio, the video training data of AVSR models is usually limited to a few thousand hours. In contrast, speech models s
External link:
http://arxiv.org/abs/2406.10082
Deep learning models are essential for scene classification, change detection, land cover segmentation, and other remote sensing image understanding tasks. Most backbones of existing remote sensing deep learning models are typically initialized by pr
External link:
http://arxiv.org/abs/2401.04614
Humans are surrounded by audio signals that include both speech and non-speech sounds. The recognition and understanding of speech and non-speech audio events, along with a profound comprehension of the relationship between them, constitute fundament
External link:
http://arxiv.org/abs/2309.14405
Authors:
Zhang, Tianhua, Ge, Jiaxin, Luo, Hongyin, Chuang, Yung-Sung, Gao, Mingye, Gong, Yuan, Wu, Xixin, Kim, Yoon, Meng, Helen, Glass, James
How can we perform computations over natural language representations to solve tasks that require symbolic and numeric reasoning? We propose natural language embedded programs (NLEP) as a unifying framework for addressing math/symbolic reasoning, nat
External link:
http://arxiv.org/abs/2309.10814