Showing 1 - 10 of 18,023 for search: '"An-Yi Cheng"'
Complex systems can undergo critical transitions, where slowly changing environmental conditions trigger a sudden shift to a new, potentially catastrophic state. Early warning signals for these events are crucial for decision-making in fields such as
External link:
http://arxiv.org/abs/2410.09707
In this paper, we introduce a speech-conditioned Large Language Model (LLM) integrated with a Mixture of Experts (MoE) based connector to address the challenge of Code-Switching (CS) in Automatic Speech Recognition (ASR). Specifically, we propose an
External link:
http://arxiv.org/abs/2409.15905
Authors:
Wu, Haibin, Chen, Xuanjun, Lin, Yi-Cheng, Chang, Kaiwei, Du, Jiawei, Lu, Ke-Han, Liu, Alexander H., Chung, Ho-Lam, Wu, Yuan-Kuei, Yang, Dongchao, Liu, Songxiang, Wu, Yi-Chiao, Tan, Xu, Glass, James, Watanabe, Shinji, Lee, Hung-yi
Neural audio codec models are becoming increasingly important as they serve as tokenizers for audio, enabling efficient transmission or facilitating speech language modeling. The ideal neural audio codec should maintain content, paralinguistics, spea
External link:
http://arxiv.org/abs/2409.14085
Speech Emotion Recognition (SER) is a crucial component in developing general-purpose AI agents capable of natural human-computer interaction. However, building robust multilingual SER systems remains challenging due to the scarcity of labeled data i
External link:
http://arxiv.org/abs/2409.10985
Authors:
Ren, Wenze, Wu, Haibin, Lin, Yi-Cheng, Chen, Xuanjun, Chao, Rong, Hung, Kuo-Hsuan, Li, You-Jin, Ting, Wen-Yuan, Wang, Hsin-Min, Tsao, Yu
In multichannel speech enhancement, effectively capturing spatial and spectral information across different microphones is crucial for noise reduction. Traditional methods, such as CNN or LSTM, attempt to model the temporal dynamics of full-band and
External link:
http://arxiv.org/abs/2409.10376
Automated speaking assessment in conversation tests (ASAC) aims to evaluate the overall speaking proficiency of an L2 (second-language) speaker in a setting where an interlocutor interacts with one or more candidates. Although prior ASAC approaches h
External link:
http://arxiv.org/abs/2409.07064
End-to-end (E2E) automatic speech recognition (ASR) models have become standard practice for various commercial applications. However, in real-world scenarios, the long-tailed nature of word distribution often leads E2E ASR models to perform well on
External link:
http://arxiv.org/abs/2409.06468
Despite their impressive success, training foundation models remains computationally costly. This paper investigates how to efficiently train speech foundation models with self-supervised learning (SSL) under a limited compute budget. We examine crit
External link:
http://arxiv.org/abs/2409.16295
Authors:
Tan, Derek Ming Siang, Ma, Yixiao, Liang, Jingsong, Chng, Yi Cheng, Cao, Yuhong, Sartoretti, Guillaume
Information sharing is critical in time-sensitive and realistic multi-robot exploration, especially for smaller robotic teams in large-scale environments where connectivity may be sparse and intermittent. Existing methods often overlook such communic
External link:
http://arxiv.org/abs/2409.04730
In this article, we develop nonparametric inference methods for comparing survival data across two samples, which are beneficial for clinical trials of novel cancer therapies where long-term survival is a critical outcome. These therapies, including
External link:
http://arxiv.org/abs/2409.02209