Showing 1 - 10 of 4,600 results for search: '"Wang Hsin"'
This paper addresses the prevalent issue of incorrect speech output in audio-visual speech enhancement (AVSE) systems, which is often caused by poor video quality and mismatched training and test data. We introduce a post-processing classifier (PPC)
External link:
http://arxiv.org/abs/2409.14554
Authors:
Wang, Chien-Chun; Chen, Li-Wei; Chou, Cheng-Kang; Lee, Hung-Shin; Chen, Berlin; Wang, Hsin-Min
While pre-trained automatic speech recognition (ASR) systems demonstrate impressive performance on matched domains, their performance often degrades when confronted with channel mismatch stemming from unseen recording environments and conditions. To
External link:
http://arxiv.org/abs/2409.12386
Authors:
Ren, Wenze; Wu, Haibin; Lin, Yi-Cheng; Chen, Xuanjun; Chao, Rong; Hung, Kuo-Hsuan; Li, You-Jin; Ting, Wen-Yuan; Wang, Hsin-Min; Tsao, Yu
In multichannel speech enhancement, effectively capturing spatial and spectral information across different microphones is crucial for noise reduction. Traditional methods, such as CNN or LSTM, attempt to model the temporal dynamics of full-band and
External link:
http://arxiv.org/abs/2409.10376
This work investigates two strategies for zero-shot non-intrusive speech assessment leveraging large language models. First, we explore the audio analysis capabilities of GPT-4o. Second, we propose GPT-Whisper, which uses Whisper as an audio-to-text
External link:
http://arxiv.org/abs/2409.09914
This study investigates the efficacy of data augmentation techniques for low-resource automatic speech recognition (ASR), focusing on two endangered Austronesian languages, Amis and Seediq. Recognizing the potential of self-supervised learning (SSL)
External link:
http://arxiv.org/abs/2409.08872
Authors:
Huang, Wen-Chin; Fu, Szu-Wei; Cooper, Erica; Zezario, Ryandhimas E.; Toda, Tomoki; Wang, Hsin-Min; Yamagishi, Junichi; Tsao, Yu
We present the third edition of the VoiceMOS Challenge, a scientific initiative designed to advance research into automatic prediction of human speech ratings. There were three tracks. The first track was on predicting the quality of "zoomed-in" hi
External link:
http://arxiv.org/abs/2409.07001
End-to-end (E2E) automatic speech recognition (ASR) models have become standard practice for various commercial applications. However, in real-world scenarios, the long-tailed nature of word distribution often leads E2E ASR models to perform well on
External link:
http://arxiv.org/abs/2409.06468
Cross-domain speech enhancement (SE) is often faced with severe challenges due to the scarcity of noise and background information in an unseen target domain, leading to a mismatch between training and test conditions. This study puts forward a novel
External link:
http://arxiv.org/abs/2409.01545
Authors:
Wang, Hsin-Po; Guruswami, Venkatesan
As a possible implementation of data storage using DNA, multiple strands of DNA are stored in a liquid container so that, in the future, they can be read by an array of DNA readers in parallel. These readers will sample the strands with replacement t
External link:
http://arxiv.org/abs/2409.00889
Authors:
Guruswami, Venkatesan; Wang, Hsin-Po
To ensure differential privacy, one can reveal an integer fuzzily in two ways: (a) add some Laplace noise to the integer, or (b) encode the integer as a binary string and add iid BSC noise. The former is simple and natural while the latter is flexibl
External link:
http://arxiv.org/abs/2406.17669
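
The two ways of revealing an integer named in the last abstract above, (a) adding Laplace noise and (b) binary encoding followed by iid BSC noise, can be illustrated with a minimal sketch. It is not taken from the paper: the function names, the `epsilon` and `flip_prob` parameters, the sensitivity of 1, and the plain fixed-width binary encoding are all assumptions made here for illustration, and no privacy accounting is attempted for option (b).

```python
import random


def laplace_mechanism(value: int, epsilon: float, rng: random.Random) -> float:
    """Option (a): add Laplace noise to the integer.

    Assumption: the integer has sensitivity 1, so the noise scale is 1/epsilon.
    """
    scale = 1.0 / epsilon
    # The difference of two iid exponentials with mean `scale`
    # is Laplace-distributed with location 0 and scale `scale`.
    noise = rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)
    return value + noise


def bsc_mechanism(value: int, num_bits: int, flip_prob: float,
                  rng: random.Random) -> str:
    """Option (b): encode as a binary string and pass it through a BSC.

    Assumption: plain fixed-width binary encoding; each bit is flipped
    independently with probability `flip_prob`.
    """
    if not 0 <= value < 2 ** num_bits:
        raise ValueError("value does not fit in num_bits bits")
    bits = format(value, f"0{num_bits}b")
    return "".join(
        ("1" if b == "0" else "0") if rng.random() < flip_prob else b
        for b in bits
    )


if __name__ == "__main__":
    rng = random.Random(2024)  # fixed seed only to make the demo repeatable
    print("Laplace release:", laplace_mechanism(42, epsilon=1.0, rng=rng))
    print("BSC release:    ", bsc_mechanism(42, num_bits=8, flip_prob=0.1, rng=rng))
```

With `flip_prob` set to 0 the BSC variant returns the encoding unchanged, and with `flip_prob` set to 1 it returns the bitwise complement, which is a quick sanity check on the channel.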