Showing 1 - 10 of 20 for search: '"Zezario, Ryandhimas E."'
This work investigates two strategies for zero-shot non-intrusive speech assessment leveraging large language models. First, we explore the audio analysis capabilities of GPT-4o. Second, we propose GPT-Whisper, which uses Whisper as an audio-to-text
External link:
http://arxiv.org/abs/2409.09914
Authors:
Huang, Wen-Chin, Fu, Szu-Wei, Cooper, Erica, Zezario, Ryandhimas E., Toda, Tomoki, Wang, Hsin-Min, Yamagishi, Junichi, Tsao, Yu
We present the third edition of the VoiceMOS Challenge, a scientific initiative designed to advance research into automatic prediction of human speech ratings. There were three tracks. The first track was on predicting the quality of "zoomed-in" hi
External link:
http://arxiv.org/abs/2409.07001
This paper introduces HAAQI-Net, a non-intrusive deep learning model for music audio quality assessment tailored for hearing aid users. Unlike traditional methods like the Hearing Aid Audio Quality Index (HAAQI), which rely on intrusive comparisons t
External link:
http://arxiv.org/abs/2401.01145
Authors:
Zezario, Ryandhimas E., Chen, Yu-Wen, Fu, Szu-Wei, Tsao, Yu, Wang, Hsin-Min, Fuh, Chiou-Shann
This research introduces an enhanced version of the multi-objective speech assessment model, MOSA-Net+, by leveraging the acoustic features from Whisper, a large-scale weakly supervised model. We first investigate the effectiveness of Whisper in dep
External link:
http://arxiv.org/abs/2309.12766
Automated speech intelligibility assessment is pivotal for hearing aid (HA) development. In this paper, we present three novel methods to improve intelligibility prediction accuracy and introduce MBI-Net+, an enhanced version of MBI-Net, the top-perf
External link:
http://arxiv.org/abs/2309.09548
This study proposes a multi-task pseudo-label learning (MPL)-based non-intrusive speech quality assessment model called MTQ-Net. MPL consists of two stages: obtaining pseudo-label scores from a pretrained model and performing multi-task learning. The
External link:
http://arxiv.org/abs/2308.09262
Recently, deep learning (DL)-based non-intrusive speech assessment models have attracted great attention. Many studies report that these DL-based models yield satisfactory assessment performance and good flexibility, but their performance in unseen e
External link:
http://arxiv.org/abs/2204.03310
Improving the user's hearing ability to understand speech in noisy environments is critical to the development of hearing aid (HA) devices. For this, it is important to derive a metric that can fairly predict speech intelligibility for HA users. A st
External link:
http://arxiv.org/abs/2204.03305
Deep Learning-based Non-Intrusive Multi-Objective Speech Assessment Model with Cross-Domain Features
In this study, we propose a cross-domain multi-objective speech assessment model called MOSA-Net, which can estimate multiple speech assessment metrics simultaneously. Experimental results show that MOSA-Net can improve the linear correlation coeffic
External link:
http://arxiv.org/abs/2111.02363
Recent research on speech enhancement (SE) has seen the emergence of deep-learning-based methods. It is still a challenging task to determine the effective ways to increase the generalizability of SE under diverse test conditions. In this study, we c
External link:
http://arxiv.org/abs/2012.09359