The VoiceMOS Challenge 2024: Beyond Speech Quality Prediction

Autor:	Huang, Wen-Chin, Fu, Szu-Wei, Cooper, Erica, Zezario, Ryandhimas E., Toda, Tomoki, Wang, Hsin-Min, Yamagishi, Junichi, Tsao, Yu
Rok vydání:	2024
Předmět:	Computer Science - Sound Electrical Engineering and Systems Science - Audio and Speech Processing
Druh dokumentu:	Working Paper
Popis:	We present the third edition of the VoiceMOS Challenge, a scientific initiative designed to advance research into automatic prediction of human speech ratings. There were three tracks. The first track was on predicting the quality of ``zoomed-in'' high-quality samples from speech synthesis systems. The second track was to predict ratings of samples from singing voice synthesis and voice conversion with a large variety of systems, listeners, and languages. The third track was semi-supervised quality prediction for noisy, clean, and enhanced speech, where a very small amount of labeled training data was provided. Among the eight teams from both academia and industry, we found that many were able to outperform the baseline systems. Successful techniques included retrieval-based methods and the use of non-self-supervised representations like spectrograms and pitch histograms. These results showed that the challenge has advanced the field of subjective speech rating prediction. Comment: Accepted to SLT2024
Databáze:	arXiv
Externí odkaz:	http://arxiv.org/abs/2409.07001 Zobrazit plný text záznamu View this record from Arxiv