Showing 1 - 10 of 100 for search: '"Cooper, Erica"'
Author:
Huang, Wen-Chin, Fu, Szu-Wei, Cooper, Erica, Zezario, Ryandhimas E., Toda, Tomoki, Wang, Hsin-Min, Yamagishi, Junichi, Tsao, Yu
We present the third edition of the VoiceMOS Challenge, a scientific initiative designed to advance research into automatic prediction of human speech ratings. There were three tracks. The first track was on predicting the quality of "zoomed-in" hi
External link:
http://arxiv.org/abs/2409.07001
In real-world applications, it is challenging to build a speaker verification system that is simultaneously robust against common threats, including spoofing attacks, channel mismatch, and domain mismatch. Traditional automatic speaker verification (
External link:
http://arxiv.org/abs/2409.06327
Author:
Gong, Cheng, Cooper, Erica, Wang, Xin, Qiang, Chunyu, Geng, Mengzhe, Wells, Dan, Wang, Longbiao, Dang, Jianwu, Tessier, Marc, Pine, Aidan, Richmond, Korin, Yamagishi, Junichi
Self-supervised learning (SSL) representations from massively multilingual models offer a promising solution for low-resource language speech tasks. Despite advancements, language adaptation in TTS systems remains an open problem. This paper explores
External link:
http://arxiv.org/abs/2406.08911
This paper proposes a speech synthesis system that allows users to specify and control the acoustic characteristics of a speaker by means of prompts describing the speaker's traits of synthesized speech. Unlike previous approaches, our method utilize
External link:
http://arxiv.org/abs/2406.08812
Author:
Zhang, Lin, Wang, Xin, Cooper, Erica, Diez, Mireia, Landini, Federico, Evans, Nicholas, Yamagishi, Junichi
This paper defines Spoof Diarization as a novel task in the Partial Spoof (PS) scenario. It aims to determine what spoofed when, which includes not only locating spoof regions but also clustering them according to different spoofing methods. As a pio
External link:
http://arxiv.org/abs/2406.07816
Predicting audio quality in voice synthesis and conversion systems is a critical yet challenging task, especially when traditional methods like Mean Opinion Scores (MOS) are cumbersome to collect at scale. This paper addresses the gap in efficient au
External link:
http://arxiv.org/abs/2312.15616
Author:
Gong, Cheng, Wang, Xin, Cooper, Erica, Wells, Dan, Wang, Longbiao, Dang, Jianwu, Richmond, Korin, Yamagishi, Junichi
Neural text-to-speech (TTS) has achieved human-like synthetic speech for single-speaker, single-language synthesis. Multilingual TTS systems are limited to resource-rich languages due to the lack of large paired text and studio-quality audio data. TT
External link:
http://arxiv.org/abs/2312.14398
In this study, we introduce a novel cross-modal retrieval task involving speaker descriptions and their corresponding audio samples. Utilizing pre-trained speaker and text encoders, we present a simple learning framework based on contrastive learning
External link:
http://arxiv.org/abs/2312.06055
This paper introduces a novel objective function for quality mean opinion score (MOS) prediction of unseen speech synthesis systems. The proposed function measures the similarity of relative positions of predicted MOS values, in a mini-batch, rather
External link:
http://arxiv.org/abs/2310.05078
We present the second edition of the VoiceMOS Challenge, a scientific event that aims to promote the study of automatic prediction of the mean opinion score (MOS) of synthesized and processed speech. This year, we emphasize real-world and challenging
External link:
http://arxiv.org/abs/2310.02640