Showing 1 - 10 of 914 for search: '"Zhang, ShiLei"'
Human speech exhibits rich and flexible prosodic variations. To address the one-to-many mapping problem from text to prosody in a reasonable and flexible manner, we propose DiffStyleTTS, a multi-speaker acoustic model based on a conditional diffusion model …
External link:
http://arxiv.org/abs/2412.03388
Author:
Zhang, Chenhao, Wu, Yang, Chen, Jingyi, Jin, Haonan, Wang, Jinghui, Fan, Raymond, Steadman, Paul, van der Laan, Gerrit, Hesjedal, Thorsten, Zhang, Shilei
We performed a pump-probe experiment on the chiral magnet Cu$_2$OSeO$_3$ to study the relaxation dynamics of its non-collinear magnetic orders, employing a millisecond magnetic field pulse as the pump and resonant elastic x-ray scattering as the probe …
External link:
http://arxiv.org/abs/2410.05485
In this paper, we provide a large audio-visual speaker recognition dataset, VoxBlink2, which includes approximately 10M utterances with videos from 110K+ speakers in the wild. This dataset represents a significant expansion over the VoxBlink dataset, …
External link:
http://arxiv.org/abs/2407.11510
The diverse nature of dialects presents challenges for models trained on specific linguistic patterns, rendering them susceptible to errors when confronted with unseen or out-of-distribution (OOD) data. This study introduces a novel margin-enhanced j…
External link:
http://arxiv.org/abs/2406.18067
For speech classification tasks, deep learning models often achieve high accuracy but fall short on calibration, manifesting as overconfident classifiers. The significance of calibration lies in its critical role in guaranteeing …
External link:
http://arxiv.org/abs/2406.18065
Author:
Shen, Yao, Gao, Yingying, Hao, Yaqian, Hu, Chenguang, Zhang, Fulin, Feng, Junlan, Zhang, Shilei
Noisy labels are inevitable, even in well-annotated datasets. The detection of noisy labels is of significant importance to enhance the robustness of speaker recognition models. In this paper, we propose a novel noisy label detection approach based on …
External link:
http://arxiv.org/abs/2406.13268
Pre-trained speech language models such as HuBERT and WavLM leverage unlabeled speech data for self-supervised learning and offer powerful representations for numerous downstream tasks. Despite the success of these models, their high requirements for …
External link:
http://arxiv.org/abs/2406.09444
Author:
Yang, Runyan, Yang, Huibao, Zhang, Xiqing, Ye, Tiantian, Liu, Ying, Gao, Yingying, Zhang, Shilei, Deng, Chao, Feng, Junlan
Recently, there have been attempts to integrate various speech processing tasks into a unified model. However, few previous works have directly demonstrated that joint optimization of diverse tasks in multitask speech models has a positive influence on the …
External link:
http://arxiv.org/abs/2406.07801
The expectation to deploy a universal neural network for speech enhancement, with the aim of improving noise robustness across diverse speech processing tasks, faces challenges due to the existing lack of awareness within static speech enhancement fr…
External link:
http://arxiv.org/abs/2402.12746