Showing 1 - 10 of 950 for search: '"Wang, Xinsheng"'
StreamVoice has recently pushed the boundaries of zero-shot voice conversion (VC) in the streaming domain. It uses a streamable language model (LM) with a context-aware approach to convert semantic features from automatic speech recognition (ASR) int
External link:
http://arxiv.org/abs/2408.02178
Speaker Change Detection (SCD) aims to identify boundaries among speakers in a conversation. Motivated by the success of fine-tuning wav2vec 2.0 models for the SCD task, a further investigation of self-supervised learning (SSL) features for SCD is cond
External link:
http://arxiv.org/abs/2406.08393
Recent language model (LM) advancements have showcased impressive zero-shot voice conversion (VC) performance. However, existing LM-based VC models usually apply offline conversion from source semantics to acoustic features, demanding the complete so
External link:
http://arxiv.org/abs/2401.11053
In addition to conveying the linguistic content from source speech to converted speech, maintaining the speaking style of source speech also plays an important role in the voice conversion (VC) task, which is essential in many scenarios with highly e
External link:
http://arxiv.org/abs/2309.01142
Text-to-speech (TTS) and singing voice synthesis (SVS) aim at generating high-quality speaking and singing voice according to textual input and music scores, respectively. Unifying TTS and SVS into a single system is crucial to the applications requi
External link:
http://arxiv.org/abs/2212.01546
Conveying the linguistic content and maintaining the source speech's speaking style, such as intonation and emotion, is essential in voice conversion (VC). However, in a low-resource situation, where only limited utterances from the target speaker ar
External link:
http://arxiv.org/abs/2211.08857
Author:
Wang Xinsheng, Wang Xiuge
Published in:
Zeitschrift für Kristallographie - New Crystal Structures, Vol 239, Iss 3, Pp 473-475 (2024)
C₁₄H₂₀Cl₂N₄O₄Zn, monoclinic, P2₁/n (no. 14), a = 8.562(2) Å, b = 27.953(8) Å, c = 8.804(2) Å, β = 117.092(4)°, V = 1875.9(9) Å³, Z = 4, R_gt(F) = 0.0441, wR_ref(F²) = 0.1031, T = 296 K.
External link:
https://doaj.org/article/08aa3e69a3254cf8859a9e5657cd5464
Author:
Wang Xinsheng, Wang Xiuge
Published in:
Zeitschrift für Kristallographie - New Crystal Structures, Vol 239, Iss 3, Pp 447-449 (2024)
C₆H₆N₂O₂, monoclinic, P2₁/c (no. 14), a = 6.7909(2) Å, b = 23.9261(7) Å, c = 7.5103(2) Å, β = 95.265(2)°, V = 1215.12(6) Å³, Z = 8, R_gt(F) = 0.0574, wR_ref(F²) = 0.1439, T = 293 K.
External link:
https://doaj.org/article/18d2fd7b19784a3c95504c8d02dbd980
In the current two-stage neural text-to-speech (TTS) paradigm, it is ideal to have a universal neural vocoder that, once trained, is robust to imperfect mel-spectrograms predicted by the acoustic model. To this end, we propose Robust MelGAN vocoder by
External link:
http://arxiv.org/abs/2210.17349
Cross-speaker emotion transfer speech synthesis aims to synthesize emotional speech for a target speaker by transferring the emotion from reference speech recorded by another (source) speaker. In this task, extracting speaker-independent emotion embe
External link:
http://arxiv.org/abs/2207.01198