Search results - "Wang, Xinsheng"

Report

StreamVoice+: Evolving into End-to-end Streaming Zero-shot Voice Conversion

Authors: Wang, Zhichao, Chen, Yuanzhe, Wang, Xinsheng, Xie, Lei, Wang, Yuping

StreamVoice has recently pushed the boundaries of zero-shot voice conversion (VC) in the streaming domain. It uses a streamable language model (LM) with a context-aware approach to convert semantic features from automatic speech recognition (ASR) int

External link: http://arxiv.org/abs/2408.02178

Report

SCDNet: Self-supervised Learning Feature-based Speaker Change Detection

Authors: Li, Yue, Wang, Xinsheng, Zhang, Li, Xie, Lei

Speaker Change Detection (SCD) is to identify boundaries among speakers in a conversation. Motivated by the success of fine-tuning wav2vec 2.0 models for the SCD task, a further investigation of self-supervised learning (SSL) features for SCD is cond

External link: http://arxiv.org/abs/2406.08393

Report

StreamVoice: Streamable Context-Aware Language Modeling for Real-time Zero-Shot Voice Conversion

Authors: Wang, Zhichao, Chen, Yuanzhe, Wang, Xinsheng, Xie, Lei, Wang, Yuping

Recent language model (LM) advancements have showcased impressive zero-shot voice conversion (VC) performance. However, existing LM-based VC models usually apply offline conversion from source semantics to acoustic features, demanding the complete so

External link: http://arxiv.org/abs/2401.11053

Report

MSM-VC: High-fidelity Source Style Transfer for Non-Parallel Voice Conversion by Multi-scale Style Modeling

Authors: Wang, Zhichao, Wang, Xinsheng, Xie, Qicong, Li, Tao, Xie, Lei, Tian, Qiao, Wang, Yuping

In addition to conveying the linguistic content from source speech to converted speech, maintaining the speaking style of source speech also plays an important role in the voice conversion (VC) task, which is essential in many scenarios with highly e

External link: http://arxiv.org/abs/2309.01142

Report

UniSyn: An End-to-End Unified Model for Text-to-Speech and Singing Voice Synthesis

Authors: Lei, Yi, Yang, Shan, Wang, Xinsheng, Xie, Qicong, Yao, Jixun, Xie, Lei, Su, Dan

Text-to-speech (TTS) and singing voice synthesis (SVS) aim at generating high-quality speaking and singing voice according to textual input and music scores, respectively. Unifying TTS and SVS into a single system is crucial to the applications requi

External link: http://arxiv.org/abs/2212.01546

Report

Delivering Speaking Style in Low-resource Voice Conversion with Multi-factor Constraints

Authors: Wang, Zhichao, Wang, Xinsheng, Xie, Lei, Chen, Yuanzhe, Tian, Qiao, Wang, Yuping

Conveying the linguistic content and maintaining the source speech's speaking style, such as intonation and emotion, is essential in voice conversion (VC). However, in a low-resource situation, where only limited utterances from the target speaker ar

External link: http://arxiv.org/abs/2211.08857

Academic article

The crystal structure of dichlorido-bis(3-methyl-3-imidazolium-1-ylpropionato-κ²O,O′)-zinc(II), C14H20Cl2N4O4Zn

Authors: Wang Xinsheng, Wang Xiuge

Published in: Zeitschrift für Kristallographie - New Crystal Structures, Vol 239, Iss 3, Pp 473-475 (2024)

C14H20Cl2N4O4Zn, monoclinic, P2₁/n (no. 14), a = 8.562(2) Å, b = 27.953(8) Å, c = 8.804(2) Å, β = 117.092(4)°, V = 1875.9(9) Å³, Z = 4, R_gt(F) = 0.0441, wR_ref(F²) = 0.1031, T = 296 K.

External link: https://doaj.org/article/08aa3e69a3254cf8859a9e5657cd5464

Academic article

The crystal structure of zwitterionic 3-aminoisonicotinic acid, C6H6N2O2

Authors: Wang Xinsheng, Wang Xiuge

Published in: Zeitschrift für Kristallographie - New Crystal Structures, Vol 239, Iss 3, Pp 447-449 (2024)

C6H6N2O2, monoclinic, P2₁/c (no. 14), a = 6.7909(2) Å, b = 23.9261(7) Å, c = 7.5103(2) Å, β = 95.265(2)°, V = 1215.12(6) Å³, Z = 8, R_gt(F) = 0.0574, wR_ref(F²) = 0.1439, T = 293 K.

External link: https://doaj.org/article/18d2fd7b19784a3c95504c8d02dbd980

Report

Robust MelGAN: A robust universal neural vocoder for high-fidelity TTS

Authors: Song, Kun, Cong, Jian, Wang, Xinsheng, Zhang, Yongmao, Xie, Lei, Jiang, Ning, Wu, Haiying

In current two-stage neural text-to-speech (TTS) paradigm, it is ideal to have a universal neural vocoder, once trained, which is robust to imperfect mel-spectrogram predicted from the acoustic model. To this end, we propose Robust MelGAN vocoder by

External link: http://arxiv.org/abs/2210.17349

Report

Cross-speaker Emotion Transfer Based On Prosody Compensation for End-to-End Speech Synthesis

Authors: Li, Tao, Wang, Xinsheng, Xie, Qicong, Wang, Zhichao, Jiang, Mingqi, Xie, Lei

Cross-speaker emotion transfer speech synthesis aims to synthesize emotional speech for a target speaker by transferring the emotion from reference speech recorded by another (source) speaker. In this task, extracting speaker-independent emotion embe

External link: http://arxiv.org/abs/2207.01198
