Showing 1 - 10 of 10,037 for search: '"DAKE A"'
Author:
Bu, Dake, Huang, Wei, Han, Andi, Nitanda, Atsushi, Suzuki, Taiji, Zhang, Qingfu, Wong, Hau-San
Transformer-based large language models (LLMs) have displayed remarkable creative prowess and emergence capabilities. Existing empirical studies have revealed a strong connection between these LLMs' impressive emergence abilities and their in-context
External link:
http://arxiv.org/abs/2411.02199
Author:
Xia, Kangxiang, Guo, Dake, Yao, Jixun, Xue, Liumeng, Li, Hanzhao, Wang, Shuai, Guo, Zhao, Xie, Lei, Zhang, Qingqing, Luo, Lei, Dong, Minghui, Sun, Peng
The ISCSLP 2024 Conversational Voice Clone (CoVoC) Challenge aims to benchmark and advance zero-shot spontaneous style voice cloning, particularly focusing on generating spontaneous behaviors in conversational speech. The challenge comprises two trac
External link:
http://arxiv.org/abs/2411.00064
Author:
Guo, Dake, Yao, Jixun, Zhu, Xinfa, Xia, Kangxiang, Guo, Zhao, Zhang, Ziyu, Wang, Yao, Liu, Jie, Xie, Lei
This paper presents the NPU-HWC system submitted to the ISCSLP 2024 Inspirational and Convincing Audio Generation Challenge 2024 (ICAGC). Our system consists of two modules: a speech generator for Track 1 and a background audio generator for Track 2.
External link:
http://arxiv.org/abs/2410.23815
Author:
Yao, Jixun, Kuzmin, Nikita, Wang, Qing, Guo, Pengcheng, Ning, Ziqian, Guo, Dake, Lee, Kong Aik, Chng, Eng-Siong, Xie, Lei
Speaker anonymization is an effective privacy protection solution that conceals the speaker's identity while preserving the linguistic content and paralinguistic information of the original speech. To establish a fair benchmark and facilitate compari
External link:
http://arxiv.org/abs/2409.04173
The long speech sequence has been troubling language models (LM) based TTS approaches in terms of modeling complexity and efficiency. This work proposes SoCodec, a semantic-ordered multi-stream speech codec, to address this issue. It compresses speec
External link:
http://arxiv.org/abs/2409.00933
Author:
Zhou, Dake
We show that the existence of massive neutron stars and asymptotic freedom of QCD place robust upper bounds on the lowest sound speed of the ultra-dense matter unattainable in neutron stars. Our approach does not rely on explicitly representing the e
External link:
http://arxiv.org/abs/2408.16738
This paper investigates multi-objective reinforcement learning (MORL), which focuses on learning Pareto optimal policies in the presence of multiple reward functions. Despite MORL's significant empirical success, there is still a lack of satisfactory
External link:
http://arxiv.org/abs/2407.17466
We study risk-sensitive reinforcement learning (RL), a crucial field due to its ability to enhance decision-making in scenarios where it is essential to manage uncertainty and minimize potential adverse outcomes. Particularly, our work focuses on app
External link:
http://arxiv.org/abs/2407.07631
Author:
Ma, Linhan, Guo, Dake, Song, Kun, Jiang, Yuepeng, Wang, Shuai, Xue, Liumeng, Xu, Weiming, Zhao, Huan, Zhang, Binbin, Xie, Lei
With the development of large text-to-speech (TTS) models and scale-up of the training data, state-of-the-art TTS systems have achieved impressive performance. In this paper, we present WenetSpeech4TTS, a multi-domain Mandarin corpus derived from the
External link:
http://arxiv.org/abs/2406.05763
Recent advances in text-to-speech have significantly improved the expressiveness of synthetic speech. However, a major challenge remains in generating speech that captures the diverse styles exhibited by professional narrators in audiobooks without r
External link:
http://arxiv.org/abs/2406.05672