Showing 1 - 6 of 6 for search: '"Wang, Dingdong"'
With the rise of Speech Large Language Models (Speech LLMs), there has been growing interest in discrete speech tokens for their ability to integrate with text-based tokens seamlessly. Compared to most studies that focus on continuous speech features
External link:
http://arxiv.org/abs/2411.08742
Authors:
Chen, Kai, Gou, Yunhao, Huang, Runhui, Liu, Zhili, Tan, Daxin, Xu, Jing, Wang, Chunwei, Zhu, Yi, Zeng, Yihan, Yang, Kuo, Wang, Dingdong, Xiang, Kun, Li, Haoyuan, Bai, Haoli, Han, Jianhua, Li, Xiaohui, Jin, Weike, Xie, Nian, Zhang, Yu, Kwok, James T., Zhao, Hengshuang, Liang, Xiaodan, Yeung, Dit-Yan, Chen, Xiao, Li, Zhenguo, Zhang, Wei, Liu, Qun, Yao, Jun, Hong, Lanqing, Hou, Lu, Xu, Hang
GPT-4o, an omni-modal model that enables vocal conversations with diverse emotions and tones, marks a milestone for omni-modal foundation models. However, empowering Large Language Models to perceive and generate images, texts, and speeches end-to-en
External link:
http://arxiv.org/abs/2409.18042
Authors:
Cui, Mingyu, Tan, Daxin, Yang, Yifan, Wang, Dingdong, Wang, Huimeng, Chen, Xiao, Chen, Xie, Liu, Xunying
With the advancement of Self-supervised Learning (SSL) in speech-related tasks, there has been growing interest in utilizing discrete tokens generated by SSL for automatic speech recognition (ASR), as they offer faster processing techniques. However,
External link:
http://arxiv.org/abs/2409.08805
CoLM-DSR: Leveraging Neural Codec Language Modeling for Multi-Modal Dysarthric Speech Reconstruction
Dysarthric speech reconstruction (DSR) aims to transform dysarthric speech into normal speech. It still suffers from low speaker similarity and poor prosody naturalness. In this paper, we propose a multi-modal DSR model by leveraging neural codec lan
External link:
http://arxiv.org/abs/2406.08336
In this study, we propose a simple and efficient Non-Autoregressive (NAR) text-to-speech (TTS) system based on diffusion, named SimpleSpeech. Its simpleness shows in three aspects: (1) It can be trained on the speech-only dataset, without any alignme
External link:
http://arxiv.org/abs/2406.02328