Showing 1 - 10 of 14 for search: '"Wang, Dingdong"'
With the rise of Speech Large Language Models (Speech LLMs), there has been growing interest in discrete speech tokens for their ability to integrate with text-based tokens seamlessly. Compared to most studies that focus on continuous speech features…
External link:
http://arxiv.org/abs/2411.08742
Author:
Chen, Kai, Gou, Yunhao, Huang, Runhui, Liu, Zhili, Tan, Daxin, Xu, Jing, Wang, Chunwei, Zhu, Yi, Zeng, Yihan, Yang, Kuo, Wang, Dingdong, Xiang, Kun, Li, Haoyuan, Bai, Haoli, Han, Jianhua, Li, Xiaohui, Jin, Weike, Xie, Nian, Zhang, Yu, Kwok, James T., Zhao, Hengshuang, Liang, Xiaodan, Yeung, Dit-Yan, Chen, Xiao, Li, Zhenguo, Zhang, Wei, Liu, Qun, Yao, Jun, Hong, Lanqing, Hou, Lu, Xu, Hang
GPT-4o, an omni-modal model that enables vocal conversations with diverse emotions and tones, marks a milestone for omni-modal foundation models. However, empowering Large Language Models to perceive and generate images, texts, and speeches end-to-end…
External link:
http://arxiv.org/abs/2409.18042
Author:
Cui, Mingyu, Tan, Daxin, Yang, Yifan, Wang, Dingdong, Wang, Huimeng, Chen, Xiao, Chen, Xie, Liu, Xunying
With the advancement of Self-supervised Learning (SSL) in speech-related tasks, there has been growing interest in utilizing discrete tokens generated by SSL for automatic speech recognition (ASR), as they offer faster processing techniques. However, …
External link:
http://arxiv.org/abs/2409.08805
CoLM-DSR: Leveraging Neural Codec Language Modeling for Multi-Modal Dysarthric Speech Reconstruction
Dysarthric speech reconstruction (DSR) aims to transform dysarthric speech into normal speech. It still suffers from low speaker similarity and poor prosody naturalness. In this paper, we propose a multi-modal DSR model by leveraging neural codec language modeling…
External link:
http://arxiv.org/abs/2406.08336
In this study, we propose a simple and efficient Non-Autoregressive (NAR) text-to-speech (TTS) system based on diffusion, named SimpleSpeech. Its simpleness shows in three aspects: (1) It can be trained on the speech-only dataset, without any alignment…
External link:
http://arxiv.org/abs/2406.02328
Academic article
This result cannot be displayed to users who are not logged in; signing in is required to view it.
Author:
Zhang, Xiaokang, Wang, Chen, He, Dingdong, Cheng, Yating, Yu, Li, Qi, Daoxi, Li, Boyu, Zheng, Fang (zhengfang@whu.edu.cn)
Published in:
Clinical Epigenetics. 9/30/2022, Vol. 14 Issue 1, p1-17. 17p.
Published in:
Polymer Chemistry; 4/21/2022, Vol. 13 Issue 15, p2195-2200, 6p
Author:
Wang, Xiao, Zhang, Dingdong, Jin, Hui, Poliquit, Beta Zenia, Philippa, Bronson, Nagiri, Ravi Chandra Raju, Subbiah, Jegadesan, Jones, David J., Ren, Wencai, Du, Jinhong, Burn, Paul L., Yu, Junsheng
Published in:
Solar RRL; May2019, Vol. 3 Issue 5, pN.PAG-N.PAG, 1p
This two-volume set (CCIS 1879 and 1880) constitutes the refereed proceedings of the 9th International Conference of Pioneering Computer Scientists, Engineers and Educators, ICPCSEE 2023, held in Harbin, China, during September 22–24, 2023. The 52 f…