Showing 1 - 10 of 45
for search: '"Lv, YuanJun"'
Building upon advancements in Large Language Models (LLMs), the field of audio processing has seen increased interest in training audio generation tasks with discrete audio token sequences. However, directly discretizing audio by neural audio codecs…
External link:
http://arxiv.org/abs/2409.19283
Authors:
Chu, Yunfei; Xu, Jin; Yang, Qian; Wei, Haojie; Wei, Xipin; Guo, Zhifang; Leng, Yichong; Lv, Yuanjun; He, Jinzheng; Lin, Junyang; Zhou, Chang; Zhou, Jingren
We introduce the latest progress of Qwen-Audio, a large-scale audio-language model called Qwen2-Audio, which is capable of accepting various audio signal inputs and performing audio analysis or direct textual responses with regard to speech instructions…
External link:
http://arxiv.org/abs/2407.10759
Authors:
Ma, Linhan; Zhu, Xinfa; Lv, Yuanjun; Wang, Zhichao; Wang, Ziqian; He, Wendi; Zhou, Hongbin; Xie, Lei
Zero-shot voice conversion (VC) aims to transform source speech into arbitrary unseen target voice while keeping the linguistic content unchanged. Recent VC methods have made significant progress, but semantic losses in the decoupling process as well…
External link:
http://arxiv.org/abs/2406.09844
Vocoders reconstruct speech waveforms from acoustic features and play a pivotal role in modern TTS systems. Frequency-domain GAN vocoders like Vocos and APNet2 have recently seen rapid advancements, outperforming time-domain models in inference speed…
External link:
http://arxiv.org/abs/2406.08196
Authors:
Liu, Mingshuai; Chen, Zhuangqi; Yan, Xiaopeng; Lv, Yuanjun; Xia, Xianjun; Huang, Chuanzeng; Xiao, Yijian; Xie, Lei
In real-time speech communication systems, speech signals are often degraded by multiple distortions. Recently, a two-stage Repair-and-Denoising network (RaD-Net) was proposed with superior speech quality improvement in the ICASSP 2024 Speech Signal Improvement Challenge…
External link:
http://arxiv.org/abs/2406.07498
Authors:
Li, Hanzhao; Xue, Liumeng; Guo, Haohan; Zhu, Xinfa; Lv, Yuanjun; Xie, Lei; Chen, Yunlin; Yin, Hao; Li, Zhifei
The multi-codebook speech codec enables the application of large language models (LLM) in TTS but bottlenecks efficiency and robustness due to multi-sequence prediction. To avoid this obstacle, we propose Single-Codec, a single-codebook single-sequence…
External link:
http://arxiv.org/abs/2406.07422
Authors:
Yang, Qian; Xu, Jin; Liu, Wenrui; Chu, Yunfei; Jiang, Ziyue; Zhou, Xiaohuan; Leng, Yichong; Lv, Yuanjun; Zhao, Zhou; Zhou, Chang; Zhou, Jingren
Recently, instruction-following audio-language models have received broad attention for human-audio interaction. However, the absence of benchmarks capable of evaluating audio-centric interaction capabilities has impeded advancements in this field…
External link:
http://arxiv.org/abs/2402.07729
Authors:
Liu, Mingshuai; Chen, Zhuangqi; Yan, Xiaopeng; Lv, Yuanjun; Xia, Xianjun; Huang, Chuanzeng; Xiao, Yijian; Xie, Lei
This paper introduces our repairing and denoising network (RaD-Net) for the ICASSP 2024 Speech Signal Improvement (SSI) Challenge. We extend our previous framework based on a two-stage network and propose an upgraded model. Specifically, we replace the…
External link:
http://arxiv.org/abs/2401.04389
Language models (LMs) have shown superior performances in various speech generation tasks recently, demonstrating their powerful ability for semantic context modeling. Given the intrinsic similarity between speech generation and speech enhancement…
External link:
http://arxiv.org/abs/2312.09747
Language models (LMs) have recently flourished in natural language processing and computer vision, generating high-fidelity texts or images in various tasks. In contrast, the current speech generative models are still struggling regarding speech quality…
External link:
http://arxiv.org/abs/2310.07246