Showing 1 - 10 of 8,821 results for search: '"Qi Xin"'
Author:
Wang, Zhiyong, Fu, Ruibo, Wen, Zhengqi, Tao, Jianhua, Wang, Xiaopeng, Xie, Yuankun, Qi, Xin, Shi, Shuchen, Lu, Yi, Liu, Yukun, Li, Chenxing, Liu, Xuefei, Li, Guanjun
Speech synthesis technology has posed a serious threat to speaker verification systems. Currently, the most effective fake audio detection methods utilize pretrained models, and integrating features from various layers of pretrained model further enh
External link:
http://arxiv.org/abs/2409.11909
Author:
Qi, Xin, Fu, Ruibo, Wen, Zhengqi, Wang, Tao, Qiang, Chunyu, Tao, Jianhua, Li, Chenxing, Lu, Yi, Shi, Shuchen, Wang, Zhiyong, Wang, Xiaopeng, Xie, Yuankun, Liu, Yukun, Liu, Xuefei, Li, Guanjun
In recent years, speech diffusion models have advanced rapidly. Alongside the widely used U-Net architecture, transformer-based models such as the Diffusion Transformer (DiT) have also gained attention. However, current DiT speech models treat Mel sp
External link:
http://arxiv.org/abs/2409.11835
Author:
Xiong, Chenxu, Fu, Ruibo, Shi, Shuchen, Wen, Zhengqi, Tao, Jianhua, Wang, Tao, Li, Chenxing, Qiang, Chunyu, Xie, Yuankun, Qi, Xin, Li, Guanjun, Yang, Zizheng
Current mainstream audio generation methods primarily rely on simple text prompts, often failing to capture the nuanced details necessary for multi-style audio generation. To address this limitation, the Sound Event Enhanced Prompt Adapter is propose
External link:
http://arxiv.org/abs/2409.09381
Author:
Xie, Yuankun, Xiong, Chenxu, Wang, Xiaopeng, Wang, Zhiyong, Lu, Yi, Qi, Xin, Fu, Ruibo, Liu, Yukun, Wen, Zhengqi, Tao, Jianhua, Li, Guanjun, Ye, Long
Currently, Audio Language Models (ALMs) are rapidly advancing due to the developments in large language models and audio neural codecs. These ALMs have significantly lowered the barrier to creating deepfake audio, generating highly realistic and dive
External link:
http://arxiv.org/abs/2408.10853
Author:
Qi, Xin, Fu, Ruibo, Wen, Zhengqi, Tao, Jianhua, Shi, Shuchen, Lu, Yi, Wang, Zhiyong, Wang, Xiaopeng, Xie, Yuankun, Liu, Yukun, Li, Guanjun, Liu, Xuefei, Li, Yongwei
In the current era of Artificial Intelligence Generated Content (AIGC), a Low-Rank Adaptation (LoRA) method has emerged. It uses a plugin-based approach to learn new knowledge with lower parameter quantities and computational costs, and it can be plu
External link:
http://arxiv.org/abs/2408.10852
Author:
Wang, Zhiyong, Wang, Xiaopeng, Xie, Yuankun, Fu, Ruibo, Wen, Zhengqi, Tao, Jianhua, Liu, Yukun, Li, Guanjun, Qi, Xin, Lu, Yi, Liu, Xuefei, Li, Yongwei
In the field of deepfake detection, previous studies focus on using reconstruction or mask and prediction methods to train pre-trained models, which are then transferred to fake audio detection training where the encoder is used to extract features,
External link:
http://arxiv.org/abs/2408.10849
This work evaluates the passivation effectiveness of ultrathin nitride layers (SiNx, AlN, TiN) deposited via atomic layer deposition on CdTe/MgCdTe double heterostructures for solar cell applications. Time-resolved photoluminescence and photoluminesc
External link:
http://arxiv.org/abs/2408.10696
ASRRL-TTS: Agile Speaker Representation Reinforcement Learning for Text-to-Speech Speaker Adaptation
Author:
Fu, Ruibo, Qi, Xin, Wen, Zhengqi, Tao, Jianhua, Wang, Tao, Qiang, Chunyu, Wang, Zhiyong, Lu, Yi, Wang, Xiaopeng, Shi, Shuchen, Liu, Yukun, Liu, Xuefei, Zhang, Shuai
Speaker adaptation, which involves cloning voices from unseen speakers in the Text-to-Speech task, has garnered significant interest due to its numerous applications in multi-media fields. Despite recent advancements, existing methods often struggle
External link:
http://arxiv.org/abs/2407.05421
Author:
Fu, Ruibo, Liu, Rui, Qiang, Chunyu, Gao, Yingming, Lu, Yi, Shi, Shuchen, Wang, Tao, Li, Ya, Wen, Zhengqi, Zhang, Chen, Bu, Hui, Liu, Yukun, Qi, Xin, Li, Guanjun
The Inspirational and Convincing Audio Generation Challenge 2024 (ICAGC 2024) is part of the ISCSLP 2024 Competitions and Challenges track. While current text-to-speech (TTS) technology can generate high-quality audio, its ability to convey complex e
External link:
http://arxiv.org/abs/2407.12038
This paper presents the development of a speech synthesis system for the LIMMITS'24 Challenge, focusing primarily on Track 2. The objective of the challenge is to establish a multi-speaker, multi-lingual Indic Text-to-Speech system with voice cloning
External link:
http://arxiv.org/abs/2406.17801