Showing 1 - 10 of 1,025 results for search: '"Liu, Xuefei"'
Authors:
Wang, Zhiyong, Fu, Ruibo, Wen, Zhengqi, Tao, Jianhua, Wang, Xiaopeng, Xie, Yuankun, Qi, Xin, Shi, Shuchen, Lu, Yi, Liu, Yukun, Li, Chenxing, Liu, Xuefei, Li, Guanjun
Speech synthesis technology has posed a serious threat to speaker verification systems. Currently, the most effective fake audio detection methods utilize pretrained models, and integrating features from various layers of pretrained model further enh
External link:
http://arxiv.org/abs/2409.11909
Authors:
Qi, Xin, Fu, Ruibo, Wen, Zhengqi, Wang, Tao, Qiang, Chunyu, Tao, Jianhua, Li, Chenxing, Lu, Yi, Shi, Shuchen, Wang, Zhiyong, Wang, Xiaopeng, Xie, Yuankun, Liu, Yukun, Liu, Xuefei, Li, Guanjun
In recent years, speech diffusion models have advanced rapidly. Alongside the widely used U-Net architecture, transformer-based models such as the Diffusion Transformer (DiT) have also gained attention. However, current DiT speech models treat Mel sp
External link:
http://arxiv.org/abs/2409.11835
With the rapid development of deepfake technology, especially the deep audio fake technology, misinformation detection on the social media scene meets a great challenge. Social media data often contains multimodal information which includes audio, vi
External link:
http://arxiv.org/abs/2408.12558
Authors:
Qi, Xin, Fu, Ruibo, Wen, Zhengqi, Tao, Jianhua, Shi, Shuchen, Lu, Yi, Wang, Zhiyong, Wang, Xiaopeng, Xie, Yuankun, Liu, Yukun, Li, Guanjun, Liu, Xuefei, Li, Yongwei
In the current era of Artificial Intelligence Generated Content (AIGC), a Low-Rank Adaptation (LoRA) method has emerged. It uses a plugin-based approach to learn new knowledge with lower parameter quantities and computational costs, and it can be plu
External link:
http://arxiv.org/abs/2408.10852
Authors:
Wang, Zhiyong, Wang, Xiaopeng, Xie, Yuankun, Fu, Ruibo, Wen, Zhengqi, Tao, Jianhua, Liu, Yukun, Li, Guanjun, Qi, Xin, Lu, Yi, Liu, Xuefei, Li, Yongwei
In the field of deepfake detection, previous studies focus on using reconstruction or mask and prediction methods to train pre-trained models, which are then transferred to fake audio detection training where the encoder is used to extract features,
External link:
http://arxiv.org/abs/2408.10849
Authors:
Cai, Cong, Liang, Shan, Liu, Xuefei, Zhu, Kang, Wen, Zhengqi, Tao, Jianhua, Xie, Heng, Cui, Jizhou, Ma, Yiming, Cheng, Zhenhua, Xu, Hanzhe, Fu, Ruibo, Liu, Bin, Li, Yongwei
Deception detection has garnered increasing attention in recent years due to the significant growth of digital media and heightened ethical and security concerns. It has been extensively studied using multimodal methods, including video, audio, and t
External link:
http://arxiv.org/abs/2407.12274
ASRRL-TTS: Agile Speaker Representation Reinforcement Learning for Text-to-Speech Speaker Adaptation
Authors:
Fu, Ruibo, Qi, Xin, Wen, Zhengqi, Tao, Jianhua, Wang, Tao, Qiang, Chunyu, Wang, Zhiyong, Lu, Yi, Wang, Xiaopeng, Shi, Shuchen, Liu, Yukun, Liu, Xuefei, Zhang, Shuai
Speaker adaptation, which involves cloning voices from unseen speakers in the Text-to-Speech task, has garnered significant interest due to its numerous applications in multi-media fields. Despite recent advancements, existing methods often struggle
External link:
http://arxiv.org/abs/2407.05421
Authors:
Fu, Ruibo, Shi, Shuchen, Guo, Hongming, Wang, Tao, Qiang, Chunyu, Wen, Zhengqi, Tao, Jianhua, Qi, Xin, Lu, Yi, Wang, Xiaopeng, Wang, Zhiyong, Liu, Yukun, Liu, Xuefei, Zhang, Shuai, Li, Guanjun
Foley audio, critical for enhancing the immersive experience in multimedia content, faces significant challenges in the AI-generated content (AIGC) landscape. Despite advancements in AIGC technologies for text and image generation, the foley audio du
External link:
http://arxiv.org/abs/2406.10591
Authors:
Lu, Yi, Xie, Yuankun, Fu, Ruibo, Wen, Zhengqi, Tao, Jianhua, Wang, Zhiyong, Qi, Xin, Liu, Xuefei, Li, Yongwei, Liu, Yukun, Wang, Xiaopeng, Shi, Shuchen
With the proliferation of Large Language Model (LLM) based deepfake audio, there is an urgent need for effective detection methods. Previous deepfake audio generation methods typically involve a multi-step generation process, with the final step usin
External link:
http://arxiv.org/abs/2406.08112
Authors:
Shi, Shuchen, Fu, Ruibo, Wen, Zhengqi, Tao, Jianhua, Wang, Tao, Qiang, Chunyu, Lu, Yi, Qi, Xin, Liu, Xuefei, Liu, Yukun, Li, Yongwei, Wang, Zhiyong, Wang, Xiaopeng
Text-to-Audio (TTA) aims to generate audio that corresponds to the given text description, playing a crucial role in media production. The text descriptions in TTA datasets lack rich variations and diversity, resulting in a drop in TTA model performa
External link:
http://arxiv.org/abs/2406.04683