Search results

Report

Mixture of Experts Fusion for Fake Audio Detection Using Frozen wav2vec 2.0

Authors: Wang, Zhiyong; Fu, Ruibo; Wen, Zhengqi; Tao, Jianhua; Wang, Xiaopeng; Xie, Yuankun; Qi, Xin; Shi, Shuchen; Lu, Yi; Liu, Yukun; Li, Chenxing; Liu, Xuefei; Li, Guanjun

Speech synthesis technology has posed a serious threat to speaker verification systems. Currently, the most effective fake audio detection methods utilize pretrained models, and integrating features from various layers of pretrained model further enh

External link: http://arxiv.org/abs/2409.11909

Report

DPI-TTS: Directional Patch Interaction for Fast-Converging and Style Temporal Modeling in Text-to-Speech

Authors: Qi, Xin; Fu, Ruibo; Wen, Zhengqi; Wang, Tao; Qiang, Chunyu; Tao, Jianhua; Li, Chenxing; Lu, Yi; Shi, Shuchen; Wang, Zhiyong; Wang, Xiaopeng; Xie, Yuankun; Liu, Yukun; Liu, Xuefei; Li, Guanjun

In recent years, speech diffusion models have advanced rapidly. Alongside the widely used U-Net architecture, transformer-based models such as the Diffusion Transformer (DiT) have also gained attention. However, current DiT speech models treat Mel sp

External link: http://arxiv.org/abs/2409.11835

Report

Exploring the Role of Audio in Multimodal Misinformation Detection

Authors: Liu, Moyang; Liu, Yukun; Fu, Ruibo; Wen, Zhengqi; Tao, Jianhua; Liu, Xuefei; Li, Guanjun

With the rapid development of deepfake technology, especially the deep audio fake technology, misinformation detection on the social media scene meets a great challenge. Social media data often contains multimodal information which includes audio, vi

External link: http://arxiv.org/abs/2408.12558

Report

EELE: Exploring Efficient and Extensible LoRA Integration in Emotional Text-to-Speech

Authors: Qi, Xin; Fu, Ruibo; Wen, Zhengqi; Tao, Jianhua; Shi, Shuchen; Lu, Yi; Wang, Zhiyong; Wang, Xiaopeng; Xie, Yuankun; Liu, Yukun; Li, Guanjun; Liu, Xuefei; Li, Yongwei

In the current era of Artificial Intelligence Generated Content (AIGC), a Low-Rank Adaptation (LoRA) method has emerged. It uses a plugin-based approach to learn new knowledge with lower parameter quantities and computational costs, and it can be plu

External link: http://arxiv.org/abs/2408.10852

Report

A Noval Feature via Color Quantisation for Fake Audio Detection

Authors: Wang, Zhiyong; Wang, Xiaopeng; Xie, Yuankun; Fu, Ruibo; Wen, Zhengqi; Tao, Jianhua; Liu, Yukun; Li, Guanjun; Qi, Xin; Lu, Yi; Liu, Xuefei; Li, Yongwei

In the field of deepfake detection, previous studies focus on using reconstruction or mask and prediction methods to train pre-trained models, which are then transferred to fake audio detection training where the encoder is used to extract features,

External link: http://arxiv.org/abs/2408.10849

Report

MDPE: A Multimodal Deception Dataset with Personality and Emotional Characteristics

Authors: Cai, Cong; Liang, Shan; Liu, Xuefei; Zhu, Kang; Wen, Zhengqi; Tao, Jianhua; Xie, Heng; Cui, Jizhou; Ma, Yiming; Cheng, Zhenhua; Xu, Hanzhe; Fu, Ruibo; Liu, Bin; Li, Yongwei

Deception detection has garnered increasing attention in recent years due to the significant growth of digital media and heightened ethical and security concerns. It has been extensively studied using multimodal methods, including video, audio, and t

External link: http://arxiv.org/abs/2407.12274

Report

ASRRL-TTS: Agile Speaker Representation Reinforcement Learning for Text-to-Speech Speaker Adaptation

Authors: Fu, Ruibo; Qi, Xin; Wen, Zhengqi; Tao, Jianhua; Wang, Tao; Qiang, Chunyu; Wang, Zhiyong; Lu, Yi; Wang, Xiaopeng; Shi, Shuchen; Liu, Yukun; Liu, Xuefei; Zhang, Shuai

Speaker adaptation, which involves cloning voices from unseen speakers in the Text-to-Speech task, has garnered significant interest due to its numerous applications in multi-media fields. Despite recent advancements, existing methods often struggle

External link: http://arxiv.org/abs/2407.05421

Report

MINT: a Multi-modal Image and Narrative Text Dubbing Dataset for Foley Audio Content Planning and Generation

Authors: Fu, Ruibo; Shi, Shuchen; Guo, Hongming; Wang, Tao; Qiang, Chunyu; Wen, Zhengqi; Tao, Jianhua; Qi, Xin; Lu, Yi; Wang, Xiaopeng; Wang, Zhiyong; Liu, Yukun; Liu, Xuefei; Zhang, Shuai; Li, Guanjun

Foley audio, critical for enhancing the immersive experience in multimedia content, faces significant challenges in the AI-generated content (AIGC) landscape. Despite advancements in AIGC technologies for text and image generation, the foley audio du

External link: http://arxiv.org/abs/2406.10591

Report

Codecfake: An Initial Dataset for Detecting LLM-based Deepfake Audio

Authors: Lu, Yi; Xie, Yuankun; Fu, Ruibo; Wen, Zhengqi; Tao, Jianhua; Wang, Zhiyong; Qi, Xin; Liu, Xuefei; Li, Yongwei; Liu, Yukun; Wang, Xiaopeng; Shi, Shuchen

With the proliferation of Large Language Model (LLM) based deepfake audio, there is an urgent need for effective detection methods. Previous deepfake audio generation methods typically involve a multi-step generation process, with the final step usin

External link: http://arxiv.org/abs/2406.08112

Report

PPPR: Portable Plug-in Prompt Refiner for Text to Audio Generation

Authors: Shi, Shuchen; Fu, Ruibo; Wen, Zhengqi; Tao, Jianhua; Wang, Tao; Qiang, Chunyu; Lu, Yi; Qi, Xin; Liu, Xuefei; Liu, Yukun; Li, Yongwei; Wang, Zhiyong; Wang, Xiaopeng

Text-to-Audio (TTA) aims to generate audio that corresponds to the given text description, playing a crucial role in media production. The text descriptions in TTA datasets lack rich variations and diversity, resulting in a drop in TTA model performa

External link: http://arxiv.org/abs/2406.04683
