Search results

Report

Mixture of Experts Fusion for Fake Audio Detection Using Frozen wav2vec 2.0

Authors: Wang, Zhiyong; Fu, Ruibo; Wen, Zhengqi; Tao, Jianhua; Wang, Xiaopeng; Xie, Yuankun; Qi, Xin; Shi, Shuchen; Lu, Yi; Liu, Yukun; Li, Chenxing; Liu, Xuefei; Li, Guanjun

Speech synthesis technology has posed a serious threat to speaker verification systems. Currently, the most effective fake audio detection methods utilize pretrained models, and integrating features from various layers of pretrained model further enh

External link: http://arxiv.org/abs/2409.11909

Report

DPI-TTS: Directional Patch Interaction for Fast-Converging and Style Temporal Modeling in Text-to-Speech

Authors: Qi, Xin; Fu, Ruibo; Wen, Zhengqi; Wang, Tao; Qiang, Chunyu; Tao, Jianhua; Li, Chenxing; Lu, Yi; Shi, Shuchen; Wang, Zhiyong; Wang, Xiaopeng; Xie, Yuankun; Liu, Yukun; Liu, Xuefei; Li, Guanjun

In recent years, speech diffusion models have advanced rapidly. Alongside the widely used U-Net architecture, transformer-based models such as the Diffusion Transformer (DiT) have also gained attention. However, current DiT speech models treat Mel sp

External link: http://arxiv.org/abs/2409.11835

Report

Text Prompt is Not Enough: Sound Event Enhanced Prompt Adapter for Target Style Audio Generation

Authors: Xiong, Chenxu; Fu, Ruibo; Shi, Shuchen; Wen, Zhengqi; Tao, Jianhua; Wang, Tao; Li, Chenxing; Qiang, Chunyu; Xie, Yuankun; Qi, Xin; Li, Guanjun; Yang, Zizheng

Current mainstream audio generation methods primarily rely on simple text prompts, often failing to capture the nuanced details necessary for multi-style audio generation. To address this limitation, the Sound Event Enhanced Prompt Adapter is propose

External link: http://arxiv.org/abs/2409.09381

Report

Does Current Deepfake Audio Detection Model Effectively Detect ALM-based Deepfake Audio?

Authors: Xie, Yuankun; Xiong, Chenxu; Wang, Xiaopeng; Wang, Zhiyong; Lu, Yi; Qi, Xin; Fu, Ruibo; Liu, Yukun; Wen, Zhengqi; Tao, Jianhua; Li, Guanjun; Ye, Long

Currently, Audio Language Models (ALMs) are rapidly advancing due to the developments in large language models and audio neural codecs. These ALMs have significantly lowered the barrier to creating deepfake audio, generating highly realistic and dive

External link: http://arxiv.org/abs/2408.10853

Report

EELE: Exploring Efficient and Extensible LoRA Integration in Emotional Text-to-Speech

Authors: Qi, Xin; Fu, Ruibo; Wen, Zhengqi; Tao, Jianhua; Shi, Shuchen; Lu, Yi; Wang, Zhiyong; Wang, Xiaopeng; Xie, Yuankun; Liu, Yukun; Li, Guanjun; Liu, Xuefei; Li, Yongwei

In the current era of Artificial Intelligence Generated Content (AIGC), a Low-Rank Adaptation (LoRA) method has emerged. It uses a plugin-based approach to learn new knowledge with lower parameter quantities and computational costs, and it can be plu

External link: http://arxiv.org/abs/2408.10852

Report

A Noval Feature via Color Quantisation for Fake Audio Detection

Authors: Wang, Zhiyong; Wang, Xiaopeng; Xie, Yuankun; Fu, Ruibo; Wen, Zhengqi; Tao, Jianhua; Liu, Yukun; Li, Guanjun; Qi, Xin; Lu, Yi; Liu, Xuefei; Li, Yongwei

In the field of deepfake detection, previous studies focus on using reconstruction or mask and prediction methods to train pre-trained models, which are then transferred to fake audio detection training where the encoder is used to extract features,

External link: http://arxiv.org/abs/2408.10849

Report

Impact of ALD-Deposited Ultrathin Nitride Layers on Carrier Lifetimes and Photoluminescence Efficiency in CdTe/MgCdTe Double Heterostructures

Authors: Abbasi, Haris Naeem; Qi, Xin; Ju, Zheng; Ma, Zhenqiang; Zhang, Yong-Hang

This work evaluates the passivation effectiveness of ultrathin nitride layers (SiNx, AlN, TiN) deposited via atomic layer deposition on CdTe/MgCdTe double heterostructures for solar cell applications. Time-resolved photoluminescence and photoluminesc

External link: http://arxiv.org/abs/2408.10696

Report

ASRRL-TTS: Agile Speaker Representation Reinforcement Learning for Text-to-Speech Speaker Adaptation

Authors: Fu, Ruibo; Qi, Xin; Wen, Zhengqi; Tao, Jianhua; Wang, Tao; Qiang, Chunyu; Wang, Zhiyong; Lu, Yi; Wang, Xiaopeng; Shi, Shuchen; Liu, Yukun; Liu, Xuefei; Zhang, Shuai

Speaker adaptation, which involves cloning voices from unseen speakers in the Text-to-Speech task, has garnered significant interest due to its numerous applications in multi-media fields. Despite recent advancements, existing methods often struggle

External link: http://arxiv.org/abs/2407.05421

Report

ICAGC 2024: Inspirational and Convincing Audio Generation Challenge 2024

Authors: Fu, Ruibo; Liu, Rui; Qiang, Chunyu; Gao, Yingming; Lu, Yi; Shi, Shuchen; Wang, Tao; Li, Ya; Wen, Zhengqi; Zhang, Chen; Bu, Hui; Liu, Yukun; Qi, Xin; Li, Guanjun

The Inspirational and Convincing Audio Generation Challenge 2024 (ICAGC 2024) is part of the ISCSLP 2024 Competitions and Challenges track. While current text-to-speech (TTS) technology can generate high-quality audio, its ability to convey complex e

External link: http://arxiv.org/abs/2407.12038

Report

A multi-speaker multi-lingual voice cloning system based on vits2 for limmits 2024 challenge

Authors: Wang, Xiaopeng; Lu, Yi; Qi, Xin; Wang, Zhiyong; Xie, Yuankun; Shi, Shuchen; Fu, Ruibo

This paper presents the development of a speech synthesis system for the LIMMITS'24 Challenge, focusing primarily on Track 2. The objective of the challenge is to establish a multi-speaker, multi-lingual Indic Text-to-Speech system with voice cloning

External link: http://arxiv.org/abs/2406.17801
