Search results

Report

SA-WavLM: Speaker-Aware Self-Supervised Pre-training for Mixture Speech

Authors: Lin, Jingru; Ge, Meng; Ao, Junyi; Deng, Liqun; Li, Haizhou

Pre-trained models with self-supervised learning (SSL) techniques have been shown to be effective in various downstream speech tasks. However, most such models are trained on single-speaker speech data, limiting their effectiveness in mixture speech.

External link: http://arxiv.org/abs/2407.02826

Report

MART: Learning Hierarchical Music Audio Representations with Part-Whole Transformer

Authors: Yao, Dong; Zhu, Jieming; Xun, Jiahao; Zhang, Shengyu; Zhao, Zhou; Deng, Liqun; Zhang, Wenqiao; Dong, Zhenhua; Jiang, Xin

Recent research in self-supervised contrastive learning of music representations has demonstrated remarkable results across diverse downstream tasks. However, a prevailing trend in existing methods involves representing equally-sized music clips in e

External link: http://arxiv.org/abs/2312.06197

Report

Prompt-driven Target Speech Diarization

Authors: Jiang, Yidi; Chen, Zhengyang; Tao, Ruijie; Deng, Liqun; Qian, Yanmin; Li, Haizhou

We introduce a novel task named 'target speech diarization', which seeks to determine 'when target event occurred' within an audio signal. We devise a neural architecture called Prompt-driven Target Speech Diarization (PTSD), that works with diverse

External link: http://arxiv.org/abs/2310.14823

Report

DisCover: Disentangled Music Representation Learning for Cover Song Identification

Authors: Xun, Jiahao; Zhang, Shengyu; Yang, Yanting; Zhu, Jieming; Deng, Liqun; Zhao, Zhou; Dong, Zhenhua; Li, Ruiqi; Zhang, Lichao; Wu, Fei

In the field of music information retrieval (MIR), cover song identification (CSI) is a challenging task that aims to identify cover versions of a query song from a massive collection. Existing works still suffer from high intra-song variances and in

External link: http://arxiv.org/abs/2307.09775

Report

CorrectSpeech: A Fully Automated System for Speech Correction and Accent Reduction

Authors: Tan, Daxin; Deng, Liqun; Zheng, Nianzu; Yeung, Yu Ting; Jiang, Xin; Chen, Xiao; Lee, Tan

This study proposes a fully automated system for speech correction and accent reduction. Consider a scenario in which a recorded speech audio contains certain errors, e.g., inappropriate words or mispronunciations, that need to be corrected. T

External link: http://arxiv.org/abs/2204.05460

Report

Reducing language context confusion for end-to-end code-switching automatic speech recognition

Authors: Zhang, Shuai; Yi, Jiangyan; Tian, Zhengkun; Tao, Jianhua; Yeung, Yu Ting; Deng, Liqun

Code-switching involves alternating between languages during communication. Training end-to-end (E2E) automatic speech recognition (ASR) systems for code-switching is especially challenging as code-switching training data are always insufficient to com

External link: http://arxiv.org/abs/2201.12155

Report

CoCA-MDD: A Coupled Cross-Attention based Framework for Streaming Mispronunciation Detection and Diagnosis

Authors: Zheng, Nianzu; Deng, Liqun; Huang, Wenyong; Yeung, Yu Ting; Xu, Baohua; Guo, Yuanyuan; Wang, Yasheng; Chen, Xiao; Jiang, Xin; Liu, Qun

Mispronunciation detection and diagnosis (MDD) is a popular research focus in computer-aided pronunciation training (CAPT) systems. End-to-end (e2e) approaches are becoming dominant in MDD. However, an e2e MDD model usually requires entire speech utte

External link: http://arxiv.org/abs/2111.08191

Report

EditSpeech: A Text Based Speech Editing System Using Partial Inference and Bidirectional Fusion

Authors: Tan, Daxin; Deng, Liqun; Yeung, Yu Ting; Jiang, Xin; Chen, Xiao; Lee, Tan

This paper presents the design, implementation and evaluation of a speech editing system, named EditSpeech, which allows a user to perform deletion, insertion and replacement of words in a given speech utterance, without causing audible degradation i

External link: http://arxiv.org/abs/2107.01554

Report

Unsupervised Domain Adaptation for Dysarthric Speech Detection via Domain Adversarial Training and Mutual Information Minimization

Authors: Wang, Disong; Deng, Liqun; Yeung, Yu Ting; Chen, Xiao; Liu, Xunying; Meng, Helen

Dysarthric speech detection (DSD) systems aim to detect characteristics of the neuromotor disorder from speech. Such systems are particularly susceptible to domain mismatch where the training and testing data come from the source and target domains r

External link: http://arxiv.org/abs/2106.10127

Report

VQMIVC: Vector Quantization and Mutual Information-Based Unsupervised Speech Representation Disentanglement for One-shot Voice Conversion

Authors: Wang, Disong; Deng, Liqun; Yeung, Yu Ting; Chen, Xiao; Liu, Xunying; Meng, Helen

One-shot voice conversion (VC), which performs conversion across arbitrary speakers with only a single target-speaker utterance for reference, can be effectively achieved by speech representation disentanglement. Existing work generally ignores the c

External link: http://arxiv.org/abs/2106.10132
