Showing 1 - 10 of 98 for search: '"Deng, Liqun"'
It was shown that pre-trained models with self-supervised learning (SSL) techniques are effective in various downstream speech tasks. However, most such models are trained on single-speaker speech data, limiting their effectiveness in mixture speech.
External link:
http://arxiv.org/abs/2407.02826
Author:
Yao, Dong; Zhu, Jieming; Xun, Jiahao; Zhang, Shengyu; Zhao, Zhou; Deng, Liqun; Zhang, Wenqiao; Dong, Zhenhua; Jiang, Xin
Recent research in self-supervised contrastive learning of music representations has demonstrated remarkable results across diverse downstream tasks. However, a prevailing trend in existing methods involves representing equally-sized music clips in e
External link:
http://arxiv.org/abs/2312.06197
We introduce a novel task named 'target speech diarization', which seeks to determine 'when target event occurred' within an audio signal. We devise a neural architecture called Prompt-driven Target Speech Diarization (PTSD), that works with diverse
External link:
http://arxiv.org/abs/2310.14823
Author:
Xun, Jiahao; Zhang, Shengyu; Yang, Yanting; Zhu, Jieming; Deng, Liqun; Zhao, Zhou; Dong, Zhenhua; Li, Ruiqi; Zhang, Lichao; Wu, Fei
In the field of music information retrieval (MIR), cover song identification (CSI) is a challenging task that aims to identify cover versions of a query song from a massive collection. Existing works still suffer from high intra-song variances and in
External link:
http://arxiv.org/abs/2307.09775
This study proposes a fully automated system for speech correction and accent reduction. Consider the application scenario that a recorded speech audio contains certain errors, e.g., inappropriate words, mispronunciations, that need to be corrected. T
External link:
http://arxiv.org/abs/2204.05460
Code-switching deals with alternating languages in the communication process. Training end-to-end (E2E) automatic speech recognition (ASR) systems for code-switching is especially challenging as code-switching training data are always insufficient to com
External link:
http://arxiv.org/abs/2201.12155
Author:
Zheng, Nianzu; Deng, Liqun; Huang, Wenyong; Yeung, Yu Ting; Xu, Baohua; Guo, Yuanyuan; Wang, Yasheng; Chen, Xiao; Jiang, Xin; Liu, Qun
Mispronunciation detection and diagnosis (MDD) is a popular research focus in computer-aided pronunciation training (CAPT) systems. End-to-end (e2e) approaches are becoming dominant in MDD. However, an e2e MDD model usually requires entire speech utte
External link:
http://arxiv.org/abs/2111.08191
This paper presents the design, implementation and evaluation of a speech editing system, named EditSpeech, which allows a user to perform deletion, insertion and replacement of words in a given speech utterance, without causing audible degradation i
External link:
http://arxiv.org/abs/2107.01554
Dysarthric speech detection (DSD) systems aim to detect characteristics of the neuromotor disorder from speech. Such systems are particularly susceptible to domain mismatch where the training and testing data come from the source and target domains r
External link:
http://arxiv.org/abs/2106.10127
One-shot voice conversion (VC), which performs conversion across arbitrary speakers with only a single target-speaker utterance for reference, can be effectively achieved by speech representation disentanglement. Existing work generally ignores the c
External link:
http://arxiv.org/abs/2106.10132