Showing 1 - 10 of 30 for search: '"Han, Jiangyu"'
End-to-end neural diarization has evolved considerably over the past few years, but data scarcity is still a major obstacle for further improvements. Self-supervised learning methods such as WavLM have shown promising performance on several downstrea
External link:
http://arxiv.org/abs/2409.09408
Author:
Han, Jiangyu, Landini, Federico, Rohdin, Johan, Diez, Mireia, Burget, Lukas, Cao, Yuhang, Lu, Heng, Cernocky, Jan
In this work, we propose an error correction framework, named DiaCorrect, to refine the output of a diarization system in a simple yet effective way. This method is inspired by error correction techniques in automatic speech recognition. Our model co
External link:
http://arxiv.org/abs/2309.08377
SqueezeFormer has recently shown impressive performance in automatic speech recognition (ASR). However, its inference speed suffers from the quadratic complexity of softmax-attention (SA). In addition, limited by the large convolution kernel size, th
External link:
http://arxiv.org/abs/2303.08636
Recently, more and more personalized speech enhancement systems (PSE) with excellent performance have been proposed. However, two critical issues still limit the performance and generalization ability of the model: 1) Acoustic environment mismatch be
External link:
http://arxiv.org/abs/2211.12097
In recent years, speaker diarization has attracted widespread attention. To achieve better performance, some studies propose to diarize speech in multiple stages. Although these methods might bring additional benefits, most of them are quite complex.
External link:
http://arxiv.org/abs/2210.17189
Author:
Han, Jiangyu, Long, Yanhua
Recently, supervised speech separation has made great progress. However, limited by the nature of supervised training, most existing separation methods require ground-truth sources and are trained on synthetic datasets. This ground-truth reliance is
External link:
http://arxiv.org/abs/2204.11032
PercepNet, a recent extension of the RNNoise, an efficient, high-quality and real-time full-band speech enhancement technique, has shown promising performance in various public deep noise suppression tasks. This paper proposes a new approach, named P
External link:
http://arxiv.org/abs/2203.02263
In recent years, a number of time-domain speech separation methods have been proposed. However, most of them are very sensitive to the environments and wide domain coverage tasks. In this paper, from the time-frequency domain perspective, we propose
External link:
http://arxiv.org/abs/2112.13520
Author:
Han, Jiangyu, Hao, Xu, Fatima, Mishal, Chauhdary, Zunera, Jamshed, Ayesha, Abdur Rahman, Hafiz Muhammad, Siddique, Rida, Asif, Muhammad, Rana, Saba, Hussain, Liaqat (liaqat.hussain@gcuf.edu.pk)
Published in:
Dose-Response. Jul-Sep 2024, Vol. 22, Issue 3, pp. 1-17 (17 pages).
Target speech extraction has attracted widespread attention. When microphone arrays are available, the additional spatial information can be helpful in extracting the target speech. We have recently proposed a channel decorrelation (CD) mechanism to
External link:
http://arxiv.org/abs/2106.03113