Showing 1 - 10 of 592 results for search: '"Wang, Yannan"'
Target speaker extraction (TSE) relies on a reference cue of the target to extract the target speech from a speech mixture. While a speaker embedding is commonly used as the reference cue, such an embedding pre-trained with a large number of speakers ... (see the sketch below)
External link:
http://arxiv.org/abs/2410.16059
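The entry above describes target speaker extraction conditioned on a reference speaker embedding. Below is a minimal PyTorch sketch of that general idea, assuming a magnitude-mask estimator and a hypothetical 256-dimensional embedding; it illustrates embedding conditioning only and is not the model proposed in the paper.

```python
import torch
import torch.nn as nn

class EmbeddingConditionedExtractor(nn.Module):
    """Toy TSE mask estimator: the target speaker embedding is broadcast
    over time and concatenated with the mixture spectrogram frames."""

    def __init__(self, n_freq=257, emb_dim=256, hidden=512):
        super().__init__()
        self.rnn = nn.GRU(n_freq + emb_dim, hidden, batch_first=True)
        self.mask = nn.Sequential(nn.Linear(hidden, n_freq), nn.Sigmoid())

    def forward(self, mix_mag, spk_emb):
        # mix_mag: (batch, time, freq); spk_emb: (batch, emb_dim)
        emb = spk_emb.unsqueeze(1).expand(-1, mix_mag.size(1), -1)
        h, _ = self.rnn(torch.cat([mix_mag, emb], dim=-1))
        return mix_mag * self.mask(h)   # masked magnitude of the target

# Example shapes only; in practice the embedding comes from a speaker encoder.
out = EmbeddingConditionedExtractor()(torch.rand(2, 100, 257), torch.rand(2, 256))
```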
The scarcity of labeled audio-visual datasets is a constraint for training superior audio-visual speaker diarization systems. To improve the performance of audio-visual speaker diarization, we leverage pre-trained supervised and self-supervised speech ... (see the sketch below)
External link:
http://arxiv.org/abs/2312.04131
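The entry above mentions leveraging pre-trained self-supervised speech models. One illustrative way to obtain such frame-level features is torchaudio's wav2vec 2.0 pipeline (an assumption for this example; the paper's choice of models is truncated above). The pre-trained weights are downloaded on first use.

```python
import torch
import torchaudio

# Load a pre-trained self-supervised speech model (illustrative choice only).
bundle = torchaudio.pipelines.WAV2VEC2_BASE
model = bundle.get_model().eval()

num_samples = int(bundle.sample_rate * 3)        # 3 seconds of dummy audio
waveform = torch.randn(1, num_samples)

with torch.inference_mode():
    features, _ = model.extract_features(waveform)   # one tensor per layer
frame_feats = features[-1]                            # (batch, frames, dim)

# frame_feats could then be fused with visual features and passed to a
# diarization back-end (clustering or an end-to-end neural model).
```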
Author:
Zhang, Li, Zhao, Huan, Li, Yue, Pang, Bowen, Wang, Yannan, Wang, Hongji, Rao, Wei, Wang, Qing, Xie, Lei
This paper describes the FlySpeech speaker diarization system submitted to the second Multimodal Information Based Speech Processing (MISP) Challenge held at ICASSP 2022. We develop an end-to-end audio-visual ...
External link:
http://arxiv.org/abs/2307.15400
Author:
Chen, Jun, Rao, Wei, Wang, Zilin, Lin, Jiuxin, Ju, Yukai, He, Shulin, Wang, Yannan, Wu, Zhiyong
The previous SpEx+ has yielded outstanding performance in speaker extraction and attracted much attention. However, it still makes inadequate use of multi-scale information and of the speaker embedding. To this end, this paper proposes a new ... (see the sketch below)
External link:
http://arxiv.org/abs/2306.16250
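The SpEx+ entry refers to multi-scale information in speaker extraction. The sketch below shows a generic multi-scale time-domain encoder (parallel 1-D convolutions with short, middle, and long filters); the filter lengths and stride are illustrative assumptions, not the paper's configuration.

```python
import torch
import torch.nn as nn

class MultiScaleEncoder(nn.Module):
    """Encode a waveform at three time scales and concatenate the features."""

    def __init__(self, n_filters=256, lengths=(20, 80, 160), stride=10):
        super().__init__()
        self.convs = nn.ModuleList(
            nn.Conv1d(1, n_filters, kernel_size=L, stride=stride) for L in lengths
        )

    def forward(self, wav):                      # wav: (batch, samples)
        x = wav.unsqueeze(1)
        feats = [torch.relu(conv(x)) for conv in self.convs]
        # align frame counts across scales before concatenating along channels
        n_frames = min(f.size(-1) for f in feats)
        return torch.cat([f[..., :n_frames] for f in feats], dim=1)

emb = MultiScaleEncoder()(torch.randn(2, 16000))   # (2, 3 * 256, frames)
```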
Author:
Liu, Wenzhe, Shi, Yupeng, Chen, Jun, Rao, Wei, He, Shulin, Li, Andong, Wang, Yannan, Wu, Zhiyong
This paper describes a real-time General Speech Reconstruction (Gesper) system submitted to the ICASSP 2023 Speech Signal Improvement (SSI) Challenge. The proposed system is a two-stage architecture in which the speech restoration is performed ... (see the sketch below)
External link:
http://arxiv.org/abs/2306.08454
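The Gesper entry describes a two-stage architecture. The sketch below only illustrates the general cascade pattern, where a second network refines the first stage's estimate; the placeholder GRUs are assumptions and do not reflect the system's actual modules.

```python
import torch
import torch.nn as nn

class TwoStageSystem(nn.Module):
    """Generic two-stage cascade: stage 2 refines the stage-1 estimate."""

    def __init__(self, n_freq=257, hidden=256):
        super().__init__()
        self.stage1 = nn.GRU(n_freq, hidden, batch_first=True)
        self.head1 = nn.Linear(hidden, n_freq)
        # stage 2 sees the degraded input together with the stage-1 estimate
        self.stage2 = nn.GRU(2 * n_freq, hidden, batch_first=True)
        self.head2 = nn.Linear(hidden, n_freq)

    def forward(self, noisy_mag):                 # (batch, time, freq)
        h1, _ = self.stage1(noisy_mag)
        est1 = self.head1(h1)                     # coarse first-stage output
        h2, _ = self.stage2(torch.cat([noisy_mag, est1], dim=-1))
        return est1 + self.head2(h2)              # refined second-stage output

out = TwoStageSystem()(torch.rand(2, 100, 257))
```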
Author:
Chen, Jun, Rao, Wei, Wang, Zilin, Lin, Jiuxin, Wu, Zhiyong, Wang, Yannan, Shang, Shidong, Meng, Helen
Subband-based approaches process subbands in parallel through a model with shared parameters to learn the commonality of local spectra for noise reduction. In this way, they achieve remarkable results with fewer parameters. However, in some ... (see the sketch below)
External link:
http://arxiv.org/abs/2305.05599
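The entry above describes processing subbands in parallel through a model with shared parameters. A minimal sketch of that pattern follows, folding the subbands into the batch dimension so a single small network handles all of them; the subband width of 16 bins is an arbitrary choice for illustration.

```python
import torch
import torch.nn as nn

class SharedSubbandModel(nn.Module):
    """Apply one small network to every subband in parallel (shared weights)."""

    def __init__(self, subband_width=16, hidden=128):
        super().__init__()
        self.width = subband_width
        self.rnn = nn.GRU(subband_width, hidden, batch_first=True)
        self.out = nn.Linear(hidden, subband_width)

    def forward(self, mag):                       # mag: (batch, time, freq)
        b, t, f = mag.shape
        w = self.width
        assert f % w == 0, "frequency bins must split evenly into subbands"
        n_sub = f // w
        # (batch, time, n_sub, width) -> fold subbands into the batch dimension
        x = mag.view(b, t, n_sub, w).permute(0, 2, 1, 3).reshape(b * n_sub, t, w)
        h, _ = self.rnn(x)                        # same weights for every subband
        y = self.out(h).reshape(b, n_sub, t, w).permute(0, 2, 1, 3).reshape(b, t, f)
        return y

out = SharedSubbandModel()(torch.rand(2, 100, 256))   # 256 bins -> 16 subbands
```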
Author:
Ju, Yukai, Chen, Jun, Zhang, Shimin, He, Shulin, Rao, Wei, Zhu, Weixin, Wang, Yannan, Yu, Tao, Shang, Shidong
This paper introduces the Unbeatable Team's submission to the ICASSP 2023 Deep Noise Suppression (DNS) Challenge. We extend our previous work, TEA-PSE, to its upgraded version, TEA-PSE 3.0. Specifically, TEA-PSE 3.0 incorporates a residual LSTM ... (see the sketch below)
External link:
http://arxiv.org/abs/2303.07704
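TEA-PSE 3.0 is said to add a residual LSTM; its exact placement is truncated above, so the block below is only a generic residual LSTM layer (input plus a projected LSTM output).

```python
import torch
import torch.nn as nn

class ResidualLSTM(nn.Module):
    """LSTM layer with a skip connection around it."""

    def __init__(self, dim=256, hidden=256):
        super().__init__()
        self.lstm = nn.LSTM(dim, hidden, batch_first=True)
        self.proj = nn.Linear(hidden, dim)

    def forward(self, x):                 # x: (batch, time, dim)
        y, _ = self.lstm(x)
        return x + self.proj(y)           # residual connection

out = ResidualLSTM()(torch.rand(2, 100, 256))
```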
The scarcity of labeled far-field speech is a constraint for training superior far-field speaker verification systems. Fine-tuning a model pre-trained on large-scale near-field speech substantially outperforms training from scratch. However, the ... (see the sketch below)
External link:
http://arxiv.org/abs/2303.00264
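The entry above concerns fine-tuning a model pre-trained on near-field speech with far-field data. A generic fine-tuning sketch follows, with a placeholder backbone, dummy data, and per-group learning rates; the paper's actual fine-tuning strategy is not reproduced here.

```python
import torch
import torch.nn as nn

# Placeholder backbone standing in for the pre-trained speaker model.
backbone = nn.Sequential(nn.Linear(80, 512), nn.ReLU(), nn.Linear(512, 192))
# In practice the near-field weights would be loaded here, e.g.
# backbone.load_state_dict(torch.load("near_field_pretrained.pt"))  # placeholder path

num_far_field_speakers = 10
head = nn.Linear(192, num_far_field_speakers)     # new classification head

# Small learning rate for the pre-trained backbone, larger for the new head,
# so fine-tuning does not drift too far from the pre-trained solution.
optimizer = torch.optim.Adam(
    [{"params": backbone.parameters(), "lr": 1e-5},
     {"params": head.parameters(), "lr": 1e-3}]
)
criterion = nn.CrossEntropyLoss()

# One dummy far-field batch (80-dim features, integer speaker labels).
feats = torch.randn(8, 80)
labels = torch.randint(0, num_far_field_speakers, (8,))

optimizer.zero_grad()
loss = criterion(head(backbone(feats)), labels)
loss.backward()
optimizer.step()
```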
Author:
Chen, Jun, Rao, Wei, Wang, Zilin, Wu, Zhiyong, Wang, Yannan, Yu, Tao, Shang, Shidong, Meng, Helen
FullSubNet has shown promising performance on speech enhancement by utilizing both fullband and subband information. However, the relationship between fullband and subband in FullSubNet is achieved by simply concatenating the output of the fullband model ... (see the sketch below)
External link:
http://arxiv.org/abs/2211.05432
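The FullSubNet entry notes that fullband and subband information are joined by concatenating the fullband model's output with each subband unit. The toy module below illustrates only that concatenation pattern; the dimensions, GRUs, and neighbourhood size are assumptions rather than the published configuration.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class FullbandPlusSubband(nn.Module):
    """Toy FullSubNet-style coupling: each per-frequency subband unit is
    concatenated with the fullband model's output for that frequency."""

    def __init__(self, n_freq=257, context=2, hidden=128):
        super().__init__()
        self.context = context                            # neighbours per side
        self.fullband = nn.GRU(n_freq, n_freq, batch_first=True)
        self.subband = nn.GRU(2 * context + 2, hidden, batch_first=True)
        self.out = nn.Linear(hidden, 1)

    def forward(self, mag):                               # mag: (batch, time, freq)
        b, t, f = mag.shape
        full_out, _ = self.fullband(mag)                  # (b, t, f)
        # build subband units: each frequency with its +/- context neighbours
        padded = F.pad(mag, (self.context, self.context))
        units = padded.unfold(2, 2 * self.context + 1, 1)        # (b, t, f, 2c+1)
        # append the fullband output for the centre frequency
        x = torch.cat([units, full_out.unsqueeze(-1)], dim=-1)   # (b, t, f, 2c+2)
        x = x.permute(0, 2, 1, 3).reshape(b * f, t, 2 * self.context + 2)
        h, _ = self.subband(x)                            # shared subband model
        mask = torch.sigmoid(self.out(h)).reshape(b, f, t).permute(0, 2, 1)
        return mag * mask

out = FullbandPlusSubband()(torch.rand(2, 50, 257))
```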
Author:
He, Shulin, Rao, Wei, Liu, Jinjiang, Chen, Jun, Ju, Yukai, Zhang, Xueliang, Wang, Yannan, Shang, Shidong
Most neural network speech enhancement models ignore mathematical models of speech production by directly mapping Fourier spectra or waveforms. In this work, we propose a neural source-filter network for speech enhancement. Specifically, we ... (see the sketch below)
External link:
http://arxiv.org/abs/2210.15853
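The last entry contrasts direct spectrum or waveform mapping with a source-filter view of speech production. The sketch below shows a generic source-filter decomposition, an excitation estimate shaped by a predicted spectral envelope; it is an assumption-laden toy and not the paper's network.

```python
import torch
import torch.nn as nn

class ToySourceFilterEnhancer(nn.Module):
    """Predict an excitation (source) spectrum and a spectral envelope
    (filter) from the noisy input, then combine them multiplicatively."""

    def __init__(self, n_freq=257, hidden=256):
        super().__init__()
        self.encoder = nn.GRU(n_freq, hidden, batch_first=True)
        self.source_head = nn.Sequential(nn.Linear(hidden, n_freq), nn.Softplus())
        self.filter_head = nn.Sequential(nn.Linear(hidden, n_freq), nn.Softplus())

    def forward(self, noisy_mag):                 # (batch, time, freq)
        h, _ = self.encoder(noisy_mag)
        source = self.source_head(h)              # excitation-like magnitude
        envelope = self.filter_head(h)            # vocal-tract-like envelope
        return source * envelope                  # enhanced magnitude estimate

out = ToySourceFilterEnhancer()(torch.rand(2, 100, 257))
```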