Showing 1 - 10 of 4,716 results for search: '"Li, XiaoFei"'
Author:
Liang, Di, Li, Xiaofei
This work proposes a frame-wise online/streaming end-to-end neural diarization (EEND) method, which detects speaker activities in a frame-in-frame-out fashion. The proposed model mainly consists of a causal embedding encoder and an online attractor d
External link:
http://arxiv.org/abs/2410.06670
Author:
Fang, Ying, Li, Xiaofei
This paper works on streaming automatic speech recognition (ASR). Mamba, a recently proposed state space model, has demonstrated the ability to match or surpass Transformers in various tasks while benefiting from a linear complexity advantage. We exp
External link:
http://arxiv.org/abs/2410.00070
Author:
Yang, Bing, Quan, Changsheng, Wang, Yabo, Wang, Pengyu, Yang, Yujie, Fang, Ying, Shao, Nian, Bu, Hui, Xu, Xin, Li, Xiaofei
The training of deep learning-based multichannel speech enhancement and source localization systems relies heavily on the simulation of room impulse response and multichannel diffuse noise, due to the lack of large-scale real-recorded datasets. Howev
External link:
http://arxiv.org/abs/2406.19959
Reference Channel Selection by Multi-Channel Masking for End-to-End Multi-Channel Speech Enhancement
In end-to-end multi-channel speech enhancement, the traditional approach of designating one microphone signal as the reference for processing may not always yield optimal results. The limitation is particularly in scenarios with large distributed mic
External link:
http://arxiv.org/abs/2406.03228
The increasing difficulty in accurately detecting forged images generated by AIGC (Artificial Intelligence Generative Content) poses many risks, necessitating the development of effective methods to identify and further locate forged areas. In this pa
External link:
http://arxiv.org/abs/2406.01489
Extracting direct-path spatial features is crucial for sound source localization in adverse acoustic environments. This paper proposes IPDnet, a neural network that estimates direct-path inter-channel phase difference (DP-IPD) of sound sources fro
External link:
http://arxiv.org/abs/2405.07021
If two conducting or insulating inclusions are closely located, the gradient of the solution may become arbitrarily large as the distance between inclusions tends to zero, resulting in high concentration of stress in between two inclusions. This happ
External link:
http://arxiv.org/abs/2404.03258
Author:
Quan, Changsheng, Li, Xiaofei
In this work, we extend our previously proposed offline SpatialNet for long-term streaming multichannel speech enhancement in both static and moving speaker scenarios. SpatialNet exploits spatial information, such as the spatial/steering direction of
External link:
http://arxiv.org/abs/2403.07675
In this work, we propose Mel-FullSubNet, a single-channel Mel-spectrogram denoising and dereverberation network for improving both speech quality and automatic speech recognition (ASR) performance. Mel-FullSubNet takes as input the noisy and reverber
External link:
http://arxiv.org/abs/2402.13511
Author:
Li, Xiaofei, Wang, Yu, Liu, Xin, Ma, Yuan, Cai, Yangjian, Ponomarenko, Sergey A., Liu, Xianlong
Having shown early promise, free-space optical communications (FSO) face formidable challenges in the age of information explosion. The ever-growing demand for greater channel communication capacity is one of the challenges. The inter-channel crossta
External link:
http://arxiv.org/abs/2401.10392