Authors: |
Ozturk, Muhammed Zahid; Wu, Chenshu; Wang, Beibei; Wu, Min; Liu, K. J. Ray |
Source: |
IEEE/ACM Transactions on Audio, Speech, and Language Processing; 2023, Vol. 31, Issue 1, pp. 1333-1347, 15 pp. |
Abstract: |
Speech enhancement and separation have been a long-standing problem, especially with the recent advances using a single microphone. Although microphones perform well in constrained settings, their performance for speech separation decreases in noisy conditions. In this work, we propose RadioSES, an audio-radio speech enhancement and separation system that overcomes inherent problems in audio-only systems. By fusing a complementary radio modality, RadioSES can estimate the number of speakers, solve the source association problem, separate and enhance noisy mixture speeches, and improve both intelligibility and perceptual quality. We perform millimeter-wave sensing to detect and localize speakers and introduce an audio-radio deep learning framework to fuse the separate radio features with the mixed audio features. Extensive experiments using commercial off-the-shelf devices show that RadioSES outperforms a variety of state-of-the-art baselines, with consistent performance gains in different environmental settings. Similar to the audiovisual methods, RadioSES provides significant performance improvements (e.g., 3 dB gains in SiSDR, when compared with the corresponding audio-only method), along with the benefits of lower computational complexity and better privacy preservation. |
Database: |
Supplemental Index |
External link: |
|