End-to-End Complex-Valued Multidilated Convolutional Neural Network for Joint Acoustic Echo Cancellation and Noise Suppression
Authors: | Karn N. Watcharasupat, Thi Ngoc Tho Nguyen, Woon-Seng Gan, Shengkui Zhao, Bin Ma |
Year of publication: | 2021 |
Subject: |
Signal Processing (eess.SP); Audio and Speech Processing (eess.AS); Sound (cs.SD); Machine Learning (cs.LG); Artificial Intelligence (cs.AI); FOS: Computer and information sciences; FOS: Electrical engineering, electronic engineering, information engineering |
DOI: | 10.48550/arxiv.2110.00745 |
Description: | Echo and noise suppression is an integral part of a full-duplex communication system. Many recent acoustic echo cancellation (AEC) systems rely on a separate adaptive filtering module for linear echo suppression and a neural module for residual echo suppression. However, not only do adaptive filtering modules require convergence and remain susceptible to changes in acoustic environments, but this two-stage framework also often introduces unnecessary delays to the AEC system when neural modules are already capable of both linear and nonlinear echo suppression. In this paper, we exploit the offset-compensating ability of complex time-frequency masks and propose an end-to-end complex-valued neural network architecture. The building block of the proposed model is a pseudocomplex extension of the densely-connected multidilated DenseNet (D3Net) building block, resulting in a very small network of only 354K parameters. The architecture utilizes the multi-resolution nature of the D3Net building blocks to eliminate the need for pooling, allowing the network to extract features using large receptive fields without any loss of output resolution. We also propose a dual-mask technique for joint echo and noise suppression with simultaneous speech enhancement. Evaluation on both synthetic and real test sets demonstrated promising results across multiple energy-based metrics and perceptual proxies. Comment: To be presented at the 2022 International Conference on Acoustics, Speech, & Signal Processing (ICASSP) |
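The dual-mask idea in the abstract can be illustrated with a minimal sketch: two complex-valued time-frequency masks (one for echo, one for noise) are applied multiplicatively to the microphone STFT, and the enhanced waveform is recovered by the inverse STFT. In the paper the masks are predicted by the network; here `echo_mask` and `noise_mask` are placeholder identity masks, and the function name and parameters are illustrative assumptions, not the authors' API.

```python
# Hedged sketch of dual complex-mask suppression in the STFT domain.
# The masks below are toy identity masks standing in for the network's
# predictions; a complex mask can also rotate phase, which is the
# offset-compensating ability the abstract refers to.
import numpy as np
from scipy.signal import stft, istft

def apply_dual_complex_masks(mic, fs=16000, nperseg=512):
    """Apply an echo mask and a noise mask to the microphone STFT."""
    _, _, Z = stft(mic, fs=fs, nperseg=nperseg)
    # Placeholder masks (all-ones, complex). In the proposed system these
    # would be the two complex masks predicted by the D3Net-based network.
    echo_mask = np.ones_like(Z)
    noise_mask = np.ones_like(Z)
    Z_enhanced = Z * echo_mask * noise_mask  # joint echo + noise suppression
    _, enhanced = istft(Z_enhanced, fs=fs, nperseg=nperseg)
    return enhanced[: len(mic)]
```

With identity masks the STFT/ISTFT round trip reconstructs the input (up to numerical precision), which makes the sketch easy to sanity-check before swapping in predicted masks.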
Database: | OpenAIRE |
External link: |