Real-Time Speech Frequency Bandwidth Extension
Authors: | Dominik Roblek, Yunpeng Li, Oleg Rybakov, Victor Ungureanu, Marco Tagliasacchi |
---|---|
Year of publication: | 2021 |
Subject: |
FOS: Computer and information sciences; FOS: Electrical engineering, electronic engineering, information engineering;
Mobile processor; Sound (cs.SD); Computer science; Bandwidth (signal processing); Real-time computing; Frame (networking); Latency (audio); Communications system; Computer Science - Sound; Speech enhancement; Audio and Speech Processing (eess.AS); Single-core; Latency (engineering); Electrical Engineering and Systems Science - Audio and Speech Processing |
Source: | ICASSP |
DOI: | 10.1109/icassp39728.2021.9413439 |
Description: | In this paper, we propose a lightweight model for frequency bandwidth extension of speech signals that increases the sampling frequency from 8 kHz to 16 kHz while restoring the high-frequency content to a level almost indistinguishable from the 16 kHz ground truth. The model architecture is based on SEANet (Sound EnhAncement Network), a wave-to-wave fully convolutional model that uses a combination of feature losses and adversarial losses to reconstruct an enhanced version of the input speech. In addition, we propose a variant of SEANet that can be deployed on-device in streaming mode, achieving an architectural latency of 16 ms. When profiled on a single core of a mobile CPU, processing one 16 ms frame takes only 1.5 ms. The low latency makes the model viable for bi-directional voice communication systems. |
Database: | OpenAIRE |
External link: |
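
The figures quoted in the description (8 kHz input, 16 kHz output, 16 ms frames, 1.5 ms of compute per frame) can be turned into frame sizes and a real-time factor with simple arithmetic. The following is a minimal sketch, not code from the paper: the function and constant names are hypothetical, and it assumes that a frame's sample count is simply the sampling rate multiplied by the frame duration.

```python
# Figures taken from the abstract; all names below are illustrative assumptions.
INPUT_RATE_HZ = 8_000        # narrowband input sampling rate
OUTPUT_RATE_HZ = 16_000      # wideband output sampling rate
FRAME_MS = 16.0              # frame duration (also the stated architectural latency)
COMPUTE_MS_PER_FRAME = 1.5   # profiled time per frame on one mobile CPU core


def samples_per_frame(rate_hz: int, frame_ms: float) -> int:
    """Number of samples in one frame, assuming samples = rate * duration."""
    return round(rate_hz * frame_ms / 1000)


def real_time_factor(compute_ms: float, frame_ms: float) -> float:
    """Ratio of processing time to audio duration for one frame."""
    return compute_ms / frame_ms


in_samples = samples_per_frame(INPUT_RATE_HZ, FRAME_MS)
out_samples = samples_per_frame(OUTPUT_RATE_HZ, FRAME_MS)
rtf = real_time_factor(COMPUTE_MS_PER_FRAME, FRAME_MS)

print(f"input samples per frame:  {in_samples}")
print(f"output samples per frame: {out_samples}")
print(f"real-time factor:         {rtf:.3f}")
```

A real-time factor below 1 means each frame is processed faster than it is played back; under these assumptions the quoted numbers give a factor of roughly 0.09.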