Online Spectrogram Inversion for Low-Latency Audio Source Separation

Authors: Paul Magron, Tuomas Virtanen
Contributors: Signal et Communications (IRIT-SC), Institut de recherche en informatique de Toulouse (IRIT), Université Toulouse 1 Capitole (UT1), Université Fédérale Toulouse Midi-Pyrénées-Université Fédérale Toulouse Midi-Pyrénées-Université Toulouse - Jean Jaurès (UT2J)-Université Toulouse III - Paul Sabatier (UT3), Université Fédérale Toulouse Midi-Pyrénées-Centre National de la Recherche Scientifique (CNRS)-Institut National Polytechnique (Toulouse) (Toulouse INP), Université Fédérale Toulouse Midi-Pyrénées-Université Toulouse 1 Capitole (UT1), Université Fédérale Toulouse Midi-Pyrénées, University of Tampere [Finland], Academy of Finland, project no. 290190
Language: English
Publication year: 2020
Subject:
FOS: Computer and information sciences
Sound (cs.SD)
Computer science
Initialization
02 engineering and technology
Computer Science - Sound
[INFO.INFO-TS]Computer Science [cs]/Signal and Image Processing
Audio and Speech Processing (eess.AS)
FOS: Electrical engineering, electronic engineering, information engineering
0202 electrical engineering, electronic engineering, information engineering
Source separation
online spectrogram inversion
Electrical and Electronic Engineering
sinusoidal modeling
Audio signal
Applied Mathematics
phase recovery
Short-time Fourier transform
audio source separation
020206 networking & telecommunications
Inversion (meteorology)
Multiple input
Fourier transform
Signal Processing
[INFO.INFO-SD]Computer Science [cs]/Sound [cs.SD]
Spectrogram
low-latency
Algorithm
Electrical Engineering and Systems Science - Audio and Speech Processing
Source: IEEE Signal Processing Letters
IEEE Signal Processing Letters, Institute of Electrical and Electronics Engineers, 2020, 27, pp.306-310. ⟨10.1109/LSP.2020.2970310⟩
ISSN: 1070-9908
DOI: 10.1109/LSP.2020.2970310
Description: Audio source separation is usually achieved by estimating the short-time Fourier transform (STFT) magnitude of each source and then applying a spectrogram inversion algorithm to retrieve time-domain signals. In particular, the multiple input spectrogram inversion (MISI) algorithm has been used successfully in several recent works. However, this algorithm suffers from two drawbacks, which we address in this paper. First, it was originally introduced heuristically: we propose a rigorous optimization framework in which MISI is derived, thus proving the convergence of this algorithm. Second, while MISI operates offline, we propose an online version called oMISI, which is suitable for low-latency source separation, an important requirement for applications such as hearing aids. oMISI also allows alternative phase initialization schemes that exploit the temporal structure of audio signals. Experiments conducted on a speech separation task show that oMISI performs as well as its offline counterpart, thus demonstrating its potential for real-time source separation.
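The two-step pipeline in the description (estimate per-source STFT magnitudes, then invert them) can be illustrated with a minimal sketch of offline MISI as it is commonly described in the literature. This is not the authors' implementation and it does not cover the online oMISI variant. The function names (`stft`, `istft`, `misi`), the Hann window, the frame size, the hop size, and the iteration count are all assumptions chosen for illustration.

```python
import numpy as np

def stft(x, n_fft=256, hop=64):
    """Hann-windowed STFT; returns an array of shape (frames, n_fft // 2 + 1)."""
    win = np.hanning(n_fft)
    padded = np.pad(x, n_fft)  # zero-pad both ends so edge samples are covered
    n_frames = 1 + (len(padded) - n_fft) // hop
    frames = np.stack([padded[i * hop:i * hop + n_fft] * win
                       for i in range(n_frames)])
    return np.fft.rfft(frames, axis=1)

def istft(spec, length, n_fft=256, hop=64):
    """Least-squares inverse STFT (windowed overlap-add), trimmed to `length`."""
    win = np.hanning(n_fft)
    frames = np.fft.irfft(spec, n=n_fft, axis=1)
    out = np.zeros(n_fft + hop * (len(frames) - 1))
    norm = np.zeros_like(out)
    for i, frame in enumerate(frames):
        out[i * hop:i * hop + n_fft] += frame * win
        norm[i * hop:i * hop + n_fft] += win ** 2
    out /= np.maximum(norm, 1e-8)  # guard against division by zero at the edges
    return out[n_fft:n_fft + length]  # drop the padding added in stft()

def misi(mixture, magnitudes, n_iter=20, n_fft=256, hop=64):
    """Offline MISI sketch: recover source signals from target magnitudes.

    mixture    : 1-D time-domain mixture
    magnitudes : list of estimated STFT magnitudes, one per source
    """
    n_src = len(magnitudes)
    length = len(mixture)
    # Assumed initialization: give every source the phase of the mixture.
    mix_phase = np.exp(1j * np.angle(stft(mixture, n_fft, hop)))
    specs = [mag * mix_phase for mag in magnitudes]
    for _ in range(n_iter):
        signals = [istft(s, length, n_fft, hop) for s in specs]
        # Mixing error, distributed equally over the sources.
        error = mixture - sum(signals)
        specs = []
        for mag, sig in zip(magnitudes, signals):
            y = stft(sig + error / n_src, n_fft, hop)
            # Keep the updated phase, re-impose the target magnitude.
            specs.append(mag * np.exp(1j * np.angle(y)))
    return np.stack([istft(s, length, n_fft, hop) for s in specs])
```

In this sketch, `magnitudes` would typically come from a separate magnitude estimator; for a quick check one can pass `np.abs(stft(s))` computed from known source signals `s`.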
Database: OpenAIRE