Online Spectrogram Inversion for Low-Latency Audio Source Separation
Author: | Paul Magron, Tuomas Virtanen |
---|---|
Contributors: | Signal et Communications (IRIT-SC), Institut de recherche en informatique de Toulouse (IRIT), Université Toulouse 1 Capitole (UT1), Université Fédérale Toulouse Midi-Pyrénées-Université Fédérale Toulouse Midi-Pyrénées-Université Toulouse - Jean Jaurès (UT2J)-Université Toulouse III - Paul Sabatier (UT3), Université Fédérale Toulouse Midi-Pyrénées-Centre National de la Recherche Scientifique (CNRS)-Institut National Polytechnique (Toulouse) (Toulouse INP), Université Fédérale Toulouse Midi-Pyrénées-Université Toulouse 1 Capitole (UT1), Université Fédérale Toulouse Midi-Pyrénées, University of Tampere [Finland], Academy of Finland, project no. 290190 |
Language: | English |
Year of publication: | 2020 |
Subject: |
FOS: Computer and information sciences;
Sound (cs.SD); Computer science; Initialization; 02 engineering and technology; Computer Science - Sound; symbols.namesake; [INFO.INFO-TS]Computer Science [cs]/Signal and Image Processing; Audio and Speech Processing (eess.AS); FOS: Electrical engineering, electronic engineering, information engineering; 0202 electrical engineering, electronic engineering, information engineering; Source separation; online spectrogram inversion; Electrical and Electronic Engineering; sinusoidal modeling; Audio signal; Applied Mathematics; phase recovery; Short-time Fourier transform; audio source separation; 020206 networking & telecommunications; Inversion (meteorology); Multiple input; Fourier transform; Signal Processing; [INFO.INFO-SD]Computer Science [cs]/Sound [cs.SD]; symbols; Spectrogram; low-latency; Algorithm; Electrical Engineering and Systems Science - Audio and Speech Processing |
Source: | IEEE Signal Processing Letters, Institute of Electrical and Electronics Engineers, 2020, 27, pp.306-310. ⟨10.1109/LSP.2020.2970310⟩ |
ISSN: | 1070-9908 |
DOI: | 10.1109/LSP.2020.2970310 |
Description: | International audience; Audio source separation is usually achieved by estimating the short-time Fourier transform (STFT) magnitude of each source, and then applying a spectrogram inversion algorithm to retrieve time-domain signals. In particular, the multiple input spectrogram inversion (MISI) algorithm has been exploited successfully in several recent works. However, this algorithm suffers from two drawbacks, which we address in this paper. First, it was originally introduced in a heuristic fashion: we propose here a rigorous optimization framework in which MISI is derived, thus proving the convergence of this algorithm. Second, while MISI operates offline, we propose an online version of MISI called oMISI, which is suitable for low-latency source separation, an important requirement for applications such as hearing aids. oMISI also allows one to use alternative phase initialization schemes that exploit the temporal structure of audio signals. Experiments conducted on a speech separation task show that oMISI performs as well as its offline counterpart, thus demonstrating its potential for real-time source separation. |
Database: | OpenAIRE |
External link: |