Phase recovery with Bregman divergences for audio source separation
Autor: | Pierre-Hugo Vial, Cédric Févotte, Paul Magron, Thomas Oberlin |
---|---|
Přispěvatelé: | Signal et Communications (IRIT-SC), Institut de recherche en informatique de Toulouse (IRIT), Université Toulouse Capitole (UT Capitole), Université de Toulouse (UT)-Université de Toulouse (UT)-Université Toulouse - Jean Jaurès (UT2J), Université de Toulouse (UT)-Université Toulouse III - Paul Sabatier (UT3), Université de Toulouse (UT)-Centre National de la Recherche Scientifique (CNRS)-Institut National Polytechnique (Toulouse) (Toulouse INP), Université de Toulouse (UT)-Toulouse Mind & Brain Institut (TMBI), Université Toulouse - Jean Jaurès (UT2J), Université de Toulouse (UT)-Université de Toulouse (UT)-Université Toulouse III - Paul Sabatier (UT3), Université de Toulouse (UT)-Université Toulouse Capitole (UT Capitole), Université de Toulouse (UT), Institut Supérieur de l'Aéronautique et de l'Espace (ISAE-SUPAERO), Centre National de la Recherche Scientifique (CNRS), ANR-19-P3IA-0004,ANITI,Artificial and Natural Intelligence Toulouse Institute(2019), European Project: CoG-6681839,ERC FACTORY, Université Toulouse 1 Capitole (UT1)-Université Toulouse - Jean Jaurès (UT2J)-Université Toulouse III - Paul Sabatier (UT3), Université Fédérale Toulouse Midi-Pyrénées-Université Fédérale Toulouse Midi-Pyrénées-Centre National de la Recherche Scientifique (CNRS)-Institut National Polytechnique (Toulouse) (Toulouse INP), Université Fédérale Toulouse Midi-Pyrénées-Université Toulouse 1 Capitole (UT1)-Université Toulouse - Jean Jaurès (UT2J)-Université Toulouse III - Paul Sabatier (UT3), Université Fédérale Toulouse Midi-Pyrénées, Université Toulouse 1 Capitole (UT1), Université Fédérale Toulouse Midi-Pyrénées-Université Fédérale Toulouse Midi-Pyrénées-Université Toulouse - Jean Jaurès (UT2J)-Université Toulouse III - Paul Sabatier (UT3), Université Fédérale Toulouse Midi-Pyrénées-Centre National de la Recherche Scientifique (CNRS)-Institut National Polytechnique (Toulouse) (Toulouse INP), Université Fédérale Toulouse Midi-Pyrénées-Université Toulouse 1 Capitole (UT1), Centre National de la Recherche Scientifique - CNRS (FRANCE), Institut National Polytechnique de Toulouse - Toulouse INP (FRANCE), Institut Supérieur de l'Aéronautique et de l'Espace - ISAE-SUPAERO (FRANCE), Université Toulouse III - Paul Sabatier - UT3 (FRANCE), Université Toulouse - Jean Jaurès - UT2J (FRANCE), Université Toulouse 1 Capitole - UT1 (FRANCE) |
Jazyk: | angličtina |
Rok vydání: | 2021 |
Předmět: |
FOS: Computer and information sciences
Sound (cs.SD) Computer science Audio source separation Speech enhancement projected gradient descent Computer Science - Sound Phase recovery symbols.namesake Quadratic equation [INFO.INFO-TS]Computer Science [cs]/Signal and Image Processing Audio and Speech Processing (eess.AS) FOS: Electrical engineering electronic engineering information engineering Source separation Projected gradient descent Short-time Fourier transform audio source separation Informatique et langage Bregman divergences Time–frequency analysis Fourier transform Computer Science::Sound [INFO.INFO-SD]Computer Science [cs]/Sound [cs.SD] symbols Spectrogram speech enhancement Gradient descent Algorithm Electrical Engineering and Systems Science - Audio and Speech Processing |
Zdroj: | IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Jun 2021, Toronto, Canada ICASSP |
Popis: | International audience; Time-frequency audio source separation is usually achieved by estimating the short-time Fourier transform (STFT) magnitude of each source, and then applying a phase recovery algorithm to retrieve time-domain signals. In particular, the multiple input spectrogram inversion (MISI) algorithm has shown good performance in several recent works. This algorithm minimizes a quadratic reconstruction error between magnitude spectrograms. However, this loss does not properly account for some perceptual properties of audio, and alternative discrepancy measures such as beta-divergences have been preferred in many settings. In this paper, we propose to reformulate phase recovery in audio source separation as a minimization problem involving Bregman divergences. To optimize the resulting objective, we derive a projected gradient descent algorithm. Experiments conducted on a speech enhancement task show that this approach outperforms MISI for several alternative losses, which highlights their relevance for audio source separation applications. |
Databáze: | OpenAIRE |
Externí odkaz: |