Algorithms for audio inpainting based on probabilistic nonnegative matrix factorization

Autor:	Ondřej Mokrý, Paul Magron, Thomas Oberlin, Cédric Févotte
Přispěvatelé:	Brno University of Technology [Brno] (BUT), Speech Modeling for Facilitating Oral-Based Communication (MULTISPEECH), Inria Nancy - Grand Est, Institut National de Recherche en Informatique et en Automatique (Inria)-Institut National de Recherche en Informatique et en Automatique (Inria)-Department of Natural Language Processing & Knowledge Discovery (LORIA - NLPKD), Laboratoire Lorrain de Recherche en Informatique et ses Applications (LORIA), Institut National de Recherche en Informatique et en Automatique (Inria)-Université de Lorraine (UL)-Centre National de la Recherche Scientifique (CNRS)-Institut National de Recherche en Informatique et en Automatique (Inria)-Université de Lorraine (UL)-Centre National de la Recherche Scientifique (CNRS)-Laboratoire Lorrain de Recherche en Informatique et ses Applications (LORIA), Institut National de Recherche en Informatique et en Automatique (Inria)-Université de Lorraine (UL)-Centre National de la Recherche Scientifique (CNRS)-Université de Lorraine (UL)-Centre National de la Recherche Scientifique (CNRS), Institut Supérieur de l'Aéronautique et de l'Espace (ISAE-SUPAERO), Signal et Communications (IRIT-SC), Institut de recherche en informatique de Toulouse (IRIT), Université Toulouse Capitole (UT Capitole), Université de Toulouse (UT)-Université de Toulouse (UT)-Université Toulouse - Jean Jaurès (UT2J), Université de Toulouse (UT)-Université Toulouse III - Paul Sabatier (UT3), Université de Toulouse (UT)-Centre National de la Recherche Scientifique (CNRS)-Institut National Polytechnique (Toulouse) (Toulouse INP), Université de Toulouse (UT)-Toulouse Mind & Brain Institut (TMBI), Université Toulouse - Jean Jaurès (UT2J), Université de Toulouse (UT)-Université de Toulouse (UT)-Université Toulouse III - Paul Sabatier (UT3), Université de Toulouse (UT)-Université Toulouse Capitole (UT Capitole), Université de Toulouse (UT), Centre National de la Recherche Scientifique (CNRS), Czech Science Foundation (GA ˇCR) Project No. 20-29009S, ANR-19-P3IA-0004,ANITI,Artificial and Natural Intelligence Toulouse Institute(2019), European Project: CoG-6681839,ERC FACTORY, Université Toulouse 1 Capitole (UT1), Université Fédérale Toulouse Midi-Pyrénées-Université Fédérale Toulouse Midi-Pyrénées-Université Toulouse - Jean Jaurès (UT2J)-Université Toulouse III - Paul Sabatier (UT3), Université Fédérale Toulouse Midi-Pyrénées-Centre National de la Recherche Scientifique (CNRS)-Institut National Polytechnique (Toulouse) (Toulouse INP), Université Fédérale Toulouse Midi-Pyrénées-Université Toulouse 1 Capitole (UT1), Université Fédérale Toulouse Midi-Pyrénées, Magron, Paul, Artificial and Natural Intelligence Toulouse Institute - - ANITI2019 - ANR-19-P3IA-0004 - P3IA - VALID, European Research Council (ERC FACTORY-CoG-6681839) - ERC FACTORY - CoG-6681839 - INCOMING
Jazyk:	angličtina
Rok vydání:	2023
Předmět:	FOS: Computer and information sciences Sound (cs.SD) [INFO.INFO-TS] Computer Science [cs]/Signal and Image Processing audio inpainting nonnegative matrix factorization Computer Science - Sound expectation-maximization [INFO.INFO-TS]Computer Science [cs]/Signal and Image Processing Audio and Speech Processing (eess.AS) Computer Science::Sound Control and Systems Engineering alternating minimization Signal Processing FOS: Electrical engineering electronic engineering information engineering Computer Vision and Pattern Recognition Electrical and Electronic Engineering Software Electrical Engineering and Systems Science - Audio and Speech Processing
Zdroj:	Signal Processing Signal Processing, 2023
ISSN:	0165-1684 1872-7557
Popis:	International audience; Audio inpainting, i.e., the task of restoring missing or occluded audio signal samples, usually relies on sparse representations or autoregressive modeling. In this paper, we propose to structure the spectrogram with nonnegative matrix factorization (NMF) in a probabilistic framework. First, we treat the missing samples as latent variables, and derive two expectation-maximization algorithms for estimating the parameters of the model, depending on whether we formulate the problem in the time-or time-frequency domain. Then, we treat the missing samples as parameters, and we address this novel problem by deriving an alternating minimization scheme. We assess the potential of these algorithms for the task of restoring short-to middle-length gaps in music signals. Experiments reveal great convergence properties of the proposed methods, as well as competitive performance when compared to state-of-the-art audio inpainting techniques.
Databáze:	OpenAIRE
Externí odkaz:	https://explore.openaire.eu/search/publication?articleId=doi_dedup___::abdd17cbe00e59cc0ffc3d447feea709 https://inria.hal.science/hal-03708613v2/document Zobrazit plný text záznamu Full Text from ScienceDirect