Multichannel nonnegative tensor factorization with structured constraints for user-guided audio source separation

Authors: Alexey Ozerov, Cédric Févotte, Raphaël Blouet, Jean-Louis Durrieu
Contributors: Speech and sound data modeling and processing (METISS), Institut de Recherche en Informatique et Systèmes Aléatoires (IRISA), Université de Rennes 1 (UR1), Université de Rennes (UNIV-RENNES)-Université de Rennes (UNIV-RENNES)-Institut National des Sciences Appliquées - Rennes (INSA Rennes), Institut National des Sciences Appliquées (INSA)-Université de Rennes (UNIV-RENNES)-Institut National des Sciences Appliquées (INSA)-Institut National de Recherche en Informatique et en Automatique (Inria)-Centre National de la Recherche Scientifique (CNRS)-Université de Rennes 1 (UR1), Institut National des Sciences Appliquées (INSA)-Université de Rennes (UNIV-RENNES)-Institut National des Sciences Appliquées (INSA)-Institut National de Recherche en Informatique et en Automatique (Inria)-Centre National de la Recherche Scientifique (CNRS)-Inria Rennes – Bretagne Atlantique, Institut National de Recherche en Informatique et en Automatique (Inria), Laboratoire Traitement et Communication de l'Information (LTCI), Télécom ParisTech-Institut Mines-Télécom [Paris] (IMT)-Centre National de la Recherche Scientifique (CNRS), Yacast, Laboratoire de Traitement du signal [EPFL] / Signal Processing Laboratories (SP Lab), Ecole Polytechnique Fédérale de Lausanne (EPFL), Quaero program funded by OSEO, the French State agency for innovation, ANR-06-RIAM-0024, SARAH, StAndardisation du Remastering Audio Haute-définition (2006), ANR-09-JCJC-0073, TANGERINE (2009), Université de Rennes (UR)-Institut National des Sciences Appliquées - Rennes (INSA Rennes), Institut National des Sciences Appliquées (INSA)-Institut National des Sciences Appliquées (INSA)-Institut National de Recherche en Informatique et en Automatique (Inria)-Centre National de la Recherche Scientifique (CNRS)-Université de Rennes (UR)-Institut National des Sciences Appliquées - Rennes (INSA Rennes), Institut National des Sciences Appliquées (INSA)-Institut National des Sciences Appliquées (INSA)-Institut National de Recherche en Informatique et en Automatique (Inria)-Centre National de la Recherche Scientifique (CNRS)-Inria Rennes – Bretagne Atlantique, Ozerov, Alexey, Programme Audiovisuel et Multimédia - StAndardisation du Remastering Audio Haute-définition - - SARAH2006 - ANR-06-RIAM-0024 - RIAM - VALID, Jeunes chercheuses et jeunes chercheurs - - TANGERINE2009 - ANR-09-JCJC-0073 - JCJC - VALID
Language: English
Publication year: 2011
Subject:
Computer science
[INFO.INFO-TS] Computer Science [cs]/Signal and Image Processing
Speech recognition
02 engineering and technology
Convolution
030507 speech-language pathology & audiology
03 medical and health sciences
0202 electrical engineering, electronic engineering, information engineering
Source separation
Tensor
Audio signal processing
[SPI.SIGNAL] Engineering Sciences [physics]/Signal and Image processing
SIGNAL (programming language)
020206 networking & telecommunications
Pattern recognition
Time–frequency analysis
Spectrogram
Noise (video)
Artificial intelligence
0305 other medical science
Source: IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP'11), May 2011, Prague, Czech Republic
Description: Separating multiple tracks from professionally produced music recordings (PPMRs) is still a challenging problem. We address this task with a user-guided approach in which the separation system is given segmental information indicating the time activations of the particular instruments to separate. This information can typically be obtained by manual annotation. We use a so-called multichannel nonnegative tensor factorization (NTF) model, in which the original sources are observed through a multichannel convolutive mixture and the source power spectrograms are jointly modeled by a 3-valence (time/frequency/source) tensor. Our user-guided separation method produced competitive results at the 2010 Signal Separation Evaluation Campaign, with sufficient quality for real-world music editing applications.
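Illustrative note: the abstract does not say how the segmental annotations enter the model. One common way to impose such time-activation constraints in NMF-style models, assumed here and not confirmed by this record, is to set the activation coefficients of an instrument to zero in frames where it is annotated as silent. Multiplicative updates then keep those entries at zero. The sketch below shows only that idea; the function names, the hop size, and the annotation format are hypothetical.

```python
import math
import random


def activation_mask(segments, n_frames, hop_seconds):
    """Return a 0/1 list marking the frames covered by any (start, end) segment.

    `segments` holds annotated activity intervals in seconds (assumed format).
    """
    mask = [0] * n_frames
    for start, end in segments:
        first = max(0, int(start / hop_seconds))
        last = min(n_frames, math.ceil(end / hop_seconds))
        for n in range(first, last):
            mask[n] = 1
    return mask


def masked_init(annotations, n_frames, hop_seconds, seed=0):
    """Random positive activations per instrument, zeroed where it is silent."""
    rng = random.Random(seed)
    activations = {}
    for name, segments in annotations.items():
        mask = activation_mask(segments, n_frames, hop_seconds)
        # Values lie in [0.5, 1.5) inside annotated segments and are 0 elsewhere.
        activations[name] = [m * (0.5 + rng.random()) for m in mask]
    return activations


def multiplicative_step(row, ratio):
    """Generic multiplicative update: entries that are zero stay zero."""
    return [h * r for h, r in zip(row, ratio)]
```

For example, with a hop of 0.5 s and six frames, an instrument annotated as active from 1.0 s to 2.0 s gets nonzero activations only in frames 2 and 3, and any multiplicative update of that row leaves the other frames at zero.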
Database: OpenAIRE