Source/Filter Model for Unsupervised Main Melody Extraction From Polyphonic Audio Signals
Authors: | Cédric Févotte, Gaël Richard, Jean-Louis Durrieu, Bertrand David |
---|---|
Contributors: | Laboratoire Traitement et Communication de l'Information (LTCI), Télécom ParisTech-Institut Mines-Télécom [Paris] (IMT)-Centre National de la Recherche Scientifique (CNRS), Signal, Statistique et Apprentissage (S2A), Institut Mines-Télécom [Paris] (IMT)-Télécom Paris-Institut Mines-Télécom [Paris] (IMT)-Télécom Paris, Département Images, Données, Signal (IDS), Télécom ParisTech, Department of Engineering [Cambridge], University of Cambridge [UK] (CAM), RICHARD, Gaël |
Language: | English |
Year of publication: | 2010 |
Subject: |
Blind Audio Source Separation
Acoustics and Ultrasonics; Computer science; Speech recognition; 02 engineering and technology; Source/Filter Model; computer.software_genre; Blind signal separation; Mixture theory; 030507 speech-language pathology & audiology; 03 medical and health sciences; 0202 electrical engineering, electronic engineering, information engineering; Source separation; Non-negative Matrix Factorization (NMF); Electrical and Electronic Engineering; Audio signal processing; [SPI.SIGNAL] Engineering Sciences [physics]/Signal and Image processing; Audio signal; Music; business.industry; Statistical model; Pattern recognition; Mixture model; Main Melody Extraction; Gaussian Scaled Mixture Model (GSMM); Spectral Analysis; Information extraction; Computer Science::Sound; Maximum Likelihood; 020201 artificial intelligence & image processing; Artificial intelligence; 0305 other medical science; business; computer; Expectation-Maximization (EM) algorithm |
Source: | IEEE Transactions on Audio, Speech and Language Processing, Institute of Electrical and Electronics Engineers, 2010 |
ISSN: | 1558-7916 |
Description: | International audience; Extracting the main melody from a polyphonic music recording seems natural even to untrained human listeners. To a certain extent it is related to the concept of source separation, with the human ability of focusing on a specific source in order to extract relevant information. In this article, we propose a new approach for the estimation and extraction of the main melody (and in particular the leading vocal part) from polyphonic audio signals. To that aim, we propose a new signal model where the leading vocal part is explicitly represented by a specific source/filter model. The proposed representation is investigated in the framework of two statistical models: a Gaussian Scaled Mixture Model (GSMM) and an extended Instantaneous Mixture Model (IMM). For both models, the estimation of the different parameters is done within a maximum likelihood framework adapted from single-channel source separation techniques. The desired sequence of fundamental frequencies is then inferred from the estimated parameters. The results obtained in a recent evaluation campaign (MIREX08) show that the proposed approaches are very promising and reach state-of-the-art performance on all test sets. |
Database: | OpenAIRE |
External link: |