Source/Filter Model for Unsupervised Main Melody Extraction From Polyphonic Audio Signals

Authors: Cédric Févotte, Gaël Richard, Jean-Louis Durrieu, Bertrand David
Contributors: Laboratoire Traitement et Communication de l'Information (LTCI), Télécom ParisTech-Institut Mines-Télécom [Paris] (IMT)-Centre National de la Recherche Scientifique (CNRS), Signal, Statistique et Apprentissage (S2A), Institut Mines-Télécom [Paris] (IMT)-Télécom Paris-Institut Mines-Télécom [Paris] (IMT)-Télécom Paris, Département Images, Données, Signal (IDS), Télécom ParisTech, Department of Engineering [Cambridge], University of Cambridge [UK] (CAM), RICHARD, Gaël
Language: English
Year of publication: 2010
Subject:
Blind Audio Source Separation
Acoustics and Ultrasonics
Computer science
Speech recognition
02 engineering and technology
Source/Filter Model
Blind signal separation
Mixture theory
030507 speech-language pathology & audiology
03 medical and health sciences
0202 electrical engineering, electronic engineering, information engineering
Source separation
Non-negative Matrix Factorization (NMF)
Electrical and Electronic Engineering
Audio signal processing
[SPI.SIGNAL] Engineering Sciences [physics]/Signal and Image processing
Audio signal
Music
Statistical model
Pattern recognition
Mixture model
Main Melody Extraction
Gaussian Scaled Mixture Model (GSMM)
Spectral Analysis
Information extraction
Computer Science::Sound
Maximum Likelihood
020201 artificial intelligence & image processing
Artificial intelligence
0305 other medical science
Expectation-Maximization (EM) algorithm
Source: IEEE Transactions on Audio, Speech and Language Processing, Institute of Electrical and Electronics Engineers, 2010
ISSN: 1558-7916
Description: International audience; Extracting the main melody from a polyphonic music recording seems natural even to untrained human listeners. To a certain extent it is related to the concept of source separation, with the human ability to focus on a specific source in order to extract relevant information. In this article, we propose a new approach for the estimation and extraction of the main melody (and in particular the leading vocal part) from polyphonic audio signals. To that end, we propose a new signal model in which the leading vocal part is explicitly represented by a specific source/filter model. The proposed representation is investigated in the framework of two statistical models: a Gaussian Scaled Mixture Model (GSMM) and an extended Instantaneous Mixture Model (IMM). For both models, the parameters are estimated within a maximum likelihood framework adapted from single-channel source separation techniques. The desired sequence of fundamental frequencies is then inferred from the estimated parameters. The results obtained in a recent evaluation campaign (MIREX08) show that the proposed approaches are very promising and reach state-of-the-art performance on all test sets.
Database: OpenAIRE