Source/Filter Model for Unsupervised Main Melody Extraction From Polyphonic Audio Signals

Authors:	Cédric Févotte, Gaël Richard, Jean-Louis Durrieu, Bertrand David
Contributors:	Laboratoire Traitement et Communication de l'Information (LTCI), Télécom ParisTech-Institut Mines-Télécom [Paris] (IMT)-Centre National de la Recherche Scientifique (CNRS), Signal, Statistique et Apprentissage (S2A), Institut Mines-Télécom [Paris] (IMT)-Télécom Paris-Institut Mines-Télécom [Paris] (IMT)-Télécom Paris, Département Images, Données, Signal (IDS), Télécom ParisTech, Department of Engineering [Cambridge], University of Cambridge [UK] (CAM), RICHARD, Gaël
Language:	English
Year of publication:	2010
Subject:	Blind Audio Source Separation; Acoustics and Ultrasonics; Computer science; Speech recognition; 02 engineering and technology; Source/Filter Model; Blind signal separation; Mixture theory; 030507 speech-language pathology & audiology; 03 medical and health sciences; 0202 electrical engineering, electronic engineering, information engineering; Source separation; Non-negative Matrix Factorization (NMF); Electrical and Electronic Engineering; Audio signal processing; [SPI.SIGNAL] Engineering Sciences [physics]/Signal and Image processing; Audio signal; Music; Statistical model; Pattern recognition; Mixture model; Main Melody Extraction; Gaussian Scaled Mixture Model (GSMM); Spectral Analysis; Information extraction; Computer Science::Sound; Maximum Likelihood; 020201 artificial intelligence & image processing; Artificial intelligence; 0305 other medical science; Expectation-Maximization (EM) algorithm
Source:	IEEE Transactions on Audio, Speech and Language Processing, Institute of Electrical and Electronics Engineers, 2010
ISSN:	1558-7916
Description:	International audience; Extracting the main melody from a polyphonic music recording seems natural even to untrained human listeners. To a certain extent, it is related to the concept of source separation, as it relies on the human ability to focus on a specific source in order to extract relevant information. In this article, we propose a new approach for the estimation and extraction of the main melody (and in particular the leading vocal part) from polyphonic audio signals. To that end, we propose a new signal model in which the leading vocal part is explicitly represented by a specific source/filter model. The proposed representation is investigated in the framework of two statistical models: a Gaussian Scaled Mixture Model (GSMM) and an extended Instantaneous Mixture Model (IMM). For both models, the parameters are estimated within a maximum-likelihood framework adapted from single-channel source separation techniques. The desired sequence of fundamental frequencies is then inferred from the estimated parameters. The results obtained in a recent evaluation campaign (MIREX08) show that the proposed approaches are very promising and reach state-of-the-art performance on all test sets.
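The abstract describes the model only in words. As a rough illustration written under our own assumptions (not the authors' code), the NumPy sketch below approximates a power spectrogram V by a source/filter voice term plus an NMF accompaniment term, V ≈ (W_F0 H_F0) ⊙ (W_K H_K) + W_M H_M, in the spirit of an instantaneous mixture. Since maximum likelihood under a Gaussian model of the STFT coefficients is equivalent to minimising the Itakura-Saito (IS) divergence between V and its model, the parameters are fitted with standard heuristic multiplicative IS updates. The harmonic-comb source dictionary, the helper names harmonic_comb_dictionary and fit_source_filter, all sizes, and the frame-wise argmax used to read off an F0 track are hypothetical choices; the paper's own dictionaries and melody decoding may differ.

```python
import numpy as np

EPS = 1e-12  # floor that keeps every quantity strictly positive


def harmonic_comb_dictionary(freqs, f0s, n_harmonics=10, width=20.0):
    """Hypothetical fixed source dictionary: one harmonic comb per candidate F0.

    Column j holds Gaussian bumps (std `width` Hz) at multiples of f0s[j] with
    amplitudes decaying as 1/h. Illustrative stand-in, not the paper's source model.
    """
    W = np.zeros((len(freqs), len(f0s)))
    for j, f0 in enumerate(f0s):
        for h in range(1, n_harmonics + 1):
            W[:, j] += np.exp(-0.5 * ((freqs - h * f0) / width) ** 2) / h
    return W + EPS


def is_divergence(V, Vhat):
    """Itakura-Saito divergence summed over all time-frequency bins."""
    R = V / Vhat
    return float(np.sum(R - np.log(R) - 1.0))


def fit_source_filter(V, W_F0, n_filters=4, n_accomp=8, n_iter=100, seed=0):
    """Fit V ~ (W_F0 H_F0) * (W_K H_K) + W_M H_M with multiplicative IS updates.

    W_F0 stays fixed; H_F0 (source activations), W_K/H_K (filter envelopes)
    and W_M/H_M (accompaniment NMF) are learned. Returns H_F0 and the IS cost
    recorded after each iteration.
    """
    rng = np.random.default_rng(seed)
    F, T = V.shape
    H_F0 = rng.random((W_F0.shape[1], T)) + EPS
    W_K = rng.random((F, n_filters)) + EPS
    H_K = rng.random((n_filters, T)) + EPS
    W_M = rng.random((F, n_accomp)) + EPS
    H_M = rng.random((n_accomp, T)) + EPS

    def model():
        S = W_F0 @ H_F0  # source (harmonic excitation) spectra
        P = W_K @ H_K    # filter (spectral envelope) spectra
        return S, P, S * P + W_M @ H_M + EPS

    costs = []
    for _ in range(n_iter):
        # Each update is ratio of negative/positive parts of the IS gradient.
        S, P, Vhat = model()
        H_F0 *= (W_F0.T @ (P * V / Vhat**2)) / (W_F0.T @ (P / Vhat) + EPS)
        S, P, Vhat = model()
        H_K *= (W_K.T @ (S * V / Vhat**2)) / (W_K.T @ (S / Vhat) + EPS)
        S, P, Vhat = model()
        W_K *= ((S * V / Vhat**2) @ H_K.T) / ((S / Vhat) @ H_K.T + EPS)
        S, P, Vhat = model()
        H_M *= (W_M.T @ (V / Vhat**2)) / (W_M.T @ (1.0 / Vhat) + EPS)
        S, P, Vhat = model()
        W_M *= ((V / Vhat**2) @ H_M.T) / ((1.0 / Vhat) @ H_M.T + EPS)
        costs.append(is_divergence(V, model()[2]))
    return H_F0, costs


# Usage on a synthetic spectrogram: harmonic frames at random F0s over a floor.
freqs = np.linspace(0.0, 4000.0, 257)
f0s = np.arange(100.0, 400.0, 10.0)
W_F0 = harmonic_comb_dictionary(freqs, f0s)
rng = np.random.default_rng(1)
true_idx = rng.integers(0, len(f0s), size=40)
V = 5.0 * W_F0[:, true_idx] + 0.1 + 0.05 * rng.random((len(freqs), 40))

H_F0, costs = fit_source_filter(V, W_F0)
f0_track = f0s[np.argmax(H_F0, axis=0)]  # naive frame-wise melody estimate
```

The frame-wise argmax ignores temporal continuity; adding a smoothing step over H_F0 (for example a Viterbi pass) would be a natural refinement, but that choice is likewise an assumption of this sketch.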
Database:	OpenAIRE
External links:	https://explore.openaire.eu/search/publication?articleId=doi_dedup___::3f36229b78e974f80c099f0cab211ea4 https://hal.archives-ouvertes.fr/hal-02652995/file/TSALP_Durrieu10.pdf