Sound indexing using morphological description

Authors:	Emmanuel Deruty, Geoffroy Peeters
Contributors:	Analyse et synthèse sonores [Paris], Sciences et Technologies de la Musique et du Son (STMS), Institut de Recherche et Coordination Acoustique/Musique (IRCAM)-Université Pierre et Marie Curie - Paris 6 (UPMC)-Centre National de la Recherche Scientifique (CNRS)
Language:	French
Year of publication:	2010
Subject:	Acoustics and Ultrasonics; Computer science; Speech recognition; Feature extraction; Loudness; Electrical and Electronic Engineering; Audio signal processing; Audio signal; Filter (signal processing); Similitude; Automatic indexing; Mel-frequency cepstrum; Computer Science::Sound; [SPI.SIGNAL] Engineering Sciences [physics]/Signal and Image processing; [SPI.ACOU] Engineering Sciences [physics]/Acoustics [physics.class-ph]; [SCCO.NEUR] Cognitive science/Neuroscience; 03 medical and health sciences; 0305 other medical science; 030507 speech-language pathology & audiology; 05 social sciences; 0501 psychology and cognitive sciences; 050105 experimental psychology
Source:	IEEE Transactions on Audio, Speech and Language Processing, Institute of Electrical and Electronics Engineers, 2010
ISSN:	1558-7916
Description:	Sound sample indexing usually deals with the recognition of the source/cause that produced the sound. For abstract sounds, sound effects, unnatural, or synthetic sounds, this cause is usually unknown or unrecognizable. An efficient description of these sounds has been proposed by Schaeffer under the name morphological description. Part of this description consists in describing a sound by matching the temporal evolution of its acoustic properties to a set of profiles. In this paper, we consider three morphological descriptions: dynamic profiles (ascending, descending, ascending/descending, stable, impulsive), melodic profiles (up, down, stable, up/down, down/up) and complex-iterative sound description (non-iterative, iterative, grain, repetition). We study the automatic indexing of a sound into these profiles. Because this automatic indexing is difficult using standard audio features, we propose new audio features to perform this task. The dynamic profiles are estimated by modeling the loudness of a sound over time with a second-order B-spline model and deriving features from this model. The melodic profiles are estimated by tracking over time the perceptual filter which has the maximum excitation. A function is derived from this track, which is then modeled using a second-order B-spline model. The features are again derived from the B-spline model. The description of complex-iterative sounds is obtained by estimating the amount of repetition and the period of the repetition. These are obtained by computing an audio similarity function derived from a Mel-frequency cepstral coefficient (MFCC) similarity matrix. The proposed audio features are then tested for automatic classification. We consider three classification tasks corresponding to the three profiles. In each case, the results are compared with those obtained using standard audio features.
Database:	OpenAIRE
External link:	https://explore.openaire.eu/search/publication?articleId=doi_dedup___::cb8c413cb30714dd58a3467822000f2f https://hal.archives-ouvertes.fr/hal-01106507
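
The abstract in this record says that the amount and the period of repetition are estimated from a similarity function derived from an MFCC similarity matrix, but it does not give the formulas. The following is only a minimal sketch of that general idea, not the authors' method. It assumes the MFCCs are already computed and passed in as a (frames x coefficients) array, uses cosine similarity between frames, and averages each diagonal of the matrix to get one similarity value per time lag. The function names and the `min_lag`/`max_lag` parameters are hypothetical.

```python
import numpy as np


def similarity_matrix(features):
    """Cosine similarity between every pair of frames (rows of `features`)."""
    norms = np.linalg.norm(features, axis=1, keepdims=True)
    unit = features / np.maximum(norms, 1e-12)  # guard against all-zero frames
    return unit @ unit.T


def lag_similarity(sim):
    """Average each diagonal of the similarity matrix.

    Entry k is the mean similarity between frames that are k steps apart.
    """
    n = sim.shape[0]
    return np.array([np.mean(np.diagonal(sim, offset=k)) for k in range(n)])


def estimate_repetition(features, min_lag=1, max_lag=None):
    """Return (period_in_frames, strength) for the most self-similar lag.

    Assumption: the lag with the highest mean similarity is taken as the
    repetition period, and that mean similarity as the amount of repetition.
    """
    lags = lag_similarity(similarity_matrix(features))
    if max_lag is None:
        max_lag = len(lags) // 2  # keep enough frame pairs per lag
    window = lags[min_lag:max_lag + 1]
    best = min_lag + int(np.argmax(window))
    return best, float(lags[best])


if __name__ == "__main__":
    # Synthetic stand-in for MFCC frames: an 8-frame pattern repeated 8 times.
    rng = np.random.default_rng(0)
    pattern = rng.normal(size=(8, 13))
    iterative = np.tile(pattern, (8, 1))
    print(estimate_repetition(iterative, max_lag=12))
```

Restricting `max_lag` matters in practice: a sound that repeats every 8 frames is just as self-similar at lags 16, 24, and so on, so an unrestricted search may return a multiple of the true period.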