Sound indexing using morphological description

Authors: Emmanuel Deruty, Geoffroy Peeters
Contributors: Analyse et synthèse sonores [Paris], Sciences et Technologies de la Musique et du Son (STMS), Institut de Recherche et Coordination Acoustique/Musique (IRCAM)-Université Pierre et Marie Curie - Paris 6 (UPMC)-Centre National de la Recherche Scientifique (CNRS)
Language: French
Year of publication: 2010
Subject:
Acoustics and Ultrasonics
Computer science
Speech recognition
Feature extraction
computer.software_genre
050105 experimental psychology
Loudness
030507 speech-language pathology & audiology
03 medical and health sciences
0501 psychology and cognitive sciences
Electrical and Electronic Engineering
Audio signal processing
[SPI.SIGNAL] Engineering Sciences [physics]/Signal and Image processing
[SPI.ACOU] Engineering Sciences [physics]/Acoustics [physics.class-ph]
Audio signal
05 social sciences
[SCCO.NEUR] Cognitive science/Neuroscience
Filter (signal processing)
Similitude
Computer Science::Sound
Automatic indexing
Mel-frequency cepstrum
0305 other medical science
computer
Source: IEEE Transactions on Audio, Speech and Language Processing, Institute of Electrical and Electronics Engineers, 2010
ISSN: 1558-7916
Description: Sound sample indexing usually deals with recognizing the source or cause that produced the sound. For abstract sounds, sound effects, and unnatural or synthetic sounds, this cause is usually unknown or unrecognizable. An efficient description of these sounds has been proposed by Schaeffer under the name morphological description. Part of this description consists in matching the temporal evolution of a sound's acoustic properties to a set of profiles. In this paper, we consider three morphological descriptions: dynamic profiles (ascending, descending, ascending/descending, stable, impulsive), melodic profiles (up, down, stable, up/down, down/up), and complex-iterative sound description (non-iterative, iterative, grain, repetition). We study the automatic indexing of a sound into these profiles. Because this automatic indexing is difficult using standard audio features, we propose new audio features to perform this task. The dynamic profiles are estimated by modeling the loudness of a sound over time with a second-order B-spline model and deriving features from this model. The melodic profiles are estimated by tracking over time the perceptual filter that has the maximum excitation. A function is derived from this track and is then modeled using a second-order B-spline model. The features are again derived from the B-spline model. The description of complex-iterative sounds is obtained by estimating the amount of repetition and the period of the repetition. These are obtained by computing an audio similarity function derived from a Mel-frequency cepstral coefficient (MFCC) similarity matrix. The proposed audio features are then tested for automatic classification. We consider three classification tasks corresponding to the three profiles. In each case, the results are compared with those obtained using standard audio features.
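Illustrative note: the repetition estimate mentioned in the description can be sketched roughly as follows. This is a minimal sketch under stated assumptions, not the authors' method: the abstract does not specify the similarity measure or how the similarity function is derived from the matrix, so cosine similarity between frames and averaging along the matrix diagonals are assumptions made here. The function names (`similarity_matrix`, `lag_similarity`, `estimate_period`) are hypothetical, and the input frames stand in for per-frame MFCC vectors computed elsewhere.

```python
import math


def cosine(a, b):
    """Cosine similarity between two feature vectors (assumed measure)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def similarity_matrix(frames):
    """Frame-by-frame self-similarity matrix of a feature sequence."""
    return [[cosine(a, b) for b in frames] for a in frames]


def lag_similarity(sim):
    """Average similarity along each diagonal: one value per time lag."""
    n = len(sim)
    return [
        sum(sim[i][i + lag] for i in range(n - lag)) / (n - lag)
        for lag in range(n)
    ]


def estimate_period(lag_sim, min_lag=1, max_lag=None):
    """Return (period_in_frames, strength) at the strongest non-zero lag.

    The strength is used here as a rough proxy for the amount of repetition.
    Ties are resolved in favour of the shortest lag.
    """
    if max_lag is None:
        max_lag = len(lag_sim) // 2
    best = max(range(min_lag, max_lag + 1), key=lambda lag: lag_sim[lag])
    return best, lag_sim[best]


if __name__ == "__main__":
    # Toy stand-in for MFCC frames: a four-frame pattern repeated six times.
    pattern = [[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 1, 0], [0, 0, 0, 1]]
    frames = pattern * 6
    period, strength = estimate_period(lag_similarity(similarity_matrix(frames)))
    print(period, strength)
```

On real audio the frames would come from an MFCC extractor, and the period in frames would be converted to seconds using the analysis hop size.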
Database: OpenAIRE