Spectro-Temporal Gabor Filterbank Features for Acoustic Event Detection
Authors: | Stefan Goetze, Jörn Anemüller, Jens Schröder |
---|---|
Contributors: | Publica |
Year of publication: | 2015 |
Subject: | Acoustics and Ultrasonics; Computer science; Speech recognition; Gabor wavelet; Feature extraction; Pattern recognition; Filter bank; Computational Mathematics; Statistical classification; Gabor filter; Cepstrum; Computer Science (miscellaneous); Mel-frequency cepstrum; Artificial intelligence; Electrical and Electronic Engineering; Hidden Markov model |
Source: | IEEE/ACM Transactions on Audio, Speech, and Language Processing, vol. 23, pp. 2198-2208 |
ISSN: | 2329-9304, 2329-9290 |
DOI: | 10.1109/taslp.2015.2467964 |
Description: | Algorithms for the automatic detection and recognition of acoustic events are becoming increasingly relevant for the reliable and robust functioning of consumer, assistive, and monitoring systems. The extraction of appropriate, task-relevant acoustic features from the raw sound signal clearly influences the performance of subsequent statistical classification, in particular in adverse acoustic situations. The present contribution investigates the use of biologically inspired features derived from a filterbank of two-dimensional Gabor functions, which decompose the spectro-temporal power density into components that capture spectral, temporal, and joint spectro-temporal modulation patterns. It is hypothesized that the comparatively large joint spectral and temporal extent of these Gabor functions results in features that allow for robust classification. The proposed feature extraction scheme is evaluated together with a hidden Markov model (HMM) classifier on two corpora comprising acoustic events in realistic adverse conditions from the D-CASE and CLEAR'07 evaluation campaigns. The relevance of each Gabor filter for classification is analyzed, and an optimized parameter set for the Gabor filterbank (GFB) is identified. The performance of the optimized GFB is compared with other state-of-the-art algorithms on isolated event classification and on full acoustic event detection (AED), which comprises joint classification and temporal segmentation of events. Results show that Gabor features yield a signal representation exhibiting separated average class-specific patterns. With the optimized GFB, classification accuracy improves by up to 26% relative to the Mel-frequency cepstral coefficient (MFCC) baseline. Further experiments demonstrate that this improvement cannot be explained by purely temporal or purely spectral Gabor basis functions; rather, a GFB with features extending in joint spectro-temporal directions is required to obtain optimum performance. On AED with the D-CASE challenge dataset, performance is shown to improve on previous algorithms from the recent literature. |
Database: | OpenAIRE |
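The abstract describes decomposing the spectro-temporal power density (for example, a log-mel spectrogram) with a bank of two-dimensional Gabor filters tuned to purely spectral, purely temporal, or joint spectro-temporal modulations. The following is a minimal pure-Python sketch of that idea, not the authors' implementation. It assumes a Hann envelope, a complex-exponential carrier, DC removal, and real-valued output, which is a common parameterization of Gabor filterbank features in speech recognition. The function names (`hann`, `gabor_2d`, `gabor_feature_map`), the modulation frequencies in `FILTERBANK`, and the 9 x 9 filter size are hypothetical and are not the optimized GFB parameters identified in the paper.

```python
import cmath
import math


def hann(n_taps):
    # Hann window without zero-valued endpoints (assumed envelope shape).
    return [0.5 - 0.5 * math.cos(2 * math.pi * (i + 1) / (n_taps + 1))
            for i in range(n_taps)]


def gabor_2d(omega_k, omega_n, size_k, size_n):
    """Complex 2D Gabor filter g[k][n] spanning size_k channels x size_n frames.

    omega_k: spectral modulation frequency (radians per channel)
    omega_n: temporal modulation frequency (radians per frame)
    """
    env_k, env_n = hann(size_k), hann(size_n)
    ck, cn = (size_k - 1) / 2, (size_n - 1) / 2
    env = [[env_k[k] * env_n[n] for n in range(size_n)] for k in range(size_k)]
    g = [[env[k][n] * cmath.exp(1j * (omega_k * (k - ck) + omega_n * (n - cn)))
          for n in range(size_n)] for k in range(size_k)]
    # DC removal (assumption): subtract a scaled copy of the envelope so the
    # filter coefficients sum to zero and a constant input level gives no output.
    scale = sum(map(sum, g)) / sum(map(sum, env))
    return [[g[k][n] - scale * env[k][n] for n in range(size_n)]
            for k in range(size_k)]


def gabor_feature_map(spec, g):
    """Valid 2D correlation (no kernel flip) of spec[k][n] with filter g.

    Keeps the real part of the complex output (one common choice).
    """
    n_chan, n_frames = len(spec), len(spec[0])
    fk, fn = len(g), len(g[0])
    out = []
    for k0 in range(n_chan - fk + 1):
        row = []
        for n0 in range(n_frames - fn + 1):
            acc = sum(g[k][n] * spec[k0 + k][n0 + n]
                      for k in range(fk) for n in range(fn))
            row.append(acc.real)
        out.append(row)
    return out


# Hypothetical (omega_k, omega_n) pairs: purely spectral, purely temporal,
# and joint spectro-temporal filters in both diagonal orientations.
FILTERBANK = [
    (math.pi / 4, 0.0),
    (0.0, math.pi / 4),
    (math.pi / 4, math.pi / 4),
    (math.pi / 4, -math.pi / 4),
]

# Synthetic 20-channel x 30-frame "spectrogram" whose modulation is purely
# temporal (every channel carries the same cosine over time).
spec = [[math.cos(0.8 * n) for n in range(30)] for _ in range(20)]
features = [gabor_feature_map(spec, gabor_2d(wk, wn, 9, 9))
            for wk, wn in FILTERBANK]
```

Each feature map here has 12 x 22 entries (valid correlation of a 20 x 30 input with a 9 x 9 filter). Because the purely spectral filter sums to zero along the channel axis, it gives no response to this purely temporal input, while the purely temporal filter responds strongly. Turning such maps into per-frame feature vectors for an HMM classifier, as in the paper's evaluation, is not shown.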