Discriminative frequency filter banks learning with neural networks

Author: Teng Zhang, Ji Wu
Language: English
Year of publication: 2019
Subject:
Source: EURASIP Journal on Audio, Speech, and Music Processing, Vol 2019, Iss 1, Pp 1-16 (2019)
Document type: article
ISSN: 1687-4722
DOI: 10.1186/s13636-018-0144-6
Description: Filter banks applied to spectra play an important role in many audio applications. Traditionally, the filters are linearly distributed on a perceptual frequency scale such as the Mel scale. To make the output smoother, these filters are often placed so that they overlap with each other. However, fixed-parameter filters are usually designed in the context of psychoacoustic experiments and selected experimentally. To make filter banks discriminative, the authors use a neural network structure to learn the center frequency, bandwidth, gain, and shape of the filters adaptively when the filter banks are used as a feature extractor. This paper investigates several different constraints on discriminative frequency filter banks and the dual spectrum reconstruction problem. Experiments on audio source separation and audio scene classification tasks show performance improvements of the proposed filter banks over traditional fixed-parameter triangular or Gaussian filters on the Mel scale. The classification errors on the LITIS ROUEN dataset and the DCASE2016 dataset are reduced by 13.9% and 4.6% (relative), respectively.
Database: Directory of Open Access Journals
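
Illustration: the description refers to fixed-parameter Gaussian filters on the Mel scale, each characterized by a center frequency, a bandwidth, and a gain. The following is a minimal NumPy sketch of such a fixed baseline filter bank, not the authors' implementation. The function names, the Mel conversion formula (a commonly used variant), the bandwidth rule, and the choice of 40 filters over 257 frequency bins at 16 kHz are all assumptions made for illustration. In a learned version, the centers, bandwidths, and gains computed here would presumably become trainable parameters.

```python
import numpy as np


def hz_to_mel(f_hz):
    # One common Hz-to-Mel conversion (an assumption; the record does not specify one).
    return 2595.0 * np.log10(1.0 + f_hz / 700.0)


def mel_to_hz(mel):
    # Inverse of hz_to_mel above.
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)


def gaussian_filter_bank(n_filters, n_bins, sample_rate, gains=None):
    """Return an (n_filters, n_bins) matrix of Gaussian filters on the Mel scale."""
    # Frequency (Hz) of each spectrum bin, from 0 to the Nyquist frequency.
    freqs = np.linspace(0.0, sample_rate / 2.0, n_bins)
    # n_filters + 2 points evenly spaced on the Mel scale; the interior
    # points are used as filter centers.
    mel_points = np.linspace(hz_to_mel(0.0), hz_to_mel(sample_rate / 2.0), n_filters + 2)
    hz_points = mel_to_hz(mel_points)
    centers = hz_points[1:-1]
    # Hypothetical bandwidth rule: half the distance between the two
    # neighbouring points, so that adjacent filters overlap.
    bandwidths = (hz_points[2:] - hz_points[:-2]) / 2.0
    if gains is None:
        gains = np.ones(n_filters)
    # Broadcast to shape (n_filters, n_bins) and evaluate each Gaussian.
    z = (freqs[None, :] - centers[:, None]) / bandwidths[:, None]
    return gains[:, None] * np.exp(-0.5 * z ** 2)


# Apply the bank to 10 frames of a synthetic magnitude spectrum.
bank = gaussian_filter_bank(n_filters=40, n_bins=257, sample_rate=16000)
rng = np.random.default_rng(0)
spectrum = np.abs(rng.standard_normal((10, 257)))
features = np.log(spectrum @ bank.T + 1e-8)  # log filter-bank energies, shape (10, 40)
```

Each row of `bank` is one filter, so a frame of the spectrum is reduced to 40 log energies by a single matrix product, which is the form that makes the filter parameters straightforward to place inside a neural network as a first layer.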