Audio Classification of Bit-Representation Waveform

Authors: Masaki Okawa, Naoki Sawada, Takuya Saito, Hiromitsu Nishizaki
Publication year: 2019
Subject:
FOS: Computer and information sciences
Computer Science - Machine Learning
Sound (cs.SD)
Computer science
Speech recognition
02 engineering and technology
Computer Science - Sound
Machine Learning (cs.LG)
Raw audio format
Audio and Speech Processing (eess.AS)
Computer Science::Multimedia
FOS: Electrical engineering, electronic engineering, information engineering
0202 electrical engineering, electronic engineering, information engineering
Waveform
Representation (mathematics)
Frequency analysis
Computer Science - Computation and Language
Artificial neural network
Deep learning
Spectral density
020206 networking & telecommunications
ComputingMethodologies_PATTERNRECOGNITION
Fourier transform
Computer Science::Sound
020201 artificial intelligence & image processing
Artificial intelligence
Computation and Language (cs.CL)
Electrical Engineering and Systems Science - Audio and Speech Processing
Source: INTERSPEECH
DOI: 10.21437/interspeech.2019-1855
Description: This study investigates waveform representations for audio signal classification. Many recent studies address audio waveform classification tasks such as acoustic event detection and music genre classification, and most of them use a deep learning (neural network) framework. Generally, a frequency analysis method such as the Fourier transform is applied to extract frequency or spectral information from the input audio waveform before it is passed to the neural network. In contrast to these previous studies, we propose a novel waveform representation for audio classification in which the audio waveform is represented as a bit sequence. In our experiments, we compare the proposed bit-representation waveform, which is fed directly to a neural network, with other representations of audio waveforms, such as the raw audio waveform and the power spectrum, on two classification tasks: an acoustic event classification task and a sound/music classification task. The experimental results show that the bit-representation waveform achieved the best classification performance on both tasks.
Comment: Accepted at INTERSPEECH 2019
Database: OpenAIRE
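
Illustrative note: the description says that audio waveforms are represented as a bit sequence and given directly to a neural network, but this record does not state the bit depth, the sign encoding, or the bit order. The following is only a minimal sketch of one plausible reading, assuming 16-bit signed PCM samples, two's-complement encoding, and most-significant-bit-first order. The function names `samples_to_bits` and `sine_pcm` are hypothetical and are not taken from the paper.

```python
import math


def samples_to_bits(samples, bit_depth=16):
    """Expand signed integer PCM samples into per-sample bit vectors.

    Assumptions (not confirmed by the source): two's-complement encoding,
    most significant bit first, one row of `bit_depth` bits per sample.
    """
    mask = (1 << bit_depth) - 1
    lowest = -(1 << (bit_depth - 1))
    highest = (1 << (bit_depth - 1)) - 1
    rows = []
    for sample in samples:
        if not lowest <= sample <= highest:
            raise ValueError(f"sample {sample} does not fit in {bit_depth} bits")
        unsigned = sample & mask  # two's-complement view of the signed value
        rows.append([(unsigned >> k) & 1 for k in range(bit_depth - 1, -1, -1)])
    return rows


def sine_pcm(freq_hz, sample_rate, n_samples, amplitude=0.5, bit_depth=16):
    """Generate a short synthetic sine tone as signed integer PCM samples."""
    peak = (1 << (bit_depth - 1)) - 1
    return [
        round(amplitude * peak * math.sin(2 * math.pi * freq_hz * i / sample_rate))
        for i in range(n_samples)
    ]


# Hypothetical input: 8 samples of a 440 Hz tone sampled at 16 kHz.
pcm = sine_pcm(440.0, 16000, 8)
bits = samples_to_bits(pcm)

for sample, row in zip(pcm, bits):
    print(f"{sample:7d} -> {''.join(str(b) for b in row)}")
```

Under these assumptions, each sample becomes a 16-element vector of 0s and 1s, so a clip of N samples yields an N x 16 binary array that a network could consume in place of the raw amplitude values or a power spectrum.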