End-To-End Melody Note Transcription Based on a Beat-Synchronous Attention Mechanism

Author:	Kazuyoshi Yoshii, Masataka Goto, Eita Nakamura, Ryo Nishikimi
Publication year:	2019
Subject:	Audio signal; Transcription (music); Computer science; Speech recognition; 010501 environmental sciences; 01 natural sciences; 030507 speech-language pathology & audiology; 03 medical and health sciences; Recurrent neural network; End-to-end principle; Note value; Spectrogram; Singing; 0305 other medical science; Encoder; 0105 earth and related environmental sciences
Source:	WASPAA
DOI:	10.1109/waspaa.2019.8937207
Description:	This paper describes an end-to-end audio-to-symbolic singing transcription method for mixtures of vocal and accompaniment parts. Given audio signals with non-aligned melody scores, we aim to train a recurrent neural network that takes as input a magnitude spectrogram and outputs a sequence of melody notes represented by pairs of pitches and note values (durations). A promising approach to such sequence-to-sequence learning (joint input-to-output alignment and mapping) is to use an encoder-decoder model with an attention mechanism. This approach, however, cannot be used straightforwardly for singing transcription because a note-level decoder fails to estimate note values from latent representations obtained by a frame-level encoder that is good at extracting instantaneous features but poor at extracting temporal features. To solve this problem, we focus on tatums instead of notes as output units and propose a tatum-level decoder that sequentially outputs tatum-level score segments represented by note pitches, note onset flags, and beat and downbeat flags. We then propose a beat-synchronous attention mechanism constrained to monotonically align tatum-level scores with input audio signals with a steady increment. The experimental results showed that the proposed method can be trained successfully from non-aligned data thanks to the beat-synchronous attention mechanism.
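The tatum-level output representation mentioned in the description can be illustrated with a minimal sketch. It is not the authors' code: the names `TatumSegment` and `notes_to_tatums`, the sixteenth-note tatum (4 tatums per beat), the 4/4 meter, and the use of `None` for rests are all assumptions made for illustration. The sketch expands a note-level score (pitch, note value in tatums) into one segment per tatum, each carrying a pitch, an onset flag, a beat flag, and a downbeat flag.

```python
from typing import List, NamedTuple, Optional, Tuple


class TatumSegment(NamedTuple):
    """One tatum of the score (hypothetical structure, for illustration)."""
    pitch: Optional[int]  # MIDI pitch, or None for a rest (assumption)
    onset: bool           # True if a note starts on this tatum
    beat: bool            # True if this tatum falls on a beat
    downbeat: bool        # True if this tatum falls on the first beat of a bar


def notes_to_tatums(
    notes: List[Tuple[Optional[int], int]],
    tatums_per_beat: int = 4,  # assumed sixteenth-note tatum grid
    beats_per_bar: int = 4,    # assumed 4/4 meter
) -> List[TatumSegment]:
    """Expand (pitch, length_in_tatums) pairs into tatum-level segments."""
    tatums_per_bar = tatums_per_beat * beats_per_bar
    segments: List[TatumSegment] = []
    position = 0  # index of the current tatum from the start of the score
    for pitch, length in notes:
        for offset in range(length):
            segments.append(
                TatumSegment(
                    pitch=pitch,
                    # Only the first tatum of a sounding note is an onset.
                    onset=(offset == 0 and pitch is not None),
                    beat=(position % tatums_per_beat == 0),
                    downbeat=(position % tatums_per_bar == 0),
                )
            )
            position += 1
    return segments


# A quarter-note C4, an eighth rest, and an eighth-note D4.
example = notes_to_tatums([(60, 4), (None, 2), (62, 2)])
for segment in example:
    print(segment)
```

Under these assumptions, every tatum yields a fixed-size output, so a note value is implied by the distance between successive onset flags and does not need to be predicted directly.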
Database:	OpenAIRE
External link:	https://explore.openaire.eu/search/publication?articleId=doi_________::0082a05c9a13fe747ef0f7010547bb33 https://doi.org/10.1109/waspaa.2019.8937207