Pop Music Highlighter: Marking the Emotion Keypoints

Authors: Yi-Hsuan Yang, Szu-Yu Chou, Yu-Siang Huang
Language: English
Year of publication: 2018
Subjects:
FOS: Computer and information sciences
FOS: Electrical engineering, electronic engineering, information engineering
Sound (cs.SD)
Artificial Intelligence (cs.AI)
Multimedia (cs.MM)
Audio and Speech Processing (eess.AS)
lcsh: Music (M1-5000)
lcsh: Information technology (T58.5-58.64)
Computer science
Artificial intelligence
Natural language processing
Emotion classification
Convolutional neural network
Recurrent neural network
Attention mechanism
Highlight extraction
Music thumbnailing
Chorus detection
Structure analysis
Repetition (music)
Popular music
Feature (machine learning)
Heuristic
Source: Transactions of the International Society for Music Information Retrieval, Vol 1, Iss 1, Pp 68-78 (2018)
ISSN: 2514-3298
Description: The goal of music highlight extraction is to find a short, consecutive segment of a piece of music that effectively represents the whole piece. In previous work, we introduced an attention-based convolutional recurrent neural network that uses music emotion classification as a surrogate task for highlight extraction in Pop songs. The rationale behind that approach is that the highlight of a song is usually its most emotional part. This paper extends our previous work in two aspects. First, on the methodological side, we experiment with a new architecture that does not need any recurrent layers, making the training process faster. Moreover, we compare a late-fusion variant and an early-fusion variant to study which one better exploits the attention mechanism. Second, we conduct and report an extensive set of experiments comparing the proposed attention-based methods against a heuristic energy-based method, a structural repetition-based method, and a few other simple feature-based methods for this task. Because of the lack of public-domain labeled data for highlight extraction, following our previous work we use the 100-song RWC-Pop data set and evaluate how well the detected highlights overlap with any chorus section of the songs. The experiments demonstrate the effectiveness of our methods over the competing methods. For reproducibility, we open-source the code and pre-trained model at https://github.com/remyhuang/pop-music-highlighter/.
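The record itself contains no code, but the selection step the description outlines, picking the contiguous segment on which the model's chunk-level attention concentrates, is easy to illustrate. Below is a minimal NumPy sketch under stated assumptions: the `select_highlight` helper, the one-second chunks, and the fixed 30-second highlight window are illustrative choices, not taken from the authors' released implementation.

```python
import numpy as np

def select_highlight(attention, chunk_sec=1.0, highlight_sec=30.0):
    """Return (start_sec, end_sec) of the window with the largest attention mass.

    `attention` is a 1-D array of non-negative chunk-level scores (e.g. the
    softmax outputs of an attention layer), one score per `chunk_sec` seconds
    of audio. Helper name and window length are illustrative assumptions.
    """
    win = max(1, int(round(highlight_sec / chunk_sec)))
    win = min(win, len(attention))
    # Prefix sums make every window sum O(1): csum[i] = sum(attention[:i]).
    csum = np.concatenate(([0.0], np.cumsum(attention)))
    window_mass = csum[win:] - csum[:-win]
    start = int(np.argmax(window_mass))
    return start * chunk_sec, (start + win) * chunk_sec

# Toy example: an attention curve that peaks around t = 120 s (the "chorus").
scores = np.exp(-0.5 * ((np.arange(200) - 120.0) / 15.0) ** 2)
print(select_highlight(scores))  # roughly (105.0, 135.0)
```

The prefix-sum trick keeps the sliding-window scan linear in the number of chunks, which is irrelevant for one song but keeps batch processing of a large catalog cheap.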
Comment: Transactions of the ISMIR vol. 1, no. 1
Database: OpenAIRE