Learning to Visualize Music Through Shot Sequence for Automatic Concert Video Mashup
Author: | Hsin-Min Wang, Tyng-Luh Liu, Hong-Yuan Mark Liao, Jen-Chun Lin, Wen-Li Wei, Hsiao-Rong Tyan |
---|---|
Year: | 2021 |
Subject: |
Computer science, Knowledge engineering, Computer Science Applications, Visualization, Human–computer interaction, Signal Processing, Media Technology, Artificial intelligence & image processing, Task analysis, Mashup, Electrical and Electronic Engineering, Amateur, Storytelling |
Source: | IEEE Transactions on Multimedia. 23:1731-1743 |
ISSN: | 1941-0077 1520-9210 |
DOI: | 10.1109/tmm.2020.3003631 |
Description: | An experienced director usually switches among different types of shots to make visual storytelling more engaging. When filming a musical performance, well-chosen shot switches can produce special effects, such as enhancing the expression of emotion or heating up the atmosphere. However, while professional recordings of a live concert often employ this visual storytelling technique, amateur recordings made by audience members at the same event typically lack such storytelling concepts and skills. A versatile system that can perform video mashup to create a refined, high-quality video from such amateur clips is therefore desirable. To this end, we aim to translate music into an attractive shot (type) sequence by learning the relation between music and the visual storytelling of shots. The resulting shot sequence can then be used to better portray the visual storytelling of a song and to guide the concert video mashup process. To achieve this, we first introduce a novel probabilistic fusion approach, named multi-resolution fused recurrent neural networks (MF-RNNs) with film-language, which integrates multi-resolution fused RNNs and a film-language model to boost translation performance. We then distill the knowledge in MF-RNNs with film-language into a lightweight RNN, which is more efficient and easier to deploy. Results from objective and subjective experiments demonstrate that both MF-RNNs with film-language and the lightweight RNN can generate attractive shot sequences for music, thereby enhancing the viewing and listening experience. |
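The abstract mentions distilling the teacher model (MF-RNNs with film-language) into a lightweight RNN. The paper's exact training objective is not given in this record; the following is a minimal NumPy sketch of the standard temperature-scaled distillation loss applied to per-timestep shot-type predictions. All function names, the class count, and the `T`/`alpha` parameters are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def softmax(z, T=1.0):
    # Temperature-scaled softmax over shot-type classes; higher T softens the distribution.
    s = z / T
    e = np.exp(s - np.max(s, axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def distillation_loss(student_logits, teacher_logits, hard_labels, T=2.0, alpha=0.5):
    """Blend of soft-target and hard-target cross-entropy (hypothetical sketch).

    student_logits, teacher_logits: arrays of shape (num_frames, num_shot_types)
    hard_labels: ground-truth shot-type indices, shape (num_frames,)
    """
    # Soft-target term: cross-entropy between the softened teacher and student
    # distributions, scaled by T^2 as is conventional for distillation.
    p_teacher = softmax(teacher_logits, T)
    p_student = softmax(student_logits, T)
    soft = -np.sum(p_teacher * np.log(p_student + 1e-12), axis=-1).mean() * T * T
    # Hard-target term: ordinary cross-entropy against the annotated shot types.
    p = softmax(student_logits)
    hard = -np.log(p[np.arange(len(hard_labels)), hard_labels] + 1e-12).mean()
    return alpha * soft + (1 - alpha) * hard
```

In this sketch the lightweight student RNN would be trained to minimize this loss over music frames, matching both the teacher's soft shot-type distribution and the ground-truth shot labels.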
Database: | OpenAIRE |
External link: |