Learning to Visualize Music Through Shot Sequence for Automatic Concert Video Mashup

Authors: Hsin-Min Wang, Tyng-Luh Liu, Hong-Yuan Mark Liao, Jen-Chun Lin, Wen-Li Wei, Hsiao-Rong Tyan
Year of publication: 2021
Source: IEEE Transactions on Multimedia, 23:1731-1743
ISSN: 1941-0077, 1520-9210
DOI: 10.1109/tmm.2020.3003631
Description: An experienced director usually switches among different types of shots to make visual storytelling more touching. When filming a musical performance, appropriate shot switching can produce special effects, such as enhancing the expression of emotion or heating up the atmosphere. However, while this visual storytelling technique is routinely used in professional recordings of a live concert, amateur recordings made by audience members filming the same event often lack such storytelling concepts and skills. A versatile system that can perform video mashup to create a refined, high-quality video from such amateur clips is therefore desirable. To this end, we aim to translate the music into an attractive shot (type) sequence by learning the relation between music and the visual storytelling of shots. The resulting shot sequence can then better portray the visual storytelling of a song and guide the concert video mashup process. To achieve this, we first introduce a novel probabilistic fusion approach, named multi-resolution fused recurrent neural networks (MF-RNNs) with film-language, which integrates multi-resolution fused RNNs with a film-language model to boost translation performance. We then distill the knowledge in MF-RNNs with film-language into a lightweight RNN, which is more efficient and easier to deploy. Results from objective and subjective experiments demonstrate that both MF-RNNs with film-language and the lightweight RNN can generate attractive shot sequences for music, thereby enhancing the viewing and listening experience.
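The abstract only names the architecture, so a minimal sketch may help make it concrete. The PyTorch code below is not the authors' implementation: the shot-type inventory, layer sizes, sampling resolutions, the averaging fusion rule, and the greedy bigram decoding are all illustrative assumptions standing in for the paper's probabilistic fusion and film-language model.

```python
# A minimal sketch (assumed, not the authors' code) of multi-resolution
# fused RNNs: several GRUs read the same music features at different
# temporal resolutions, their per-shot-type probabilities are fused, and
# a film-language bigram prior over shot types re-weights the decoding.
import torch
import torch.nn as nn
import torch.nn.functional as F

NUM_SHOT_TYPES = 5  # assumed inventory, e.g. close-up ... long shot


class MultiResolutionFusedRNN(nn.Module):
    def __init__(self, feat_dim=128, hidden=256, resolutions=(1, 2, 4)):
        super().__init__()
        self.resolutions = resolutions
        self.rnns = nn.ModuleList(
            [nn.GRU(feat_dim, hidden, batch_first=True) for _ in resolutions]
        )
        self.heads = nn.ModuleList(
            [nn.Linear(hidden, NUM_SHOT_TYPES) for _ in resolutions]
        )
        # Film-language model stand-in: a learnable bigram prior over
        # consecutive shot types, acting as a transition matrix.
        self.transition = nn.Parameter(torch.zeros(NUM_SHOT_TYPES, NUM_SHOT_TYPES))

    def forward(self, music_feats):
        # music_feats: (batch, time, feat_dim) frame-level music features
        B, T, _ = music_feats.shape
        per_res_probs = []
        for r, rnn, head in zip(self.resolutions, self.rnns, self.heads):
            x = music_feats[:, ::r, :]          # downsample to resolution r
            h, _ = rnn(x)
            probs = F.softmax(head(h), dim=-1)  # (B, T//r, NUM_SHOT_TYPES)
            # Upsample the predictions back to the frame rate.
            probs = probs.transpose(1, 2)
            probs = F.interpolate(probs, size=T, mode="nearest").transpose(1, 2)
            per_res_probs.append(probs)
        # Probabilistic fusion: here, a simple average across resolutions.
        return torch.stack(per_res_probs).mean(dim=0)  # (B, T, NUM_SHOT_TYPES)

    def decode(self, fused):
        # Greedy decoding with the bigram prior: the score of shot type j
        # at step t is its fused probability times the transition
        # probability from the previously chosen shot type.
        trans = F.softmax(self.transition, dim=-1)
        seq = [fused[:, 0].argmax(dim=-1)]
        for t in range(1, fused.size(1)):
            scores = fused[:, t] * trans[seq[-1]]
            seq.append(scores.argmax(dim=-1))
        return torch.stack(seq, dim=1)  # (B, T) shot-type indices
```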
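The distillation step can be sketched in the same spirit, continuing the code above. This follows the standard soft-target recipe (Hinton et al.) rather than the paper's actual objective; the temperature, loss weighting, and student size are arbitrary assumptions, and the temperature is applied only to the student since the teacher already outputs probabilities.

```python
# Assumed sketch: distilling the fused teacher into a single small GRU.
class LightweightRNN(nn.Module):
    def __init__(self, feat_dim=128, hidden=64):
        super().__init__()
        self.rnn = nn.GRU(feat_dim, hidden, batch_first=True)
        self.head = nn.Linear(hidden, NUM_SHOT_TYPES)

    def forward(self, music_feats):
        h, _ = self.rnn(music_feats)
        return self.head(h)  # raw logits, (B, T, NUM_SHOT_TYPES)


def distillation_loss(student_logits, teacher_probs, labels, tau=2.0, alpha=0.5):
    # Soft targets: match the teacher's per-frame shot-type distribution;
    # the temperature tau softens the student's distribution only.
    log_p_student = F.log_softmax(student_logits / tau, dim=-1)
    soft = F.kl_div(log_p_student, teacher_probs, reduction="batchmean") * tau**2
    # Hard targets: ordinary cross-entropy against ground-truth shot types.
    hard = F.cross_entropy(
        student_logits.reshape(-1, NUM_SHOT_TYPES), labels.reshape(-1)
    )
    return alpha * soft + (1 - alpha) * hard
```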
Database: OpenAIRE