Learning to Visualize Music Through Shot Sequence for Automatic Concert Video Mashup
Author: | Hsin-Min Wang, Tyng-Luh Liu, Hong-Yuan Mark Liao, Jen-Chun Lin, Wen-Li Wei, Hsiao-Rong Tyan |
---|---|
Year: | 2021 |
Subject: |
Computer science, Knowledge engineering, Computer Science Applications, Visualization, Human–computer interaction, Signal Processing, Media Technology, Artificial intelligence & image processing, Task analysis, Mashup, Electrical and Electronic Engineering, Amateur, Storytelling |
Source: | IEEE Transactions on Multimedia. 23:1731-1743 |
ISSN: | 1941-0077 1520-9210 |
DOI: | 10.1109/tmm.2020.3003631 |
Description: | An experienced director usually switches among different types of shots to make visual storytelling more engaging. When filming a musical performance, well-chosen shot switches can produce special effects, such as enhancing the expression of emotion or heating up the atmosphere. However, while professional recordings of a live concert often employ this visual storytelling technique, amateur recordings made by audience members at the same event typically lack such storytelling concepts and skills. A versatile system that can perform video mashup to create a refined, high-quality video from such amateur clips is therefore desirable. To this end, we aim to translate music into an attractive shot (type) sequence by learning the relation between music and the visual storytelling of shots. The resulting shot sequence can then be used to better portray the visual storytelling of a song and to guide the concert video mashup process. To achieve this, we first introduce a novel probabilistic fusion approach, named multi-resolution fused recurrent neural networks (MF-RNNs) with film-language, which integrates multi-resolution fused RNNs and a film-language model to boost translation performance. We then distill the knowledge in MF-RNNs with film-language into a lightweight RNN, which is more efficient and easier to deploy. Results from objective and subjective experiments demonstrate that both MF-RNNs with film-language and the lightweight RNN can generate attractive shot sequences for music, thereby enhancing the viewing and listening experience. |
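The abstract mentions distilling the teacher model (MF-RNNs with film-language) into a lightweight RNN. The paper's exact training objective is not given in this record; the following is a minimal NumPy sketch of the standard temperature-scaled distillation loss applied to per-timestep shot-type predictions. All function names, the class count, and the `T`/`alpha` parameters are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def softmax(z, T=1.0):
    # Temperature-scaled softmax over shot-type classes; higher T softens the distribution.
    s = z / T
    e = np.exp(s - np.max(s, axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def distillation_loss(student_logits, teacher_logits, hard_labels, T=2.0, alpha=0.5):
    """Blend of soft-target and hard-target cross-entropy (hypothetical sketch).

    student_logits, teacher_logits: arrays of shape (num_frames, num_shot_types)
    hard_labels: ground-truth shot-type indices, shape (num_frames,)
    """
    # Soft-target term: cross-entropy between the softened teacher and student
    # distributions, scaled by T^2 as is conventional for distillation.
    p_teacher = softmax(teacher_logits, T)
    p_student = softmax(student_logits, T)
    soft = -np.sum(p_teacher * np.log(p_student + 1e-12), axis=-1).mean() * T * T
    # Hard-target term: ordinary cross-entropy against the annotated shot types.
    p = softmax(student_logits)
    hard = -np.log(p[np.arange(len(hard_labels)), hard_labels] + 1e-12).mean()
    return alpha * soft + (1 - alpha) * hard
```

In this sketch the lightweight student RNN would be trained to minimize this loss over music frames, matching both the teacher's soft shot-type distribution and the ground-truth shot labels.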
Database: | OpenAIRE |
External link: |