Listen and Move: Improving GANs Coherency in Agnostic Sound-to-Video Generation

Author: Redondo, Rafael
Year of publication: 2024
Subject:
Source: Abstract version published in the ICCV 2023 workshop "AV4D: Visual Learning of Sounds in Spaces"
Document type: Working Paper
Description: Deep generative models have demonstrated the ability to create realistic audiovisual content, sometimes conditioned on signals from a different domain. However, achieving smooth temporal dynamics in video generation remains a challenging problem. This work focuses on generic sound-to-video generation and proposes three main features to enhance both image quality and temporal coherency in generative adversarial models: a triple sound routing scheme, a multi-scale residual and dilated recurrent network for extended sound analysis, and a novel recurrent and directional convolutional layer for video prediction. Each of the proposed features improves both the quality and the coherency of the baseline neural architecture typically used in the state of the art, with the video prediction layer providing an extra temporal refinement (a minimal illustrative sketch of the sound-analysis idea follows the record below).
Comment: Full version of the homonymous abstract published in the ICCV 2023 workshop "AV4D: Visual Learning of Sounds in Spaces"
Database: arXiv
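
The abstract above mentions a "multi-scale residual and dilated recurrent network for extended sound analysis" but gives no architectural details. The PyTorch snippet below is only a minimal sketch of that general idea under stated assumptions: the module names, channel widths, dilation rates, mel-spectrogram input, and the GRU summarizer are all illustrative choices, not the paper's implementation.

```python
# Hypothetical sketch (not the paper's exact architecture): a multi-scale,
# residual, dilated 1-D convolutional audio encoder followed by a recurrent
# layer that yields one sound embedding per time step.
import torch
import torch.nn as nn


class DilatedResidualBlock(nn.Module):
    """1-D residual block whose receptive field grows with the dilation rate."""

    def __init__(self, channels, dilation):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv1d(channels, channels, kernel_size=3,
                      padding=dilation, dilation=dilation),
            nn.BatchNorm1d(channels),
            nn.ReLU(inplace=True),
            nn.Conv1d(channels, channels, kernel_size=3,
                      padding=dilation, dilation=dilation),
            nn.BatchNorm1d(channels),
        )
        self.act = nn.ReLU(inplace=True)

    def forward(self, x):
        # Residual connection keeps gradients stable across stacked scales.
        return self.act(x + self.conv(x))


class MultiScaleRecurrentSoundEncoder(nn.Module):
    """Stacks dilated residual blocks at several scales, then summarizes the
    sequence with a GRU to produce a per-step sound embedding (assumed here
    to condition a video generator downstream)."""

    def __init__(self, n_mels=80, channels=128, embed_dim=256,
                 dilations=(1, 2, 4, 8)):
        super().__init__()
        self.stem = nn.Conv1d(n_mels, channels, kernel_size=3, padding=1)
        self.blocks = nn.Sequential(
            *[DilatedResidualBlock(channels, d) for d in dilations]
        )
        self.rnn = nn.GRU(channels, embed_dim, batch_first=True)

    def forward(self, mel):                  # mel: (batch, n_mels, time)
        h = self.blocks(self.stem(mel))      # (batch, channels, time)
        h = h.transpose(1, 2)                # (batch, time, channels)
        out, _ = self.rnn(h)                 # (batch, time, embed_dim)
        return out


# Usage: a batch of mel spectrograms mapped to per-step sound embeddings.
mel = torch.randn(4, 80, 100)
emb = MultiScaleRecurrentSoundEncoder()(mel)
print(emb.shape)  # torch.Size([4, 100, 256])
```

Increasing dilation rates widen the temporal receptive field without downsampling, which matches the abstract's goal of "extended sound analysis"; the recurrent layer is one plausible way to carry that context across frames, not necessarily the authors' choice.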