Showing 1 - 10 of 77 for search: '"Skorokhodov, Ivan"'
Author:
Wu, Ziyi, Siarohin, Aliaksandr, Menapace, Willi, Skorokhodov, Ivan, Fang, Yuwei, Chordia, Varnith, Gilitschenski, Igor, Tulyakov, Sergey
Real-world videos consist of sequences of events. Generating such sequences with precise temporal control is infeasible with existing video generators that rely on a single paragraph of text as input. When tasked with generating multiple events descr…
External link:
http://arxiv.org/abs/2412.05263
Author:
Wang, Chaoyang, Zhuang, Peiye, Ngo, Tuan Duc, Menapace, Willi, Siarohin, Aliaksandr, Vasilkovsky, Michael, Skorokhodov, Ivan, Tulyakov, Sergey, Wonka, Peter, Lee, Hsin-Ying
We propose 4Real-Video, a novel framework for generating 4D videos, organized as a grid of video frames with both time and viewpoint axes. In this grid, each row contains frames sharing the same timestep, while each column contains frames from the sa…
External link:
http://arxiv.org/abs/2412.04462
Author:
Bahmani, Sherwin, Skorokhodov, Ivan, Qian, Guocheng, Siarohin, Aliaksandr, Menapace, Willi, Tagliasacchi, Andrea, Lindell, David B., Tulyakov, Sergey
Numerous works have recently integrated 3D camera control into foundational text-to-video models, but the resulting camera control is often imprecise, and video generation quality suffers. In this work, we analyze camera motion from a first principle…
External link:
http://arxiv.org/abs/2411.18673
Author:
Bahmani, Sherwin, Skorokhodov, Ivan, Siarohin, Aliaksandr, Menapace, Willi, Qian, Guocheng, Vasilkovsky, Michael, Lee, Hsin-Ying, Wang, Chaoyang, Zou, Jiaxu, Tagliasacchi, Andrea, Lindell, David B., Tulyakov, Sergey
Modern text-to-video synthesis models demonstrate coherent, photorealistic generation of complex videos from a text description. However, most existing models lack fine-grained control over camera movement, which is critical for downstream applicatio…
External link:
http://arxiv.org/abs/2407.12781
Author:
Fang, Yuwei, Menapace, Willi, Siarohin, Aliaksandr, Chen, Tsai-Shien, Wang, Kuan-Chien, Skorokhodov, Ivan, Neubig, Graham, Tulyakov, Sergey
Existing text-to-video diffusion models rely solely on text-only encoders for their pretraining. This limitation stems from the absence of large-scale multimodal prompt video datasets, resulting in a lack of visual grounding and restricting their ver…
External link:
http://arxiv.org/abs/2407.06304
Author:
Gu, Jing, Fang, Yuwei, Skorokhodov, Ivan, Wonka, Peter, Du, Xinya, Tulyakov, Sergey, Wang, Xin Eric
Video editing is a cornerstone of digital media, from entertainment and education to professional communication. However, previous methods often overlook the necessity of comprehensively understanding both global and local contexts, leading to inaccu…
External link:
http://arxiv.org/abs/2406.12831
Diffusion models have demonstrated remarkable performance in image and video synthesis. However, scaling them to high-resolution inputs is challenging and requires restructuring the diffusion pipeline into multiple independent components, limiting sc…
External link:
http://arxiv.org/abs/2406.07792
Author:
Zhang, Zhixing, Li, Yanyu, Wu, Yushu, Xu, Yanwu, Kag, Anil, Skorokhodov, Ivan, Menapace, Willi, Siarohin, Aliaksandr, Cao, Junli, Metaxas, Dimitris, Tulyakov, Sergey, Ren, Jian
Diffusion-based video generation models have demonstrated remarkable success in obtaining high-fidelity videos through the iterative denoising process. However, these models require multiple denoising steps during sampling, resulting in high computat…
External link:
http://arxiv.org/abs/2406.04324
Author:
Bahmani, Sherwin, Liu, Xian, Yifan, Wang, Skorokhodov, Ivan, Rong, Victor, Liu, Ziwei, Liu, Xihui, Park, Jeong Joon, Tulyakov, Sergey, Wetzstein, Gordon, Tagliasacchi, Andrea, Lindell, David B.
Recent techniques for text-to-4D generation synthesize dynamic 3D scenes using supervision from pre-trained text-to-video models. However, existing representations for motion, such as deformation models or time-dependent neural representations, are l…
External link:
http://arxiv.org/abs/2403.17920
Author:
Menapace, Willi, Siarohin, Aliaksandr, Skorokhodov, Ivan, Deyneka, Ekaterina, Chen, Tsai-Shien, Kag, Anil, Fang, Yuwei, Stoliar, Aleksei, Ricci, Elisa, Ren, Jian, Tulyakov, Sergey
Contemporary models for generating images show remarkable quality and versatility. Swayed by these advantages, the research community repurposes them to generate videos. Since video content is highly redundant, we argue that naively bringing advances…
External link:
http://arxiv.org/abs/2402.14797