Showing 1 - 10 of 42 for search: '"Menapace, Willi"'
Author:
Kag, Anil, Coskun, Huseyin, Chen, Jierun, Cao, Junli, Menapace, Willi, Siarohin, Aliaksandr, Tulyakov, Sergey, Ren, Jian
Neural network architecture design requires making many crucial decisions. A common desideratum is that similar decisions, with minor modifications, can be reused across a variety of tasks and applications. To satisfy that, architectures must provide p…
External link:
http://arxiv.org/abs/2411.04967
Author:
Bahmani, Sherwin, Skorokhodov, Ivan, Siarohin, Aliaksandr, Menapace, Willi, Qian, Guocheng, Vasilkovsky, Michael, Lee, Hsin-Ying, Wang, Chaoyang, Zou, Jiaxu, Tagliasacchi, Andrea, Lindell, David B., Tulyakov, Sergey
Modern text-to-video synthesis models demonstrate coherent, photorealistic generation of complex videos from a text description. However, most existing models lack fine-grained control over camera movement, which is critical for downstream applications…
External link:
http://arxiv.org/abs/2407.12781
Author:
Fang, Yuwei, Menapace, Willi, Siarohin, Aliaksandr, Chen, Tsai-Shien, Wang, Kuan-Chien, Skorokhodov, Ivan, Neubig, Graham, Tulyakov, Sergey
Existing text-to-video diffusion models rely solely on text-only encoders for their pretraining. This limitation stems from the absence of large-scale multimodal prompt-video datasets, resulting in a lack of visual grounding and restricting their versatility…
External link:
http://arxiv.org/abs/2407.06304
Author:
Haji-Ali, Moayed, Menapace, Willi, Siarohin, Aliaksandr, Balakrishnan, Guha, Tulyakov, Sergey, Ordonez, Vicente
Generating ambient sounds is a challenging task due to data scarcity and often insufficient caption quality, making it difficult to employ large-scale generative models for the task. In this work, we tackle this problem by introducing two new models.
External link:
http://arxiv.org/abs/2406.19388
Diffusion models have demonstrated remarkable performance in image and video synthesis. However, scaling them to high-resolution inputs is challenging and requires restructuring the diffusion pipeline into multiple independent components, limiting scalability…
External link:
http://arxiv.org/abs/2406.07792
Author:
Yu, Heng, Wang, Chaoyang, Zhuang, Peiye, Menapace, Willi, Siarohin, Aliaksandr, Cao, Junli, Jeni, Laszlo A, Tulyakov, Sergey, Lee, Hsin-Ying
Existing dynamic scene generation methods mostly rely on distilling knowledge from pre-trained 3D generative models, which are typically fine-tuned on synthetic object datasets. As a result, the generated scenes are often object-centric and lack photorealism…
External link:
http://arxiv.org/abs/2406.07472
Author:
Zhang, Zhixing, Li, Yanyu, Wu, Yushu, Xu, Yanwu, Kag, Anil, Skorokhodov, Ivan, Menapace, Willi, Siarohin, Aliaksandr, Cao, Junli, Metaxas, Dimitris, Tulyakov, Sergey, Ren, Jian
Diffusion-based video generation models have demonstrated remarkable success in obtaining high-fidelity videos through the iterative denoising process. However, these models require multiple denoising steps during sampling, resulting in high computational cost…
External link:
http://arxiv.org/abs/2406.04324
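The sampling cost this snippet refers to comes from the sequential nature of diffusion sampling: every generated sample needs one full network evaluation per denoising step. A minimal toy sketch of a generic DDPM-style ancestral sampling loop follows; the callable model and the betas noise schedule are illustrative stand-ins, not anything specified by the paper above.

    import numpy as np

    def ddpm_sample(model, shape, betas, rng):
        """Toy DDPM ancestral sampler: one model call per denoising step.

        `model(x, t)` is assumed to predict the noise in x at step t;
        `betas` is the noise schedule. Both are illustrative stand-ins.
        """
        alphas = 1.0 - betas
        alpha_bars = np.cumprod(alphas)
        x = rng.standard_normal(shape)            # start from pure Gaussian noise
        for t in reversed(range(len(betas))):     # T sequential steps => T network calls
            eps = model(x, t)                     # the expensive forward pass
            # DDPM posterior mean given the predicted noise
            x = (x - betas[t] / np.sqrt(1.0 - alpha_bars[t]) * eps) / np.sqrt(alphas[t])
            if t > 0:                             # no noise is added at the final step
                x += np.sqrt(betas[t]) * rng.standard_normal(shape)
        return x

    # Usage sketch: a 50-step schedule means 50 forward passes per sample.
    rng = np.random.default_rng(0)
    betas = np.linspace(1e-4, 0.02, 50)
    sample = ddpm_sample(lambda x, t: np.zeros_like(x), (4, 8, 8), betas, rng)

Each loop iteration is one forward pass through the network, so reducing the number of steps is the main lever for cheap sampling, which is the direction this entry's abstract points toward.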
Video anomaly detection (VAD) aims to temporally locate abnormal events in a video. Existing works mostly rely on training deep models to learn the distribution of normality with either video-level supervision, one-class supervision, or in an unsupervised setting…
External link:
http://arxiv.org/abs/2404.01014
Author:
Chen, Tsai-Shien, Siarohin, Aliaksandr, Menapace, Willi, Deyneka, Ekaterina, Chao, Hsiang-wei, Jeon, Byung Eun, Fang, Yuwei, Lee, Hsin-Ying, Ren, Jian, Yang, Ming-Hsuan, Tulyakov, Sergey
The quality of the data and annotation upper-bounds the quality of a downstream model. While there exist large text corpora and image-text pairs, high-quality video-text data is much harder to collect. First of all, manual labeling is more time-consuming…
External link:
http://arxiv.org/abs/2402.19479
Author:
Menapace, Willi, Siarohin, Aliaksandr, Skorokhodov, Ivan, Deyneka, Ekaterina, Chen, Tsai-Shien, Kag, Anil, Fang, Yuwei, Stoliar, Aleksei, Ricci, Elisa, Ren, Jian, Tulyakov, Sergey
Contemporary models for generating images show remarkable quality and versatility. Swayed by these advantages, the research community repurposes them to generate videos. Since video content is highly redundant, we argue that naively bringing advances…
External link:
http://arxiv.org/abs/2402.14797