Showing 1 - 10 of 41 for search: '"Menapace, Willi"'
Author:
Kag, Anil, Coskun, Huseyin, Chen, Jierun, Cao, Junli, Menapace, Willi, Siarohin, Aliaksandr, Tulyakov, Sergey, Ren, Jian
Neural network architecture design requires making many crucial decisions. A common desideratum is that similar decisions, with minor modifications, can be reused in a variety of tasks and applications. To satisfy that, architectures must provide p
External link:
http://arxiv.org/abs/2411.04967
Author:
Bahmani, Sherwin, Skorokhodov, Ivan, Siarohin, Aliaksandr, Menapace, Willi, Qian, Guocheng, Vasilkovsky, Michael, Lee, Hsin-Ying, Wang, Chaoyang, Zou, Jiaxu, Tagliasacchi, Andrea, Lindell, David B., Tulyakov, Sergey
Modern text-to-video synthesis models demonstrate coherent, photorealistic generation of complex videos from a text description. However, most existing models lack fine-grained control over camera movement, which is critical for downstream applicatio
External link:
http://arxiv.org/abs/2407.12781
Author:
Fang, Yuwei, Menapace, Willi, Siarohin, Aliaksandr, Chen, Tsai-Shien, Wang, Kuan-Chien, Skorokhodov, Ivan, Neubig, Graham, Tulyakov, Sergey
Existing text-to-video diffusion models rely solely on text-only encoders for their pretraining. This limitation stems from the absence of large-scale multimodal prompt video datasets, resulting in a lack of visual grounding and restricting their ver
External link:
http://arxiv.org/abs/2407.06304
Author:
Haji-Ali, Moayed, Menapace, Willi, Siarohin, Aliaksandr, Balakrishnan, Guha, Tulyakov, Sergey, Ordonez, Vicente
Generating ambient sounds is a challenging task due to data scarcity and often insufficient caption quality, making it difficult to employ large-scale generative models for the task. In this work, we tackle this problem by introducing two new models.
External link:
http://arxiv.org/abs/2406.19388
Diffusion models have demonstrated remarkable performance in image and video synthesis. However, scaling them to high-resolution inputs is challenging and requires restructuring the diffusion pipeline into multiple independent components, limiting sc
External link:
http://arxiv.org/abs/2406.07792
Author:
Yu, Heng, Wang, Chaoyang, Zhuang, Peiye, Menapace, Willi, Siarohin, Aliaksandr, Cao, Junli, Jeni, Laszlo A, Tulyakov, Sergey, Lee, Hsin-Ying
Existing dynamic scene generation methods mostly rely on distilling knowledge from pre-trained 3D generative models, which are typically fine-tuned on synthetic object datasets. As a result, the generated scenes are often object-centric and lack phot
External link:
http://arxiv.org/abs/2406.07472
Author:
Zhang, Zhixing, Li, Yanyu, Wu, Yushu, Xu, Yanwu, Kag, Anil, Skorokhodov, Ivan, Menapace, Willi, Siarohin, Aliaksandr, Cao, Junli, Metaxas, Dimitris, Tulyakov, Sergey, Ren, Jian
Diffusion-based video generation models have demonstrated remarkable success in obtaining high-fidelity videos through the iterative denoising process. However, these models require multiple denoising steps during sampling, resulting in high computat
External link:
http://arxiv.org/abs/2406.04324
Author:
Menapace, Willi
Multimedia content plays a pivotal role in contemporary society, permeating diverse facets of communication and entertainment. However, the creation of high-quality multimedia content often requires specialized expertise and equipment, limiting its w
External link:
https://hdl.handle.net/11572/400170
Video anomaly detection (VAD) aims to temporally locate abnormal events in a video. Existing works mostly rely on training deep models to learn the distribution of normality with either video-level supervision, one-class supervision, or in an unsuper
External link:
http://arxiv.org/abs/2404.01014
Author:
Chen, Tsai-Shien, Siarohin, Aliaksandr, Menapace, Willi, Deyneka, Ekaterina, Chao, Hsiang-wei, Jeon, Byung Eun, Fang, Yuwei, Lee, Hsin-Ying, Ren, Jian, Yang, Ming-Hsuan, Tulyakov, Sergey
The quality of the data and annotation upper-bounds the quality of a downstream model. While there exist large text corpora and image-text pairs, high-quality video-text data is much harder to collect. First of all, manual labeling is more time-consu
External link:
http://arxiv.org/abs/2402.19479