Výsledky vyhledávání - "Siarohin, Aliaksandr"

Report

Pixel-Aligned Multi-View Generation with Depth Guided Decoder

Autor: Tang, Zhenggang, Zhuang, Peiye, Wang, Chaoyang, Siarohin, Aliaksandr, Kant, Yash, Schwing, Alexander, Tulyakov, Sergey, Lee, Hsin-Ying

The task of image-to-multi-view generation refers to generating novel views of an instance from a single image. Recent methods achieve this by extending text-to-image latent diffusion models to multi-view version, which contains an VAE image encoder

Externí odkaz: http://arxiv.org/abs/2408.14016

Zobrazit plný text záznamu

Report

VD3D: Taming Large Video Diffusion Transformers for 3D Camera Control

Autor: Bahmani, Sherwin, Skorokhodov, Ivan, Siarohin, Aliaksandr, Menapace, Willi, Qian, Guocheng, Vasilkovsky, Michael, Lee, Hsin-Ying, Wang, Chaoyang, Zou, Jiaxu, Tagliasacchi, Andrea, Lindell, David B., Tulyakov, Sergey

Modern text-to-video synthesis models demonstrate coherent, photorealistic generation of complex videos from a text description. However, most existing models lack fine-grained control over camera movement, which is critical for downstream applicatio

Externí odkaz: http://arxiv.org/abs/2407.12781

Zobrazit plný text záznamu

Report

VIMI: Grounding Video Generation through Multi-modal Instruction

Autor: Fang, Yuwei, Menapace, Willi, Siarohin, Aliaksandr, Chen, Tsai-Shien, Wang, Kuan-Chien, Skorokhodov, Ivan, Neubig, Graham, Tulyakov, Sergey

Existing text-to-video diffusion models rely solely on text-only encoders for their pretraining. This limitation stems from the absence of large-scale multimodal prompt video datasets, resulting in a lack of visual grounding and restricting their ver

Externí odkaz: http://arxiv.org/abs/2407.06304

Zobrazit plný text záznamu

Report

Taming Data and Transformers for Audio Generation

Autor: Haji-Ali, Moayed, Menapace, Willi, Siarohin, Aliaksandr, Balakrishnan, Guha, Tulyakov, Sergey, Ordonez, Vicente

Generating ambient sounds and effects is a challenging problem due to data scarcity and often insufficient caption quality, making it difficult to employ large-scale generative models for the task. In this work, we tackle the problem by introducing t

Externí odkaz: http://arxiv.org/abs/2406.19388

Zobrazit plný text záznamu

Report

Hierarchical Patch Diffusion Models for High-Resolution Video Generation

Autor: Skorokhodov, Ivan, Menapace, Willi, Siarohin, Aliaksandr, Tulyakov, Sergey

Diffusion models have demonstrated remarkable performance in image and video synthesis. However, scaling them to high-resolution inputs is challenging and requires restructuring the diffusion pipeline into multiple independent components, limiting sc

Externí odkaz: http://arxiv.org/abs/2406.07792

Zobrazit plný text záznamu

Report

4Real: Towards Photorealistic 4D Scene Generation via Video Diffusion Models

Autor: Yu, Heng, Wang, Chaoyang, Zhuang, Peiye, Menapace, Willi, Siarohin, Aliaksandr, Cao, Junli, Jeni, Laszlo A, Tulyakov, Sergey, Lee, Hsin-Ying

Existing dynamic scene generation methods mostly rely on distilling knowledge from pre-trained 3D generative models, which are typically fine-tuned on synthetic object datasets. As a result, the generated scenes are often object-centric and lack phot

Externí odkaz: http://arxiv.org/abs/2406.07472

Zobrazit plný text záznamu

Report

GTR: Improving Large 3D Reconstruction Models through Geometry and Texture Refinement

Autor: Zhuang, Peiye, Han, Songfang, Wang, Chaoyang, Siarohin, Aliaksandr, Zou, Jiaxu, Vasilkovsky, Michael, Shakhrai, Vladislav, Korolev, Sergey, Tulyakov, Sergey, Lee, Hsin-Ying

We propose a novel approach for 3D mesh reconstruction from multi-view images. Our method takes inspiration from large reconstruction models like LRM that use a transformer-based triplane generator and a Neural Radiance Field (NeRF) model trained on

Externí odkaz: http://arxiv.org/abs/2406.05649

Zobrazit plný text záznamu

Report

SF-V: Single Forward Video Generation Model

Autor: Zhang, Zhixing, Li, Yanyu, Wu, Yushu, Xu, Yanwu, Kag, Anil, Skorokhodov, Ivan, Menapace, Willi, Siarohin, Aliaksandr, Cao, Junli, Metaxas, Dimitris, Tulyakov, Sergey, Ren, Jian

Diffusion-based video generation models have demonstrated remarkable success in obtaining high-fidelity videos through the iterative denoising process. However, these models require multiple denoising steps during sampling, resulting in high computat

Externí odkaz: http://arxiv.org/abs/2406.04324

Zobrazit plný text záznamu

Report

Panda-70M: Captioning 70M Videos with Multiple Cross-Modality Teachers

Autor: Chen, Tsai-Shien, Siarohin, Aliaksandr, Menapace, Willi, Deyneka, Ekaterina, Chao, Hsiang-wei, Jeon, Byung Eun, Fang, Yuwei, Lee, Hsin-Ying, Ren, Jian, Yang, Ming-Hsuan, Tulyakov, Sergey

The quality of the data and annotation upper-bounds the quality of a downstream model. While there exist large text corpora and image-text pairs, high-quality video-text data is much harder to collect. First of all, manual labeling is more time-consu

Externí odkaz: http://arxiv.org/abs/2402.19479

Zobrazit plný text záznamu

Report

Snap Video: Scaled Spatiotemporal Transformers for Text-to-Video Synthesis

Autor: Menapace, Willi, Siarohin, Aliaksandr, Skorokhodov, Ivan, Deyneka, Ekaterina, Chen, Tsai-Shien, Kag, Anil, Fang, Yuwei, Stoliar, Aleksei, Ricci, Elisa, Ren, Jian, Tulyakov, Sergey

Contemporary models for generating images show remarkable quality and versatility. Swayed by these advantages, the research community repurposes them to generate videos. Since video content is highly redundant, we argue that naively bringing advances

Externí odkaz: http://arxiv.org/abs/2402.14797

Zobrazit plný text záznamu

Vyhledávací nástroje:

Upřesnit hledání