Search results

Report

LaGeM: A Large Geometry Model for 3D Representation Learning and Diffusion

Authors: Zhang, Biao; Wonka, Peter

This paper introduces a novel hierarchical autoencoder that maps 3D models into a highly compressed latent space. The hierarchical autoencoder is specifically designed to tackle the challenges arising from large-scale datasets and generative modeling

External link: http://arxiv.org/abs/2410.01295

Report

ImmersePro: End-to-End Stereo Video Synthesis Via Implicit Disparity Learning

Authors: Shi, Jian; Li, Zhenyu; Wonka, Peter

We introduce ImmersePro, an innovative framework specifically designed to transform single-view videos into stereo videos. This framework utilizes a novel dual-branch architecture comprising a disparity branch and a context branch on video d

External link: http://arxiv.org/abs/2410.00262

Report

Build-A-Scene: Interactive 3D Layout Control for Diffusion-Based Image Generation

Authors: Eldesokey, Abdelrahman; Wonka, Peter

We propose a diffusion-based approach for Text-to-Image (T2I) generation with interactive 3D layout control. Layout control has been widely studied to alleviate the shortcomings of T2I diffusion models in understanding objects' placement and relation

External link: http://arxiv.org/abs/2408.14819

Report

A3D: Does Diffusion Dream about 3D Alignment?

Authors: Ignatyev, Savva; Konovalova, Nina; Selikhanovych, Daniil; Patakin, Nikolay; Voynov, Oleg; Senushkin, Dmitry; Filippov, Alexander; Konushin, Anton; Wonka, Peter; Burnaev, Evgeny

We tackle the problem of text-driven 3D generation from a geometry alignment perspective. We aim at the generation of multiple objects which are consistent in terms of semantics and geometry. Recent methods based on Score Distillation have succeeded

External link: http://arxiv.org/abs/2406.15020

Report

VIA: A Spatiotemporal Video Adaptation Framework for Global and Local Video Editing

Authors: Gu, Jing; Fang, Yuwei; Skorokhodov, Ivan; Wonka, Peter; Du, Xinya; Tulyakov, Sergey; Wang, Xin Eric

Video editing stands as a cornerstone of digital media, from entertainment and education to professional communication. However, previous methods often overlook the necessity of comprehensively understanding both global and local contexts, leading to

External link: http://arxiv.org/abs/2406.12831

Report

Vivid-ZOO: Multi-View Video Generation with Diffusion Model

Authors: Li, Bing; Zheng, Cheng; Zhu, Wenxuan; Mai, Jinjie; Zhang, Biao; Wonka, Peter; Ghanem, Bernard

While diffusion models have shown impressive performance in 2D image/video generation, diffusion-based Text-to-Multi-view-Video (T2MVid) generation remains underexplored. The new challenges posed by T2MVid generation lie in the lack of massive captio

External link: http://arxiv.org/abs/2406.08659

Report

PatchRefiner: Leveraging Synthetic Data for Real-Domain High-Resolution Monocular Metric Depth Estimation

Authors: Li, Zhenyu; Bhat, Shariq Farooq; Wonka, Peter

This paper introduces PatchRefiner, an advanced framework for metric single image depth estimation aimed at high-resolution real-domain inputs. While depth estimation is crucial for applications such as autonomous driving, 3D generative modeling, and

External link: http://arxiv.org/abs/2406.06679

Report

E$^3$-Net: Efficient E(3)-Equivariant Normal Estimation Network

Authors: Wang, Hanxiao; Zhao, Mingyang; Quan, Weize; Chen, Zhen; Yan, Dong-ming; Wonka, Peter

Point cloud normal estimation is a fundamental task in 3D geometry processing. While recent learning-based methods achieve notable advancements in normal prediction, they often overlook the critical aspect of equivariance. This results in inefficient

External link: http://arxiv.org/abs/2406.00347

Report

Zero-Shot Video Semantic Segmentation based on Pre-Trained Diffusion Models

Authors: Wang, Qian; Eldesokey, Abdelrahman; Mendiratta, Mohit; Zhan, Fangneng; Kortylewski, Adam; Theobalt, Christian; Wonka, Peter

We introduce the first zero-shot approach for Video Semantic Segmentation (VSS) based on pre-trained diffusion models. A growing research direction attempts to employ diffusion models to perform downstream vision tasks by exploiting their deep unders

External link: http://arxiv.org/abs/2405.16947

Report

PS-CAD: Local Geometry Guidance via Prompting and Selection for CAD Reconstruction

Authors: Yang, Bingchen; Jiang, Haiyong; Pan, Hao; Wonka, Peter; Xiao, Jun; Lin, Guosheng

Reverse engineering CAD models from raw geometry is a classic but challenging research problem. In particular, reconstructing the CAD modeling sequence from point clouds provides great interpretability and convenience for editing. To improve upon thi

External link: http://arxiv.org/abs/2405.15188
