Showing 1 - 10 of 612 for search: '"Wonka, P."'
Author:
Zhang, Biao, Wonka, Peter
This paper introduces a novel hierarchical autoencoder that maps 3D models into a highly compressed latent space. The hierarchical autoencoder is specifically designed to tackle the challenges arising from large-scale datasets and generative modeling
External link:
http://arxiv.org/abs/2410.01295
We introduce ImmersePro, an innovative framework specifically designed to transform single-view videos into stereo videos. This framework utilizes a novel dual-branch architecture comprising a disparity branch and a context branch on video d
External link:
http://arxiv.org/abs/2410.00262
Author:
Eldesokey, Abdelrahman, Wonka, Peter
We propose a diffusion-based approach for Text-to-Image (T2I) generation with interactive 3D layout control. Layout control has been widely studied to alleviate the shortcomings of T2I diffusion models in understanding objects' placement and relation
External link:
http://arxiv.org/abs/2408.14819
Author:
Ignatyev, Savva, Konovalova, Nina, Selikhanovych, Daniil, Patakin, Nikolay, Voynov, Oleg, Senushkin, Dmitry, Filippov, Alexander, Konushin, Anton, Wonka, Peter, Burnaev, Evgeny
We tackle the problem of text-driven 3D generation from a geometry alignment perspective. We aim at the generation of multiple objects which are consistent in terms of semantics and geometry. Recent methods based on Score Distillation have succeeded
External link:
http://arxiv.org/abs/2406.15020
Author:
Gu, Jing, Fang, Yuwei, Skorokhodov, Ivan, Wonka, Peter, Du, Xinya, Tulyakov, Sergey, Wang, Xin Eric
Video editing stands as a cornerstone of digital media, from entertainment and education to professional communication. However, previous methods often overlook the necessity of comprehensively understanding both global and local contexts, leading to
External link:
http://arxiv.org/abs/2406.12831
Author:
Li, Bing, Zheng, Cheng, Zhu, Wenxuan, Mai, Jinjie, Zhang, Biao, Wonka, Peter, Ghanem, Bernard
While diffusion models have shown impressive performance in 2D image/video generation, diffusion-based Text-to-Multi-view-Video (T2MVid) generation remains underexplored. The new challenges posed by T2MVid generation lie in the lack of massive captio
External link:
http://arxiv.org/abs/2406.08659
This paper introduces PatchRefiner, an advanced framework for metric single image depth estimation aimed at high-resolution real-domain inputs. While depth estimation is crucial for applications such as autonomous driving, 3D generative modeling, and
External link:
http://arxiv.org/abs/2406.06679
Point cloud normal estimation is a fundamental task in 3D geometry processing. While recent learning-based methods achieve notable advancements in normal prediction, they often overlook the critical aspect of equivariance. This results in inefficient
External link:
http://arxiv.org/abs/2406.00347
Author:
Wang, Qian, Eldesokey, Abdelrahman, Mendiratta, Mohit, Zhan, Fangneng, Kortylewski, Adam, Theobalt, Christian, Wonka, Peter
We introduce the first zero-shot approach for Video Semantic Segmentation (VSS) based on pre-trained diffusion models. A growing research direction attempts to employ diffusion models to perform downstream vision tasks by exploiting their deep unders
External link:
http://arxiv.org/abs/2405.16947
Reverse engineering CAD models from raw geometry is a classic but challenging research problem. In particular, reconstructing the CAD modeling sequence from point clouds provides great interpretability and convenience for editing. To improve upon thi
External link:
http://arxiv.org/abs/2405.15188