Showing 1 - 10 of 82 results for search: '"Ommer, Bjoern"'
Accurately describing images with text is a foundation of explainable AI. Vision-Language Models (VLMs) like CLIP have recently addressed this by aligning images and texts in a shared embedding space, expressing semantic similarities between vision and … [a minimal image-text matching sketch follows the link below]
External link: http://arxiv.org/abs/2412.11917
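The entry above mentions CLIP's shared image-text embedding space only in passing; as a concrete illustration, here is a minimal similarity-scoring sketch using the Hugging Face transformers CLIP API. The checkpoint name and the image file are assumptions for illustration, not artifacts of the paper.

```python
import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

# Assumed public checkpoint; any CLIP checkpoint on the Hub works the same way.
model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

image = Image.open("example.jpg")  # placeholder image file
texts = ["a photo of a cat", "a photo of a dog"]

inputs = processor(text=texts, images=image, return_tensors="pt", padding=True)
with torch.no_grad():
    out = model(**inputs)

# logits_per_image holds the scaled cosine similarity between the image
# embedding and each text embedding in the shared space.
print(out.logits_per_image.softmax(dim=-1))
```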
Author: Hu, Vincent Tao, Ommer, Björn
In generative models, two paradigms have gained traction across applications: next-set-prediction-based Masked Generative Models and next-noise-prediction-based Non-Autoregressive Models, e.g., Diffusion Models. In this work, we propose using … [a toy next-set unmasking step is sketched after the link below]
External link: http://arxiv.org/abs/2412.06787
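As background for the "next-set prediction" paradigm named above, here is a toy MaskGIT-style unmasking step in PyTorch. The mask id, tensor shapes, and confidence heuristic are illustrative assumptions, not the paper's actual procedure.

```python
import torch

MASK_ID = 1024  # assumed id of the [MASK] token atop a 1024-entry codebook

def unmask_step(logits, tokens, n_reveal):
    """One next-set prediction step: commit the n_reveal most confident
    predictions among the currently masked positions."""
    probs = logits.softmax(dim=-1)
    conf, pred = probs.max(dim=-1)                    # per-position confidence
    conf = conf.masked_fill(tokens != MASK_ID, -1.0)  # ignore already-fixed slots
    idx = conf.topk(n_reveal, dim=-1).indices
    tokens = tokens.clone()
    tokens.scatter_(1, idx, pred.gather(1, idx))
    return tokens

# Toy usage: one fully masked 16-token sequence, random stand-in logits.
tokens = torch.full((1, 16), MASK_ID)
logits = torch.randn(1, 16, MASK_ID)  # logits over the 1024 real codes
tokens = unmask_step(logits, tokens, n_reveal=4)
```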
Semantic correspondence, the task of determining relationships between different parts of images, underpins various applications including 3D reconstruction, image-to-image translation, object tracking, and visual place recognition. Recent studies have … [a minimal descriptor-matching sketch follows the link below]
External link: http://arxiv.org/abs/2412.03512
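Since the entry only names the task, here is the standard nearest-neighbour baseline for semantic correspondence: match per-patch descriptors of two images by cosine similarity. The shapes and random inputs are placeholders, not anything from the paper.

```python
import torch
import torch.nn.functional as F

def match_descriptors(feat_a, feat_b):
    """Nearest-neighbour matching between two sets of (N, D) descriptors,
    e.g. per-patch features extracted from two images."""
    sim = F.normalize(feat_a, dim=-1) @ F.normalize(feat_b, dim=-1).T
    return sim.argmax(dim=-1)  # for each point in image A, its best match in B

# Toy usage: random 256-dim descriptors for 100 and 120 patches.
matches = match_descriptors(torch.randn(100, 256), torch.randn(120, 256))
```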
Internal features from large-scale pre-trained diffusion models have recently been established as powerful semantic descriptors for a wide range of downstream tasks. Works that use these features generally need to add noise to images before passing them … [a feature-extraction sketch follows the link below]
External link: http://arxiv.org/abs/2412.03439
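The snippet notes that such methods add noise to an image before passing it through the model; below is a hedged sketch of that recipe with the diffusers library. The checkpoint id, timestep, and zeroed text embedding are assumptions, and hooking the UNet mid block is just one common choice of feature location.

```python
import torch
from diffusers import DDPMScheduler, UNet2DConditionModel

# Assumed Stable Diffusion checkpoint; any SD-1.x UNet has the same layout.
repo = "runwayml/stable-diffusion-v1-5"
unet = UNet2DConditionModel.from_pretrained(repo, subfolder="unet")
sched = DDPMScheduler.from_pretrained(repo, subfolder="scheduler")

feats = {}
unet.mid_block.register_forward_hook(lambda m, i, o: feats.update(mid=o))

latents = torch.randn(1, 4, 64, 64)   # stand-in for a VAE-encoded image
t = torch.tensor([50])                # the added-noise step the snippet alludes to
noisy = sched.add_noise(latents, torch.randn_like(latents), t)
cond = torch.zeros(1, 77, 768)        # placeholder text embedding

with torch.no_grad():
    unet(noisy, t, encoder_hidden_states=cond)
print(feats["mid"].shape)             # the captured semantic descriptor map
```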
Author: Wang, Jiangtao, Qin, Zhen, Zhang, Yifan, Hu, Vincent Tao, Ommer, Björn, Briq, Rania, Kesselheim, Stefan
Vision tokenizers have attracted considerable attention due to their scalability and compactness; previous works depend on old-school GAN-based hyperparameters, biased comparisons, and a lack of comprehensive analysis of the scaling behaviours. To tackle these … [the core vector-quantization step is sketched after the link below]
External link: http://arxiv.org/abs/2412.02632
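As context for what a vision tokenizer discretizes, here is the core vector-quantization step shared by VQGAN-style tokenizers, with the usual straight-through gradient for training. The sizes and random codebook are illustrative, not the paper's configuration.

```python
import torch

def vq_quantize(z, codebook):
    """Map encoder features to their nearest codebook entries.
    z: (B, N, D) continuous features; codebook: (K, D) learned codes."""
    d = torch.cdist(z, codebook.unsqueeze(0))  # (B, N, K) pairwise distances
    idx = d.argmin(dim=-1)                     # discrete token ids, shape (B, N)
    z_q = codebook[idx]                        # quantized continuous features
    z_q = z + (z_q - z).detach()               # straight-through estimator
    return z_q, idx

# Toy usage: 64 feature vectors of dim 64 against a 512-entry codebook.
z_q, ids = vq_quantize(torch.randn(2, 64, 64), torch.randn(512, 64))
```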
Author: Kotovenko, Dmytro, Grebenkova, Olga, Sarafianos, Nikolaos, Paliwal, Avinash, Ma, Pingchuan, Poursaeed, Omid, Mohan, Sreyas, Fan, Yuchen, Li, Yilei, Ranjan, Rakesh, Ommer, Björn
While style transfer techniques have been well-developed for 2D image stylization, the extension of these methods to 3D scenes remains relatively unexplored. Existing approaches demonstrate proficiency in transferring colors and textures but often struggle … [the classic 2D style objective is sketched after the link below]
External link: http://arxiv.org/abs/2409.17917
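For reference, the well-developed 2D baseline the entry contrasts against typically matches Gram statistics of pretrained CNN features; a minimal version of that style loss is sketched below. Feature extraction itself is omitted, and nothing here reflects the paper's 3D method.

```python
import torch
import torch.nn.functional as F

def gram(feat):
    """Gram matrix of (B, C, H, W) CNN features: channel co-activation
    statistics commonly used as a 2D style representation."""
    b, c, h, w = feat.shape
    f = feat.reshape(b, c, h * w)
    return f @ f.transpose(1, 2) / (c * h * w)

def style_loss(feat_generated, feat_style):
    # Penalize differences in style statistics between the two images.
    return F.mse_loss(gram(feat_generated), gram(feat_style))
```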
Controllable text-to-image (T2I) diffusion models have shown impressive performance in generating high-quality visual content through the incorporation of various conditions. Current methods, however, exhibit limited performance when guided by skeleton … [a pose-conditioned generation sketch follows the link below]
External link: http://arxiv.org/abs/2406.02485
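As a concrete example of skeleton-guided T2I generation, here is a hedged sketch using the diffusers ControlNet pipeline. The two checkpoint ids and the pose image file are assumptions, and this shows the general technique the entry builds on, not the paper's own method.

```python
from diffusers import ControlNetModel, StableDiffusionControlNetPipeline
from diffusers.utils import load_image

# Assumed checkpoints: an OpenPose ControlNet paired with an SD-1.5 base model.
controlnet = ControlNetModel.from_pretrained("lllyasviel/sd-controlnet-openpose")
pipe = StableDiffusionControlNetPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", controlnet=controlnet
)

pose = load_image("pose_map.png")  # hypothetical skeleton/pose image
result = pipe("a dancer on stage", image=pose, num_inference_steps=30).images[0]
result.save("out.png")
```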
Author: Stracke, Nick, Baumann, Stefan Andreas, Susskind, Joshua M., Bautista, Miguel Angel, Ommer, Björn
Text-to-image generative models have become a prominent and powerful tool that excels at generating high-resolution realistic images. However, guiding the generative process of these models to consider detailed forms of conditioning reflecting style … [a generic zero-initialized conditioning adapter is sketched after the link below]
External link: http://arxiv.org/abs/2405.07913
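One widespread way to graft detailed conditioning onto a pretrained generator is a zero-initialized residual adapter, a pattern popularized by ControlNet. The sketch below shows only that generic pattern under made-up dimensions, not this paper's specific architecture.

```python
import torch
import torch.nn as nn

class ZeroInitAdapter(nn.Module):
    """Inject an extra conditioning signal as a residual whose projection
    starts at zero, so the pretrained model is unchanged at initialization."""
    def __init__(self, cond_dim, hidden_dim):
        super().__init__()
        self.proj = nn.Linear(cond_dim, hidden_dim)
        nn.init.zeros_(self.proj.weight)
        nn.init.zeros_(self.proj.bias)

    def forward(self, hidden, cond):
        return hidden + self.proj(cond)  # residual is exactly zero at init

# Toy usage: add a 512-dim style code to (1, 77, 768) transformer activations.
adapter = ZeroInitAdapter(512, 768)
out = adapter(torch.randn(1, 77, 768), torch.randn(1, 1, 512))
```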
Author: Baumann, Stefan Andreas, Krause, Felix, Neumayr, Michael, Stracke, Nick, Hu, Vincent Tao, Ommer, Björn
In recent years, advances in text-to-image (T2I) diffusion models have substantially elevated the quality of their generated images. However, achieving fine-grained control over attributes remains a challenge due to the limitations of natural language … [an illustrative embedding-direction edit is sketched after the link below]
External link: http://arxiv.org/abs/2403.17064
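A common illustration of fine-grained attribute control beyond plain prompts is to shift prompt embeddings along a semantic direction by a continuous scale. The sketch below shows only that generic idea with random stand-ins; how directions are actually found is the paper's contribution and is not reproduced here.

```python
import torch

def shift_attribute(prompt_emb, direction, scale):
    """Move prompt embeddings along a unit 'attribute' direction
    (e.g. age, brightness) by a continuous scale factor."""
    direction = direction / direction.norm()
    return prompt_emb + scale * direction  # broadcasts over all token slots

# Toy usage with CLIP-sized (77, 768) prompt embeddings, random direction.
edited = shift_attribute(torch.randn(77, 768), torch.randn(768), scale=2.5)
```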
In this work we propose a novel method for unsupervised controllable video generation. Once trained on a dataset of unannotated videos, at inference our model is capable of both composing scenes of predefined object parts and animating them in a plausible …
External link: http://arxiv.org/abs/2403.14368