Showing 1 - 10 of 69 results for search: '"Tonioni, Alessio"'
Author:
Kar, Oğuzhan Fatih, Tonioni, Alessio, Poklukar, Petra, Kulshrestha, Achin, Zamir, Amir, Tombari, Federico
Vision-language models (VLMs) are typically composed of a vision encoder, e.g. CLIP, and a language model (LM) that interprets the encoded features to solve downstream tasks. Despite remarkable progress, VLMs are subject to several shortcomings due t…
External link:
http://arxiv.org/abs/2404.07204
Author:
Comi, Mauro, Tonioni, Alessio, Yang, Max, Tremblay, Jonathan, Blukis, Valts, Lin, Yijiong, Lepora, Nathan F., Aitchison, Laurence
Touch and vision go hand in hand, mutually enhancing our ability to understand the world. From a research perspective, the problem of mixing touch and vision is underexplored and presents interesting challenges. To this end, we propose Tactile-Inform…
External link:
http://arxiv.org/abs/2403.20275
Author:
Shahbazi, Mohamad, Claessens, Liesbeth, Niemeyer, Michael, Collins, Edo, Tonioni, Alessio, Van Gool, Luc, Tombari, Federico
We introduce InseRF, a novel method for generative object insertion in the NeRF reconstructions of 3D scenes. Based on a user-provided textual description and a 2D bounding box in a reference viewpoint, InseRF generates new objects in 3D scenes. Rece…
External link:
http://arxiv.org/abs/2401.05335
In this paper we present a text-conditioned video resampler (TCR) module that uses a pre-trained and frozen visual encoder and large language model (LLM) to process long video sequences for a task. TCR localises relevant visual features from the vide…
External link:
http://arxiv.org/abs/2312.11897
Diffusion models (DMs) have gained prominence due to their ability to generate high-quality, varied images, with recent advancements in text-to-image generation. The research focus is now shifting towards the controllability of DMs. A significant cha…
External link:
http://arxiv.org/abs/2312.09256
Author:
Comi, Mauro, Lin, Yijiong, Church, Alex, Tonioni, Alessio, Aitchison, Laurence, Lepora, Nathan F.
Humans rely on their visual and tactile senses to develop a comprehensive 3D understanding of their physical environment. Recently, there has been a growing interest in exploring and manipulating objects using data-driven approaches that utilise high…
External link:
http://arxiv.org/abs/2311.12602
Author:
Tsalicoglou, Christina, Manhardt, Fabian, Tonioni, Alessio, Niemeyer, Michael, Tombari, Federico
The ability to generate highly realistic 2D images from mere text prompts has recently made huge progress in terms of speed and quality, thanks to the advent of image diffusion models. Naturally, the question arises if this can be also achieved in th…
External link:
http://arxiv.org/abs/2304.12439
We introduce a novel framework for training deep stereo networks effortlessly and without any ground-truth. By leveraging state-of-the-art neural rendering solutions, we generate stereo training data from image sequences collected with a single handh…
External link:
http://arxiv.org/abs/2303.17603
Author:
Shahbazi, Mohamad, Ntavelis, Evangelos, Tonioni, Alessio, Collins, Edo, Paudel, Danda Pani, Danelljan, Martin, Van Gool, Luc
Pose-conditioned convolutional generative models struggle with high-quality 3D-consistent image generation from single-view datasets, due to their lack of sufficient 3D priors. Recently, the integration of Neural Radiance Fields (NeRFs) and generativ…
External link:
http://arxiv.org/abs/2303.12865
Author:
Ramirez, Pierluigi Zama, Cardace, Adriano, De Luigi, Luca, Tonioni, Alessio, Salti, Samuele, Di Stefano, Luigi
Availability of labelled data is the major obstacle to the deployment of deep learning algorithms for computer vision tasks in new domains. The fact that many frameworks adopted to solve different tasks share the same architecture suggests that there…
External link:
http://arxiv.org/abs/2301.11310