Showing 1 - 10 of 3,306 results for search: '"R, Venkatesh"'
Author:
Dhiman, Ankit, Shah, Manan, Parihar, Rishubh, Bhalgat, Yash, Boregowda, Lokesh R, Babu, R Venkatesh
We tackle the problem of generating highly realistic and plausible mirror reflections using diffusion-based generative models. We formulate this problem as an image inpainting task, allowing for more user control over the placement of mirrors during …
External link:
http://arxiv.org/abs/2409.14677
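The snippet above frames reflection synthesis as inpainting: the mirror region is masked out and a diffusion model fills it in, conditioned on the surrounding scene and a text prompt. A minimal sketch of that formulation with the Hugging Face diffusers inpainting pipeline follows; the checkpoint, file names, and prompt are illustrative assumptions, not the paper's setup.

```python
# Not the paper's method -- a generic diffusion-inpainting call with `diffusers`,
# only to illustrate the "mirror region as an inpainting mask" formulation.
import torch
from PIL import Image
from diffusers import StableDiffusionInpaintPipeline

pipe = StableDiffusionInpaintPipeline.from_pretrained(
    "runwayml/stable-diffusion-inpainting",  # any SD inpainting checkpoint (placeholder)
    torch_dtype=torch.float16,
).to("cuda")

scene = Image.open("scene.png").convert("RGB").resize((512, 512))          # image with a mirror frame
mirror_mask = Image.open("mirror_mask.png").convert("L").resize((512, 512))  # white where the reflection goes

result = pipe(
    prompt="a photorealistic mirror reflection of the room",
    image=scene,
    mask_image=mirror_mask,
).images[0]
result.save("scene_with_reflection.png")
```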
Recently, we have seen a surge of personalization methods for text-to-image (T2I) diffusion models to learn a concept using a few images. Existing approaches, when used for face personalization, struggle to achieve convincing inversion with identity preservation …
External link:
http://arxiv.org/abs/2408.05083
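For context on the kind of personalization the snippet describes (learning a concept from a few images and reusing it in prompts), here is a minimal textual-inversion sketch with diffusers; the checkpoints and concept token are placeholders, and this is not the paper's method.

```python
import torch
from diffusers import StableDiffusionPipeline

# Placeholder base checkpoint; any Stable Diffusion model works similarly.
pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")

# Load a concept embedding learned from a handful of reference images;
# the repo below is an illustrative public example, unrelated to the paper.
pipe.load_textual_inversion("sd-concepts-library/cat-toy")

# The learned token can now be used like any other word in a prompt.
image = pipe("a photo of a <cat-toy> on a wooden desk").images[0]
image.save("personalized.png")
```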
For a given scene, humans can easily reason about the locations and poses at which to place objects. Designing a computational model to reason about these affordances poses a significant challenge, mirroring the intuitive reasoning abilities of humans. This work …
External link:
http://arxiv.org/abs/2407.15446
Author:
Rangwani, Harsh, Agarwal, Aishwarya, Kulkarni, Kuldeep, Babu, R. Venkatesh, Karanam, Srikrishna
Text-to-image generation from large generative models like Stable Diffusion, DALLE-2, etc., has become a common base for various tasks due to their superior quality and extensive knowledge bases. As image composition and generation are creative processes …
External link:
http://arxiv.org/abs/2406.10197
The need for abundant labelled data in supervised Adversarial Training (AT) has prompted the use of Self-Supervised Learning (SSL) techniques with AT. However, the direct application of existing SSL methods to adversarial training has been sub-optimal …
External link:
http://arxiv.org/abs/2406.05796
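As a rough illustration of combining SSL with adversarial training (not the paper's algorithm), the sketch below crafts an FGSM perturbation that maximizes a SimCLR-style contrastive loss and then trains the encoder on the perturbed view; the encoder, optimizer, and hyper-parameters are assumed placeholders.

```python
import torch
import torch.nn.functional as F

def nt_xent(z1, z2, tau=0.5):
    """SimCLR-style NT-Xent loss over a batch of paired embeddings."""
    z = F.normalize(torch.cat([z1, z2]), dim=1)            # (2N, d)
    sim = z @ z.t() / tau                                   # pairwise cosine similarities
    n = z1.size(0)
    mask = torch.eye(2 * n, dtype=torch.bool, device=z.device)
    sim.masked_fill_(mask, float("-inf"))                   # drop self-similarity
    targets = torch.cat([torch.arange(n, 2 * n), torch.arange(0, n)]).to(z.device)
    return F.cross_entropy(sim, targets)

def adversarial_ssl_step(encoder, opt, view1, view2, eps=8 / 255):
    # 1) FGSM: perturb one augmented view to maximize the contrastive loss.
    view1_adv = view1.clone().requires_grad_(True)
    loss = nt_xent(encoder(view1_adv), encoder(view2))
    grad = torch.autograd.grad(loss, view1_adv)[0]
    view1_adv = (view1 + eps * grad.sign()).clamp(0, 1).detach()

    # 2) Minimize the same loss on the adversarial view.
    opt.zero_grad()
    loss_adv = nt_xent(encoder(view1_adv), encoder(view2))
    loss_adv.backward()
    opt.step()
    return loss_adv.item()
```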
Author:
Rangwani, Harsh, Mondal, Pradipto, Mishra, Mayank, Asokan, Ashish Ramayee, Babu, R. Venkatesh
Vision Transformer (ViT) has emerged as a prominent architecture for various computer vision tasks. In ViT, we divide the input image into patch tokens and process them through a stack of self-attention blocks. However, unlike Convolutional Neural Networks …
External link:
http://arxiv.org/abs/2404.02900
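The patch-token front end mentioned in the snippet can be written down in a few lines; below is a minimal, self-contained sketch in which the dimensions are illustrative, not the paper's configuration.

```python
import torch
import torch.nn as nn

class PatchEmbed(nn.Module):
    """Cut an image into non-overlapping patches and project each to a token."""
    def __init__(self, img_size=224, patch_size=16, in_chans=3, dim=384):
        super().__init__()
        self.num_patches = (img_size // patch_size) ** 2
        # A stride-`patch_size` convolution is equivalent to a shared linear
        # projection applied to every flattened patch.
        self.proj = nn.Conv2d(in_chans, dim, kernel_size=patch_size, stride=patch_size)

    def forward(self, x):                        # x: (B, 3, H, W)
        x = self.proj(x)                          # (B, dim, H/16, W/16)
        return x.flatten(2).transpose(1, 2)       # (B, num_patches, dim)

embed = PatchEmbed()
tokens = embed(torch.randn(2, 3, 224, 224))       # (2, 196, 384)

# The token sequence is then processed by a stack of self-attention blocks.
encoder = nn.TransformerEncoder(
    nn.TransformerEncoderLayer(d_model=384, nhead=6, batch_first=True),
    num_layers=12,
)
out = encoder(tokens)                             # (2, 196, 384)
```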
Author:
Parihar, Rishubh, Bhat, Abhijnya, Basu, Abhipsa, Mallick, Saswat, Kundu, Jogendra Nath, Babu, R. Venkatesh
Diffusion Models (DMs) have emerged as powerful generative models with unprecedented image generation capability. These models are widely used for data augmentation and creative applications. However, DMs reflect the biases present in the training data …
External link:
http://arxiv.org/abs/2402.18206
Vision-Language Models (VLMs) such as CLIP are trained on large amounts of image-text pairs, resulting in remarkable generalization across several data distributions. However, in several cases, their expensive training and data collection/curation costs …
External link:
http://arxiv.org/abs/2310.08255
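For reference, this is what the image-text matching behind CLIP's zero-shot generalization looks like in practice: a minimal sketch with the transformers library, where the checkpoint, labels, and image path are placeholder assumptions.

```python
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

image = Image.open("photo.jpg")                       # placeholder image
labels = ["a photo of a dog", "a photo of a cat", "a photo of a car"]

# Score the image against each caption and normalize to probabilities.
inputs = processor(text=labels, images=image, return_tensors="pt", padding=True)
probs = model(**inputs).logits_per_image.softmax(dim=-1)   # shape (1, 3)
print(dict(zip(labels, probs[0].tolist())))
```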
Neural radiance field (NeRF) based methods enable high-quality novel-view synthesis for multi-view images. This work presents a method for synthesizing colorized novel views from input grey-scale multi-view images. When we apply image or video-based …
External link:
http://arxiv.org/abs/2309.07668
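The snippet's starting point is standard radiance-field rendering: per-sample densities and colors along a ray are composited into a pixel via the quadrature C = sum_i T_i (1 - exp(-sigma_i * delta_i)) c_i. A minimal sketch of that compositing step follows; it is generic NeRF machinery, not the paper's colorization method, and all values are dummy placeholders.

```python
import torch

def render_ray(sigmas, colors, t_vals):
    """sigmas: (S,), colors: (S, 3), t_vals: (S,) sample depths along one ray."""
    deltas = torch.diff(t_vals, append=t_vals[-1:] + 1e10)   # spacing between samples
    alphas = 1.0 - torch.exp(-sigmas * deltas)               # per-sample opacity
    trans = torch.cumprod(
        torch.cat([torch.ones(1), 1.0 - alphas + 1e-10])[:-1], dim=0
    )                                                        # transmittance T_i
    weights = trans * alphas
    return (weights[:, None] * colors).sum(dim=0)            # composited RGB (or luminance)

# Dummy densities, colors, and depths for a single 64-sample ray.
rgb = render_ray(torch.rand(64), torch.rand(64, 3), torch.linspace(2.0, 6.0, 64))
```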