Showing 1 - 10 of 210 for search: '"Hayat Munawar"'
Diffusion models excel at generative modeling (e.g., text-to-image) but sampling requires multiple denoising network passes, limiting practicality. Efforts such as progressive distillation or consistency distillation have shown promise by reducing th…
External link:
http://arxiv.org/abs/2410.11971
Author:
Cai, Zhixi, Dhall, Abhinav, Ghosh, Shreya, Hayat, Munawar, Kollias, Dimitrios, Stefanov, Kalin, Tariq, Usman
The detection and localization of deepfake content, particularly when small fake segments are seamlessly mixed with real videos, remains a significant challenge in the field of digital media security. Based on the recently released AV-Deepfake1M data…
External link:
http://arxiv.org/abs/2409.06991
Author:
Borse, Shubhankar, Kadambi, Shreya, Pandey, Nilesh Prasad, Bhardwaj, Kartikeya, Ganapathy, Viswanath, Priyadarshi, Sweta, Garrepalli, Risheek, Esteves, Rafael, Hayat, Munawar, Porikli, Fatih
While Low-Rank Adaptation (LoRA) has proven beneficial for efficiently fine-tuning large models, LoRA fine-tuned text-to-image diffusion models lack diversity in the generated images, as the model tends to copy data from the observed training samples…
External link:
http://arxiv.org/abs/2406.08798
Few-shot image synthesis entails generating diverse and realistic images of novel categories using only a few example images. While multiple recent efforts in this direction have achieved impressive results, the existing approaches are dependent only…
External link:
http://arxiv.org/abs/2404.16556
Author:
Jeong, Jisoo, Cai, Hong, Garrepalli, Risheek, Lin, Jamie Menjay, Hayat, Munawar, Porikli, Fatih
The scarcity of ground-truth labels poses one major challenge in developing optical flow estimation models that are both generalizable and robust. While current methods rely on data augmentation, they have yet to fully exploit the rich information av…
External link:
http://arxiv.org/abs/2403.18092
Author:
VS, Vibashan, Borse, Shubhankar, Park, Hyojin, Das, Debasmit, Patel, Vishal, Hayat, Munawar, Porikli, Fatih
In this paper, we introduce an open-vocabulary panoptic segmentation model that effectively unifies the strengths of the Segment Anything Model (SAM) with the vision-language CLIP model in an end-to-end framework. While SAM excels in generating spati…
External link:
http://arxiv.org/abs/2403.09620
Diffusion models are highly effective at generating high-quality images but pose risks, such as the unintentional generation of NSFW (not safe for work) content. Although various techniques have been proposed to mitigate unwanted influences in diffus…
External link:
http://arxiv.org/abs/2401.05779
The task of Visual Relationship Recognition (VRR) aims to identify relationships between two interacting objects in an image and is particularly challenging due to the widely-spread and highly imbalanced distribution of tr…
External link:
http://arxiv.org/abs/2401.01387
Author:
Cai, Zhixi, Ghosh, Shreya, Adatia, Aman Pankaj, Hayat, Munawar, Dhall, Abhinav, Gedeon, Tom, Stefanov, Kalin
The detection and localization of highly realistic deepfake audio-visual content are challenging even for the most advanced state-of-the-art methods. While most of the research efforts in this domain are focused on detecting high-quality deepfake ima…
External link:
http://arxiv.org/abs/2311.15308
In recent years, open-vocabulary (OV) dense visual prediction (such as OV object detection and semantic, instance, and panoptic segmentation) has attracted increasing research attention. However, most existing approaches are task-specific and individ…
External link:
http://arxiv.org/abs/2307.08238