Showing 1 - 10 of 18 results for search: '"Khattak, Muhammad Uzair"'
This study explores the concept of cross-disease transferability (XDT) in medical imaging, focusing on the potential of binary classifiers trained on one disease to perform zero-shot classification on another disease affecting the same organ. …
External link:
http://arxiv.org/abs/2408.11493
Author:
Khattak, Muhammad Uzair, Naeem, Muhammad Ferjad, Hassan, Jameel, Naseer, Muzammal, Tombari, Federico, Khan, Fahad Shahbaz, Khan, Salman
Recent advancements in Large Language Models (LLMs) have led to the development of Video Large Multi-modal Models (Video-LMMs) that can handle a wide range of video understanding tasks. These models have the potential to be deployed in real-world …
External link:
http://arxiv.org/abs/2405.03690
Author:
Khattak, Muhammad Uzair, Naeem, Muhammad Ferjad, Naseer, Muzammal, Van Gool, Luc, Tombari, Federico
Foundational vision-language models such as CLIP are becoming a new paradigm in vision, due to their excellent generalization abilities. However, adapting these models for downstream tasks while maintaining their generalization remains a challenge. …
External link:
http://arxiv.org/abs/2401.02418
Author:
Hassan, Jameel, Gani, Hanan, Hussein, Noor, Khattak, Muhammad Uzair, Naseer, Muzammal, Khan, Fahad Shahbaz, Khan, Salman
The promising zero-shot generalization of vision-language models such as CLIP has led to their adoption using prompt learning for numerous downstream tasks. Previous works have shown test-time prompt tuning using entropy minimization to adapt text …
External link:
http://arxiv.org/abs/2311.01459
Author:
Khattak, Muhammad Uzair, Wasim, Syed Talal, Naseer, Muzammal, Khan, Salman, Yang, Ming-Hsuan, Khan, Fahad Shahbaz
Prompt learning has emerged as an efficient alternative for fine-tuning foundational models, such as CLIP, for various downstream tasks. Conventionally trained using the task-specific objective, i.e., cross-entropy loss, prompts tend to overfit …
External link:
http://arxiv.org/abs/2307.06948
Author:
Wasim, Syed Talal, Khattak, Muhammad Uzair, Naseer, Muzammal, Khan, Salman, Shah, Mubarak, Khan, Fahad Shahbaz
Recent video recognition models utilize Transformer models for long-range spatio-temporal context modeling. Video transformer designs are based on self-attention that can model global context at a high computational cost. In comparison, convolutional …
External link:
http://arxiv.org/abs/2307.06947
Large-scale multi-modal training with image-text pairs imparts strong generalization to CLIP model. Since training on a similar scale for videos is infeasible, recent approaches focus on the effective transfer of image-based CLIP to the video domain.
External link:
http://arxiv.org/abs/2212.03640
Pre-trained vision-language (V-L) models such as CLIP have shown excellent generalization ability to downstream tasks. However, they are sensitive to the choice of input text prompts and require careful selection of prompt templates to perform well.
External link:
http://arxiv.org/abs/2210.03117
Existing open-vocabulary object detectors typically enlarge their vocabulary sizes by leveraging different forms of weak supervision. This helps generalize to novel objects at inference. Two popular forms of weak supervision used in open-vocabulary …
External link:
http://arxiv.org/abs/2207.03482
Academic article
This result is not available to unauthenticated users; log in to view it.