Showing 1 - 10 of 18 results for search: '"Khattak, Muhammad Uzair"'
This study explores the concept of cross-disease transferability (XDT) in medical imaging, focusing on the potential of binary classifiers trained on one disease to perform zero-shot classification on another disease affecting the same organ. …
External link:
http://arxiv.org/abs/2408.11493
Author:
Khattak, Muhammad Uzair, Naeem, Muhammad Ferjad, Hassan, Jameel, Naseer, Muzammal, Tombari, Federico, Khan, Fahad Shahbaz, Khan, Salman
Recent advancements in Large Language Models (LLMs) have led to the development of Video Large Multi-modal Models (Video-LMMs) that can handle a wide range of video understanding tasks. These models have the potential to be deployed in real-world …
External link:
http://arxiv.org/abs/2405.03690
Author:
Khattak, Muhammad Uzair, Naeem, Muhammad Ferjad, Naseer, Muzammal, Van Gool, Luc, Tombari, Federico
Foundational vision-language models such as CLIP are becoming a new paradigm in vision, due to their excellent generalization abilities. However, adapting these models for downstream tasks while maintaining their generalization remains a challenge. …
External link:
http://arxiv.org/abs/2401.02418
Author:
Hassan, Jameel, Gani, Hanan, Hussein, Noor, Khattak, Muhammad Uzair, Naseer, Muzammal, Khan, Fahad Shahbaz, Khan, Salman
The promising zero-shot generalization of vision-language models such as CLIP has led to their adoption using prompt learning for numerous downstream tasks. Previous works have shown test-time prompt tuning using entropy minimization to adapt text …
External link:
http://arxiv.org/abs/2311.01459
Author:
Khattak, Muhammad Uzair, Wasim, Syed Talal, Naseer, Muzammal, Khan, Salman, Yang, Ming-Hsuan, Khan, Fahad Shahbaz
Prompt learning has emerged as an efficient alternative for fine-tuning foundational models, such as CLIP, for various downstream tasks. Conventionally trained using the task-specific objective, i.e., cross-entropy loss, prompts tend to overfit …
External link:
http://arxiv.org/abs/2307.06948
Author:
Wasim, Syed Talal, Khattak, Muhammad Uzair, Naseer, Muzammal, Khan, Salman, Shah, Mubarak, Khan, Fahad Shahbaz
Recent video recognition models utilize Transformer models for long-range spatio-temporal context modeling. Video transformer designs are based on self-attention that can model global context at a high computational cost. In comparison, convolutional …
External link:
http://arxiv.org/abs/2307.06947
Large-scale multi-modal training with image-text pairs imparts strong generalization to CLIP model. Since training on a similar scale for videos is infeasible, recent approaches focus on the effective transfer of image-based CLIP to the video domain.
External link:
http://arxiv.org/abs/2212.03640
Pre-trained vision-language (V-L) models such as CLIP have shown excellent generalization ability to downstream tasks. However, they are sensitive to the choice of input text prompts and require careful selection of prompt templates to perform well.
External link:
http://arxiv.org/abs/2210.03117
Existing open-vocabulary object detectors typically enlarge their vocabulary sizes by leveraging different forms of weak supervision. This helps generalize to novel objects at inference. Two popular forms of weak supervision used in open-vocabulary …
External link:
http://arxiv.org/abs/2207.03482
Academic article
This result is not available to unauthenticated users; log in to view it.