Showing 1 - 10 of 2,983 for search: '"AWAIS, MUHAMMAD"'
Audio-driven talking face generation is a challenging task in digital communication. Despite significant progress in the area, most existing methods concentrate on audio-lip synchronization, often overlooking aspects such as visual quality, customiza
External link:
http://arxiv.org/abs/2412.07754
Contrastive learning, a prominent approach to representation learning, traditionally assumes positive pairs are closely related samples (the same image or class) and negative pairs are distinct samples. We challenge this assumption by proposing to le
External link:
http://arxiv.org/abs/2410.18200
Author:
Awais, Muhammad, Alharthi, Ali Husain Salem Abdulla, Kumar, Amandeep, Cholakkal, Hisham, Anwer, Rao Muhammad
Significant progress has been made in advancing large multimodal conversational models (LMMs), capitalizing on vast repositories of image-text data available online. Despite this progress, these models often encounter substantial domain gaps, hinderi
External link:
http://arxiv.org/abs/2410.08405
Author:
Nawaz, Umair, Awais, Muhammad, Gani, Hanan, Naseer, Muzammal, Khan, Fahad, Khan, Salman, Anwer, Rao Muhammad
Capitalizing on vast amounts of image-text data, large-scale vision-language pre-training has demonstrated remarkable zero-shot capabilities and has been utilized in several applications. However, models trained on general everyday web-crawled data of
External link:
http://arxiv.org/abs/2410.01407
In most existing multi-view modeling scenarios, cross-view correspondence (CVC) between instances of the same target from different views, like paired image-text data, is a crucial prerequisite for effortlessly deriving a consistent representation. N
External link:
http://arxiv.org/abs/2409.14882
Author:
Hanif, Asif, Shamshad, Fahad, Awais, Muhammad, Naseer, Muzammal, Khan, Fahad Shahbaz, Nandakumar, Karthik, Khan, Salman, Anwer, Rao Muhammad
Medical foundation models are gaining prominence in the medical community for their ability to derive general representations from extensive collections of medical image-text pairs. Recent research indicates that these models are susceptible to backd
External link:
http://arxiv.org/abs/2408.07440
Author:
Li, Rongchang, Feng, Zhenhua, Xu, Tianyang, Li, Linze, Wu, Xiao-Jun, Awais, Muhammad, Atito, Sara, Kittler, Josef
Compositional actions consist of dynamic (verbs) and static (objects) concepts. Humans can easily recognize unseen compositions using the learned concepts. For machines, solving such a problem requires a model to recognize unseen actions composed of
External link:
http://arxiv.org/abs/2407.06113
Author:
Awais, Muhammad, Hameed, Mehaboobathunnisa Sahul, Bhattacharya, Bidisha, Reiner, Orly, Anwer, Rao Muhammad
Recent advances have enabled the study of human brain development using brain organoids derived from stem cells. Quantifying cellular processes like mitosis in these organoids offers insights into neurodevelopmental disorders, but the manual analysis
External link:
http://arxiv.org/abs/2406.19556
Vision transformers combined with self-supervised learning have enabled the development of models which scale across large datasets for several downstream tasks like classification, segmentation and detection. The low-shot learning capability of thes
External link:
http://arxiv.org/abs/2406.17460
Masked Image Modeling (MIM)-based models, such as SdAE, CAE, GreenMIM, and MixAE, have explored different strategies to enhance the performance of Masked Autoencoders (MAE) by modifying prediction, loss functions, or incorporating additional architec
External link:
http://arxiv.org/abs/2406.17450