Showing 1 - 10 of 1,519 results for search: '"KHAN, FAHAD"'
Dichotomous Image Segmentation (DIS) tasks require highly precise annotations, and traditional dataset creation methods are labor intensive, costly, and require extensive domain expertise. Although using synthetic data for DIS is a promising solution
External link:
http://arxiv.org/abs/2412.19080
Author:
Fu, Dingjie, Hou, Wenjin, Chen, Shiming, Chen, Shuhuang, You, Xinge, Khan, Salman, Khan, Fahad Shahbaz
Generative Zero-Shot Learning (ZSL) methods synthesize class-related features based on predefined class semantic prototypes, showcasing superior performance. However, this feature generation paradigm falls short of providing interpretable insights. I
External link:
http://arxiv.org/abs/2412.17219
Author:
Soni, Sagar, Dudhane, Akshay, Debary, Hiyam, Fiaz, Mustansar, Munir, Muhammad Akhtar, Danish, Muhammad Sohail, Fraccaro, Paolo, Watson, Campbell D, Klein, Levente J, Khan, Fahad Shahbaz, Khan, Salman
Automated analysis of vast Earth observation data via interactive Vision-Language Models (VLMs) can unlock new opportunities for environmental monitoring, disaster response, and resource management. Existing generic VLMs do not perform well on Remote
External link:
http://arxiv.org/abs/2412.15190
Author:
Khattak, Muhammad Uzair, Kunhimon, Shahina, Naseer, Muzammal, Khan, Salman, Khan, Fahad Shahbaz
Vision-Language Models (VLMs) trained via contrastive learning have achieved notable success in natural image tasks. However, their application in the medical domain remains limited due to the scarcity of openly accessible, large-scale medical image-
External link:
http://arxiv.org/abs/2412.10372
Author:
Mullappilly, Sahal Shaji, Kurpath, Mohammed Irfan, Pieri, Sara, Alseiari, Saeed Yahya, Cholakkal, Shanavas, Aldahmani, Khaled, Khan, Fahad, Anwer, Rao, Khan, Salman, Baldwin, Timothy, Cholakkal, Hisham
This paper introduces BiMediX2, a bilingual (Arabic-English) Bio-Medical EXpert Large Multimodal Model (LMM) with a unified architecture that integrates text and visual modalities, enabling advanced image understanding and medical applications. BiMed
External link:
http://arxiv.org/abs/2412.07769
Author:
Croitoru, Florinel-Alin, Hiji, Andrei-Iulian, Hondru, Vlad, Ristea, Nicolae Catalin, Irofti, Paul, Popescu, Marius, Rusu, Cristian, Ionescu, Radu Tudor, Khan, Fahad Shahbaz, Shah, Mubarak
With the recent advancements in generative modeling, the realism of deepfake content has been increasing at a steady pace, even reaching the point where people often fail to detect manipulated media content online, thus being deceived into various ki
External link:
http://arxiv.org/abs/2411.19537
Author:
Danish, Muhammad Sohail, Munir, Muhammad Akhtar, Shah, Syed Roshaan Ali, Kuckreja, Kartik, Khan, Fahad Shahbaz, Fraccaro, Paolo, Lacoste, Alexandre, Khan, Salman
While numerous recent benchmarks focus on evaluating generic Vision-Language Models (VLMs), they fall short in addressing the unique demands of geospatial applications. Generic VLM benchmarks are not designed to handle the complexities of geospatial
External link:
http://arxiv.org/abs/2411.19325
Author:
Vayani, Ashmal, Dissanayake, Dinura, Watawana, Hasindri, Ahsan, Noor, Sasikumar, Nevasini, Thawakar, Omkar, Ademtew, Henok Biadglign, Hmaiti, Yahya, Kumar, Amandeep, Kuckreja, Kartik, Maslych, Mykola, Ghallabi, Wafa Al, Mihaylov, Mihail, Qin, Chao, Shaker, Abdelrahman M, Zhang, Mike, Ihsani, Mahardika Krisna, Esplana, Amiel, Gokani, Monil, Mirkin, Shachar, Singh, Harsh, Srivastava, Ashay, Hamerlik, Endre, Izzati, Fathinah Asma, Maani, Fadillah Adamsyah, Cavada, Sebastian, Chim, Jenny, Gupta, Rohit, Manjunath, Sanjay, Zhumakhanova, Kamila, Rabevohitra, Feno Heriniaina, Amirudin, Azril, Ridzuan, Muhammad, Kareem, Daniya, More, Ketan, Li, Kunyang, Shakya, Pramesh, Saad, Muhammad, Ghasemaghaei, Amirpouya, Djanibekov, Amirbek, Azizov, Dilshod, Jankovic, Branislava, Bhatia, Naman, Cabrera, Alvaro, Obando-Ceron, Johan, Otieno, Olympiah, Farestam, Fabian, Rabbani, Muztoba, Baliah, Sanoojan, Sanjeev, Santosh, Shtanchaev, Abduragim, Fatima, Maheen, Nguyen, Thao, Kareem, Amrin, Aremu, Toluwani, Xavier, Nathan, Bhatkal, Amit, Toyin, Hawau, Chadha, Aman, Cholakkal, Hisham, Anwer, Rao Muhammad, Felsberg, Michael, Laaksonen, Jorma, Solorio, Thamar, Choudhury, Monojit, Laptev, Ivan, Shah, Mubarak, Khan, Salman, Khan, Fahad
Existing Large Multimodal Models (LMMs) generally focus on only a few regions and languages. As LMMs continue to improve, it is increasingly important to ensure they understand cultural contexts, respect local sensitivities, and support low-resource
External link:
http://arxiv.org/abs/2411.16508
Author:
Chen, Dubing, Fang, Jin, Han, Wencheng, Cheng, Xinjing, Yin, Junbo, Xu, Chenzhong, Khan, Fahad Shahbaz, Shen, Jianbing
Vision-based semantic occupancy and flow prediction plays a crucial role in providing spatiotemporal cues for real-world tasks, such as autonomous driving. Existing methods prioritize higher accuracy to cater to the demands of these tasks. In this wo
External link:
http://arxiv.org/abs/2411.07725
Author:
Hu, Taihang, Li, Linxuan, van de Weijer, Joost, Gao, Hongcheng, Khan, Fahad Shahbaz, Yang, Jian, Cheng, Ming-Ming, Wang, Kai, Wang, Yaxing
Although text-to-image (T2I) models exhibit remarkable generation capabilities, they frequently fail to accurately bind semantically related objects or attributes in the input prompts; a challenge termed semantic binding. Previous approaches either i
External link:
http://arxiv.org/abs/2411.07132