Showing 1 - 4 of 4 results for search: '"Anwer, Rao M."'
Author:
Ghaboura, Sara, Heakl, Ahmed, Thawakar, Omkar, Alharthi, Ali, Riahi, Ines, Saif, Abduljalil, Laaksonen, Jorma, Khan, Fahad S., Khan, Salman, Anwer, Rao M.
Recent years have witnessed a significant interest in developing large multimodal models (LMMs) capable of performing various visual reasoning and understanding tasks. This has led to the introduction of multiple LMM benchmarks to evaluate LMMs on …
External link:
http://arxiv.org/abs/2410.18976
Author:
Thawakar, Omkar, Vayani, Ashmal, Khan, Salman, Cholakkal, Hisham, Anwer, Rao M., Felsberg, Michael, Baldwin, Tim, Xing, Eric P., Khan, Fahad Shahbaz
"Bigger the better" has been the predominant trend in recent Large Language Models (LLMs) development. However, LLMs do not suit well for scenarios that require on-device processing, energy efficiency, low memory footprint, and response efficiency. T
External link:
http://arxiv.org/abs/2402.16840
Author:
Maaz, Muhammad, Rasheed, Hanoona, Shaker, Abdelrahman, Khan, Salman, Cholakkal, Hisham, Anwer, Rao M., Baldwin, Tim, Felsberg, Michael, Khan, Fahad S.
In pursuit of more inclusive Vision-Language Models (VLMs), this study introduces a Large Multilingual Multimodal Model called PALO. PALO offers visual reasoning capabilities in 10 major languages, including English, Chinese, Hindi, Spanish, French, …
External link:
http://arxiv.org/abs/2402.14818
Author:
Rasheed, Hanoona, Maaz, Muhammad, Mullappilly, Sahal Shaji, Shaker, Abdelrahman, Khan, Salman, Cholakkal, Hisham, Anwer, Rao M., Xing, Eric, Yang, Ming-Hsuan, Khan, Fahad S.
Large Multimodal Models (LMMs) extend Large Language Models to the vision domain. Initial LMMs used holistic images and text prompts to generate ungrounded textual responses. Recently, region-level LMMs have been used to generate visually grounded …
External link:
http://arxiv.org/abs/2311.03356