RS-LLaVA: A Large Vision-Language Model for Joint Captioning and Question Answering in Remote Sensing Imagery

Author:	Yakoub Bazi, Laila Bashmal, Mohamad Mahmoud Al Rahhal, Riccardo Ricci, Farid Melgani
Language:	English
Publication year:	2024
Subject:	remote sensing (RS); large language models (LLMs); Large Language and Vision Assistant Model (LLaVA); instruction tuning; captioning; visual question answering (VQA); Science
Source:	Remote Sensing, Vol 16, Iss 9, p 1477 (2024)
Document type:	article
ISSN:	2072-4292
DOI:	10.3390/rs16091477
Description:	In this paper, we examine the application of large language models (LLMs) and their extension, large vision-language models (LVLMs), to remote sensing (RS) image analysis. We emphasize their multi-tasking potential, with a focus on image captioning and visual question answering (VQA). Specifically, we introduce an improved version of the Large Language and Vision Assistant Model (LLaVA), adapted for RS imagery through a low-rank adaptation approach. To evaluate the model's performance, we create the RS-instructions dataset, a comprehensive benchmark dataset that integrates four diverse single-task datasets related to captioning and VQA. The experimental results confirm the model’s effectiveness, marking a step toward the development of efficient multi-task models for RS image analysis.
Database:	Directory of Open Access Journals
External link:	https://doaj.org/article/bbac3265f861430b919250d8f8f2e523