Adapting a Foundation Model for Space-based Tasks

Author:	Foutter, Matthew; Bhoj, Praneet; Sinha, Rohan; Elhafsi, Amine; Banerjee, Somrita; Agia, Christopher; Kruger, Justin; Guffanti, Tommaso; Gammelli, Daniele; D'Amico, Simone; Pavone, Marco
Year of publication:	2024
Subject:	Computer Science - Robotics; Computer Science - Artificial Intelligence
Document type:	Working Paper
Description:	Foundation models, e.g., large language models, possess attributes of intelligence which offer promise to endow a robot with the contextual understanding necessary to navigate complex, unstructured tasks in the wild. In the future of space robotics, we see three core challenges which motivate the use of a foundation model adapted to space-based applications: 1) scalability of ground-in-the-loop operations; 2) generalizing prior knowledge to novel environments; and 3) multi-modality in tasks and sensor data. Therefore, as a first step towards building a foundation model for space-based applications, we automatically label the AI4Mars dataset to curate a language-annotated dataset of visual-question-answer tuples. We fine-tune a pretrained LLaVA checkpoint on this dataset to endow a vision-language model with the ability to perform spatial reasoning and navigation on Mars' surface. In this work, we demonstrate that 1) existing vision-language models are deficient visual reasoners in space-based applications, and 2) fine-tuning a vision-language model on extraterrestrial data significantly improves the quality of responses even with a limited training dataset of only a few thousand samples.
Database:	arXiv
External link:	http://arxiv.org/abs/2408.05924