GeoKnowledgeFusion: A Platform for Multimodal Data Compilation from Geoscience Literature

Autor: Zhixin Guo, Chaoyang Wang, Jianping Zhou, Guanjie Zheng, Xinbing Wang, Chenghu Zhou
Jazyk: angličtina
Rok vydání: 2024
Předmět:
Zdroj: Remote Sensing, Vol 16, Iss 9, p 1484 (2024)
Druh dokumentu: article
ISSN: 2072-4292
DOI: 10.3390/rs16091484
Popis: With the advent of big data science, the field of geoscience has undergone a paradigm shift toward data-driven scientific discovery. However, the abundance of geoscience data distributed across multiple sources poses significant challenges to researchers in terms of data compilation, which includes data collection, collation, and database construction. To streamline the data compilation process, we present GeoKnowledgeFusion, a publicly accessible platform for the fusion of text, visual, and tabular knowledge extracted from the geoscience literature. GeoKnowledgeFusion leverages a powerful network of models that provide a joint multimodal understanding of text, image, and tabular data, enabling researchers to efficiently curate and continuously update their databases. To demonstrate the practical applications of GeoKnowledgeFusion, we present two scenarios: the compilation of Sm-Nd isotope data for constructing a domain-specific database and geographic analysis, and the data extraction process for debris flow disasters. The data compilation process for these use cases encompasses various tasks, including PDF pre-processing, target element recognition, human-in-the-loop annotation, and joint multimodal knowledge understanding. The findings consistently reveal patterns that align with manually compiled data, thus affirming the credibility and dependability of our automated data processing tool. To date, GeoKnowledgeFusion has supported forty geoscience research teams within the program by processing over 40,000 documents uploaded by geoscientists.
Databáze: Directory of Open Access Journals
Nepřihlášeným uživatelům se plný text nezobrazuje