The Aggregation of ROAD Data in the ARIADNE Pipeline: Successes and Pitfalls

Authors: Kandel, Andrew; Kanaeva, Zara; Haidle, Miriam
Language: English
Year of publication: 2022
Subject:
DOI: 10.5281/zenodo.7328000
Venue: International Conference on Cultural Heritage and New Technologies, Vienna, 2022
Description: The ROCEEH Out of Africa Database (ROAD; www.roceeh.uni-tuebingen.de/roadweb/) contains data about archaeological, paleoanthropological, paleontological and paleobotanical localities in Africa and Eurasia dating from three million to 20,000 years ago. The database was conceived in 2008 as the ROCEEH project (www.roceeh.net/) began, and data entry started in 2009. Since then, the multidisciplinary team has integrated over 2,200 localities containing more than 20,000 assemblages, drawn from over 4,700 publications written in English, French, German, Italian, Spanish, Portuguese, Russian and Chinese, among others. ROAD serves as a valuable resource for archaeologists and other paleoscientists because it contains vast amounts of information that can be explored using innovative methods in data science. ROAD is a relational database managed with the PostgreSQL database management system. Users interact with the database through ROADWeb, a web-based application written in PHP, JavaScript and HTML (Fig. 1). ROAD and its applications are hosted on a server located at the University of Tübingen. The ROCEEH team purposely chose open-source software with the intention of increasing the database’s longevity. To make ROAD data more FAIR in the future, the research team is working to incorporate its data into the Semantic Web and Linked Data. Almost all data in the Semantic Web are distributed using the Resource Description Framework (RDF), a highly interoperable standard developed by the World Wide Web Consortium (W3C) to describe data or metadata. In 2021, the ROCEEH team completed the development of an RDF data model (i.e. ontology) and the RDF export of ROAD data.
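To illustrate what an RDF export of relational data can look like, the following is a minimal sketch that turns one database row into N-Triples statements. It is not the ROCEEH team's export code: the namespace `https://example.org/road/`, the table and column names, and the sample row are all hypothetical, since the abstract does not show the actual ROAD ontology.

```python
from urllib.parse import quote

RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
XSD = "http://www.w3.org/2001/XMLSchema#"
BASE = "https://example.org/road/"  # hypothetical namespace, not the real ROAD one


def escape_literal(value: str) -> str:
    """Escape backslashes, quotes and line breaks for an N-Triples literal."""
    return (value.replace("\\", "\\\\")
                 .replace('"', '\\"')
                 .replace("\n", "\\n")
                 .replace("\r", "\\r"))


def to_literal(value) -> str:
    """Serialize a Python value as a plain or typed N-Triples literal."""
    if isinstance(value, bool):  # check bool first: bool is a subclass of int
        return f'"{str(value).lower()}"^^<{XSD}boolean>'
    if isinstance(value, int):
        return f'"{value}"^^<{XSD}integer>'
    if isinstance(value, float):
        return f'"{value}"^^<{XSD}double>'
    return f'"{escape_literal(str(value))}"'


def row_to_ntriples(table: str, key: str, row: dict) -> list[str]:
    """Turn one relational row into a list of N-Triples lines.

    The primary key becomes the subject IRI, the table name becomes the
    class, and every other non-NULL column becomes one predicate.
    """
    subject = f"<{BASE}{table}/{quote(str(row[key]), safe='')}>"
    lines = [f"{subject} <{RDF_TYPE}> <{BASE}ontology/{table.capitalize()}> ."]
    for column, value in row.items():
        if column == key or value is None:
            continue  # skip the key itself and NULL values
        lines.append(f"{subject} <{BASE}ontology/{column}> {to_literal(value)} .")
    return lines


if __name__ == "__main__":
    # Made-up row for illustration only.
    example = {"id": 1, "name": "Example Cave", "country": "Exampleland"}
    print("\n".join(row_to_ntriples("locality", "id", example)))
```

In practice an export of this kind would more likely be built with an RDF library and a published ontology; the sketch only shows the row-to-triples idea.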
In accordance with overarching developments towards open science, ROCEEH registered ROAD with the registry re3data (www.re3data.org/) and published it under an open Creative Commons license (CC BY-SA 4.0). Based on our experience with data models, thesauri and data synthesis, we worked to promote sustainability of the database by developing standardized practices. Our work was complemented by networks of collaboration with ARIADNE, the Coalition for Archaeological Synthesis, and the German National Research Data Infrastructure (NFDI4Objects), among other agencies. ROCEEH first met with the ARIADNEplus team in Prato in January 2020 to plan a timeline for data integration. After this, ROCEEH began to use ARIADNE’s data infrastructure (portal.ariadne-infrastructure.eu/) in order to map the data contained in ROAD onto ARIADNE’s scheme. With the help of standardized vocabularies such as the Getty Art & Architecture Thesaurus (AAT) and PeriodO, which stores our defined chrono-cultural entities, ROCEEH successfully completed the first round of data integration in September 2021 (Fig. 2). Since then, users have been able to search ARIADNE to find the prehistoric data contained in ROAD, a function which enhances the use of both databases. The first update occurred in March 2022, and additional updates are planned every six months. In this presentation we report on some of the pitfalls and successes our team encountered as we tried to make ROAD data available in the ARIADNE portal. For example, one setback occurred when we tried to map ROAD attributes to those of ARIADNE using their 3M tool (Mapping Memory Manager). We could not bring the geological ages of finds in ROAD into ARIADNE’s graph database. The issue was that the model which describes the datasets contained in the ARIADNE catalog (AOCat) offered no appropriate resource class for establishing the geological age of the finds, while this feature was present in ROAD.
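The kind of gap described above, a source attribute with no suitable class in the target model, can be made visible before any data are transferred. The sketch below is a hedged illustration only: it does not use the 3M tool, and the attribute names and target class names are hypothetical placeholders, not actual ROAD columns or AOCat classes.

```python
# Hypothetical mapping from source attributes to target classes.
# None marks an attribute for which the target model offers no class.
EXAMPLE_MAPPING = {
    "locality_name": "TargetPlace",
    "cultural_period": "TargetPeriod",
    "geological_age": None,
}


def split_mappable(attributes, mapping):
    """Separate attributes that have a target class from those that do not.

    Returns a dict of attribute -> target class, and a list of attributes
    that cannot be carried over with the current mapping.
    """
    mapped = {}
    unmapped = []
    for attribute in attributes:
        target = mapping.get(attribute)
        if target:
            mapped[attribute] = target
        else:
            unmapped.append(attribute)
    return mapped, unmapped


if __name__ == "__main__":
    mapped, unmapped = split_mappable(
        ["locality_name", "cultural_period", "geological_age"],
        EXAMPLE_MAPPING,
    )
    print("mapped:", mapped)
    print("no target class:", unmapped)
```

A report of this sort simply lists what would be lost; whether to extend the target model or drop the attribute remains a decision for the partners involved.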
Another setback occurred during the mapping phase, when we discovered that the Getty AAT lacked certain entries better suited for prehistoric artifacts and cultures. We had to homogenize ROAD data to overcome this. Another issue was the regionalization of ROAD’s cultural entities, as these did not conform well with those in PeriodO. We used alternative labels to solve this. Despite these setbacks, we succeeded in integrating ROAD data and continue to update ARIADNE periodically. We also highlight our ongoing efforts to make the data FAIR (findable, accessible, interoperable, reusable), a philosophy that has become increasingly important in securing the future of Big Data in science. This last topic dovetails with another of ROCEEH’s successes, namely making ROAD data findable through ARIADNE. Finally, we touch upon some of the recent advances the research team made with regard to the database, and expound briefly on how the team innovated methods, designed applications, developed products and gained perspectives, as these issues may be relevant for the other partners of ARIADNE. To explore the full potential of ROAD and ARIADNE, we encourage you to visit our respective websites (www.roceeh.uni-tuebingen.de/roadweb/ and portal.ariadne-infrastructure.eu/) to discover what else these databases have to offer. Should you wish to explore ROAD further, ROCEEH provides expanded access for anyone interested.
Database: OpenAIRE