A novel curated scholarly graph connecting textual and data publications

Authors:	Ornella Irrera, Andrea Mannocci, Paolo Manghi, Gianmaria Silvello
Language:	English
Year of publication:	2023
Subject:	Information Systems and Management; Data curation; Scholarly knowledge graphs; Data enrichment; Open science; Information Systems; Dataset
Source:	ACM Journal of Data and Information Quality (Online) (2023). doi:10.1145/3597310
DOI:	10.1145/3597310
Description:	In the last decade, scholarly graphs have become fundamental to storing and managing scholarly knowledge in a structured and machine-readable way. Methods and tools for discovery and impact assessment of science rely on such graphs and their quality to serve scientists, policymakers, and publishers. As research data has grown in importance in scholarly communication, scholarly graphs have started to include dataset metadata and their relationships to publications. Such graphs are the foundations for Open Science investigations, data-article publishing workflows, discovery, and assessment indicators. However, due to the heterogeneity of practices (FAIRness is indeed in the making), they often lack the complete and reliable metadata necessary to perform accurate data analysis; e.g., dataset metadata is inaccurate, author names are not uniform, and the semantics of the relationships is unknown, ambiguous, or incomplete. This work describes an open and curated scholarly graph we built and published as a training and test set for data discovery, data connection, author disambiguation, and link prediction tasks. Overall, the graph contains 4,047 publications, 5,488 datasets, 22 software products, and 21,561 authors; 9,692 edges interconnect publications to datasets and software and are labeled with semantics that indicate whether a publication is citing, referencing, documenting, or supplementing another product. To ensure high-quality metadata and semantics, we relied on the information extracted from the PDFs of the publications and from the dataset and software webpages to curate and enrich node metadata and edge semantics. To the best of our knowledge, this is the first published resource that includes publications and datasets with manually validated and curated metadata.
Database:	OpenAIRE
External link:	https://explore.openaire.eu/search/publication?articleId=doi_dedup___::ffdef94356bb63e8839695a4942d7bc2 https://openportal.isti.cnr.it/doc?id=people______::22cd98b13e64f04f0887be9c98384a2c