The Japanese academic dataset integration based on PID and text processing

Authors: Onami, Jun-ichi; Kanazawa, Teruhito; Ohmukai, Ikki; Kawai, Masashi; Yamaji, Kazutsuna
Language: English
Publication year: 2023
Subject:
DOI: 10.5281/zenodo.8091586
Description: Academic discovery infrastructure is becoming essential to researchers, both for demonstrating the transparency of research and for promoting data-driven science. In April 2021, we released a new Japanese academic discovery service, CiNii Research, whose index includes a variety of non-traditional research outputs. We processed the internal data with the following algorithm so that users can search the integrated academic information efficiently. Using these structured datasets, CiNii Research presents rich search results that bring various research outputs together on one page. During data processing, we used PID identification, text matching, and ID mapping, combined with a dedicated prioritization scheme, to remove duplicated academic information. Thirty-nine percent of instances were found to be duplicates and were eliminated by these processes. In addition, instances are connected to one another using citation relationships and parent-child relationships. Through this integration of academic resource data, we added more than 60 million links to 30 million instances. These dataset improvements will contribute to richer search results.
Database: OpenAIRE
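
Illustration: the description names three matching signals (PID identification, text matching, ID mapping) and a prioritization step but gives no implementation details. The following is a minimal sketch of how such a deduplication pass could look, not the authors' actual algorithm. The `Record` fields, the `SOURCE_PRIORITY` table, the `id_map` structure, and the title-plus-year matching rule are all assumptions made for this example.

```python
import re
import unicodedata
from dataclasses import dataclass
from typing import Optional


@dataclass
class Record:
    source: str                  # hypothetical source database name
    local_id: str                # identifier within that source
    title: str
    year: int
    doi: Optional[str] = None    # PID, when the source supplies one


# Assumed prioritization: the lowest number survives a merge.
SOURCE_PRIORITY = {"repository": 0, "publisher": 1, "aggregator": 2}


def normalize_doi(doi: str) -> str:
    """Lowercase a DOI and strip common resolver prefixes."""
    s = doi.strip().lower()
    return re.sub(r"^(https?://(dx\.)?doi\.org/|doi:)", "", s)


def normalize_title(title: str) -> str:
    """Fold width/case variants and collapse punctuation to spaces."""
    s = unicodedata.normalize("NFKC", title).casefold()
    return re.sub(r"[\W_]+", " ", s).strip()


def match_keys(rec: Record, id_map: dict) -> list:
    """Collect every key under which this record may match another."""
    keys = [("text", normalize_title(rec.title), rec.year)]
    if rec.doi:
        keys.append(("pid", normalize_doi(rec.doi)))
    mapped = id_map.get((rec.source, rec.local_id))
    if mapped is not None:
        keys.append(("id", mapped))
    return keys


def deduplicate(records: list, id_map: dict) -> list:
    """Cluster records sharing any key; keep one record per cluster."""
    parent = list(range(len(records)))

    def find(i: int) -> int:
        # Union-find lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    first_seen = {}
    for i, rec in enumerate(records):
        for key in match_keys(rec, id_map):
            if key in first_seen:
                parent[find(i)] = find(first_seen[key])
            else:
                first_seen[key] = i

    clusters = {}
    for i, rec in enumerate(records):
        clusters.setdefault(find(i), []).append(rec)

    # Prioritization step: keep the record from the preferred source.
    return [
        min(group, key=lambda r: SOURCE_PRIORITY.get(r.source, 99))
        for group in clusters.values()
    ]
```

Union-find is used here so that matches chain transitively: if one record matches a second by DOI and the second matches a third by title, all three end up in one cluster. A production system would likely need fuzzier text matching and conflict handling than this exact-key comparison.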