Reconstruction of the Hanzi Normative Glyphs Database as a Dataset and its Integration with CHISE

Author: MORIOKA, Tomohiko
Language: Japanese
Year of publication: 2019
Source: 東方學報. 94:320-284
ISSN: 0304-2448
Description: This report describes an attempt to integrate the "CHISE" (Character Information Service Environment) character ontology with the "HNG" (Hanzi Normative Glyphs) database/dataset. The CHISE character ontology is a large-scale ontology that includes 365,000 character objects (1,460,000 triples), covering Unicode characters, non-Unicode characters, and their glyphs. It was developed for CHISE, a character processing system that does not depend on character codes. The CHISE framework is built on a graph storage system named "CONCORD", and we developed a Web service called "EST" (or "CHISE-wiki") for displaying and editing CONCORD objects. The CHISE character ontology uses the "Multiple Granularity Hanzi Structure Model" to support the various glyphs of Chinese characters and multiple unification granularities. This model works well for modern glyphs of Chinese characters. However, before we began studying the integration of CHISE and HNG, it was not clear whether the model is sufficient for premodern Chinese characters. In addition, designing reasonable unification rules for each unification granularity requires a wide range of glyph examples of Chinese characters. For both reasons, the CHISE character ontology should integrate a glyph database and/or a glyph corpus, and we therefore tried to integrate HNG with it. Viewed from the HNG side, this integration has the following significance. The original HNG Web service has been unavailable since spring 2015, so, as part of our research on integrating CHISE and HNG, we provided HNG search and data-browsing functions on the CHISE Web service. Although the difficulty of maintaining digital humanities databases over the long term has come to be recognized, a feasible method for restoring a database whose service has actually stopped, and for maintaining its data in the future, does not yet seem to be well established.
In this paper, we outline our work on the HNG dataset. This includes publishing the dataset with a distributed version control system (Git) and providing a Git hosting service whose URL is independent of both the researchers' institutions and platforms run by commercial companies. It also includes organizing an association to preserve the dataset. We also discuss issues related to the long-term preservation of databases.
Database: OpenAIRE