OpenBiodiv Computer Demo: an Implementation of a Semantic System Running on top of the Biodiversity Knowledge Graph

Author:	Viktor Senderov, Teodor Georgiev, Donat Agosti, Terry Catapano, Guido Sautter, Éamonn Ó Tuama, Nico Franz, Kiril Simov, Pavel Stoev, Lyubomir Penev
Language:	English
Year of publication:	2017
Subject:	0106 biological sciences; inference; Text Mining; Semantic publishing; R package; 02 engineering and technology; General Medicine; 010603 evolutionary biology; 01 natural sciences; SPARQL; RDF; Artificial Intelligence; 020204 information systems; 0202 electrical engineering, electronic engineering, information engineering; Biodiversity Knowledge Graph; Linked Open Data; Semantic web
Source:	Biodiversity Information Science and Standards 1: e20193 Proceedings of TDWG
ISSN:	2535-0897
Description:	We present OpenBiodiv, an implementation of the Open Biodiversity Knowledge Management System. We believe OpenBiodiv is possibly the first pilot-stage implementation of a semantic system running on top of the biodiversity knowledge graph. The need for an integrated information system serving the biodiversity community can be dated at least as far back as the sanctioning of the Bouchout declaration in 2007. The Bouchout declaration proposes to make biodiversity knowledge freely available as Linked Open Data (LOD). At TDWG 2016 (Fig. 1) we presented the prototype of the system, then called the Open Biodiversity Knowledge Management System (OBKMS). The specification and design of OpenBiodiv were outlined by Senderov and Penev (2016), and in this computer demo we would like to showcase its pilot. We will show how to use the SPARQL endpoint directly, illustrate the semantic search capabilities of the system, and showcase some high-level applications that run on top of it. We will also look at the core dataset (the Biodiversity Knowledge Graph) and the R tools used to create it. OpenBiodiv has several components:
OpenBiodiv ontology: a general data model allowing the extraction of biodiversity knowledge from taxonomic articles or from databases such as GBIF. The ontology (in preparation, Journal of Biomedical Semantics, available on GitHub) incorporates several pre-existing models: Darwin-SW (Baskauf and Webb 2016), SPAR (Peroni 2014), the Treatment Ontology, and several others. It defines classes, properties, and rules that make it possible to interlink these disparate ontologies and to create a LOD dataset of biodiversity knowledge. New is the Taxonomic Name Usage class, accompanied by a Vocabulary of Taxonomic Statuses (created via an analysis of 4,000 Pensoft articles), which allows for the automated inference of the taxonomic status of Latinized scientific names.
The ontology allows for multiple backbone taxonomies via the introduction of a Taxon Concept class (equivalent to DarwinCore Taxon) and Taxon Concept Labels as a subclass of biological name.
The Biodiversity Knowledge Graph: a LOD dataset of information extracted from taxonomic literature and databases. In practice, it realizes part of what was proposed during pro-iBiosphere and later discussed by Page (2016). Its main resources are articles, sub-article components (tables, figures, treatments, references), author names, institution names, geographical locations, biological names, taxon concepts, and occurrences. Authors have been disambiguated via their affiliation, using fuzzy logic based on the GraphDB Lucene connector. The graph interlinks: (1) prospectively published literature via Pensoft Publishers; (2) legacy literature via Plazi; (3) well-known resources such as geographical places or institutions via DBpedia; (4) GBIF's backbone taxonomy as a default but not preferential hierarchy of taxon concepts; (5) OpenBiodiv identifiers, which are matched to nomenclator identifiers (e.g. ZooBank) whenever possible. Names form two networks in the graph: (1) a directed acyclic graph (DAG) of supersedence that can be followed to the corresponding sinks to infer the currently applicable scientific name for a given taxon; (2) a network of bi-directional relations indicating the relatedness of names. These names may be compared to the related names inferred on the basis of distributional semantics by the co-organizers of this workshop (Nguyen et al. 2017).
ropenbio: an R package for RDF-ization of biodiversity information resources according to the OpenBiodiv ontology. It will be submitted to the rOpenSci project. While many of its high-level functions are specific to OpenBiodiv, the low-level functions and its RDF-ization framework can be used for any R-based RDF-ization effort.
OpenBiodiv.net: a front-end of the system allowing users to run low-level SPARQL queries as well as to use an extensible set of semantic apps running on top of the Biodiversity Knowledge Graph.
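The idea of following the supersedence DAG to its sinks, mentioned in the description of the Biodiversity Knowledge Graph, can be illustrated with a minimal Python sketch. It is not taken from OpenBiodiv itself: the function name `current_names`, the dictionary representation of the graph, and the placeholder names ("Aus bus" and so on) are all assumptions made for illustration, and the actual system presumably resolves names through SPARQL queries over the graph rather than in-memory traversal.

```python
def current_names(superseded_by, name):
    """Return the sinks reachable from `name` in a supersedence DAG.

    `superseded_by` maps a name to the names that supersede it.
    A name with no superseding names is a sink, i.e. a candidate
    for the currently applicable name.
    """
    sinks = set()
    seen = set()
    stack = [name]
    while stack:
        node = stack.pop()
        if node in seen:
            continue  # already visited via another path
        seen.add(node)
        successors = superseded_by.get(node, ())
        if not successors:
            sinks.add(node)  # nothing supersedes this name
        else:
            stack.extend(successors)
    return sinks


# Hypothetical example data, not real taxonomic names.
superseded_by = {
    "Aus bus": ["Cus bus"],
    "Cus bus": ["Cus dus"],
    "Xus yus": ["Cus dus"],
}

print(sorted(current_names(superseded_by, "Aus bus")))  # ['Cus dus']
```

A set is returned rather than a single name because a DAG may lead from one name to several sinks, for instance when a name has been split between two taxa.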
Database:	OpenAIRE
External link:	https://explore.openaire.eu/search/publication?articleId=doi_dedup___::ce71a50932932d58e5807a9911b78c2d https://doi.org/10.3897/tdwgproceedings.1.20193