NLIMED: Natural Language Interface for Model Entity Discovery in Biosimulation Model Repositories
Authors: | Yuda Munarko, Dewan M. Sarwar, Anand Rampadarath, Koray Atalag, John H. Gennari, Maxwell L. Neal, David P. Nickerson |
---|---|
Language: | English |
Year of publication: | 2019 |
Subject: | Information retrieval; Parsing; Natural language user interface; Physiology; CellML; InformationSystems_INFORMATIONSTORAGEANDRETRIEVAL; InformationSystems_DATABASEMANAGEMENT; computer.file_format; Ontology (information science); computer.software_genre; Metadata; Physiology (medical); SPARQL; RDF; Extract class; computer |
DOI: | 10.1101/756304 |
Description: | Motivation: Semantic annotation is a crucial step in ensuring the reusability and reproducibility of biosimulation models in biology and physiology. For this purpose, the COmputational Modeling in BIology NEtwork (COMBINE) community recommends the use of the Resource Description Framework (RDF). RDF annotations make it possible to search for model entities (e.g. flux of sodium across apical plasma membrane) using SPARQL. However, SPARQL syntax is rigid and complex, and a semantic annotation is typically not a single triple but a tree-like structure of triples, both of which make queries difficult to write. An interface that converts a natural language query (NLQ) to SPARQL is therefore beneficial. Results: We propose NLIMED, a natural-language-query-to-SPARQL interface for retrieving model entities from biosimulation models. The interface can be applied to various RDF-based repositories such as the PMR and BioModels. We evaluate it on the RDF collected from biosimulation models encoded in CellML in the PMR. First, we extract the RDF as a tree structure and store each subtree of a model entity in the RDF Graph Index as a modified triple of model entity name, path, and ontology class. We also extract the textual metadata of ontology classes from BioPortal and CellML and manage it in the Text Feature Index. Using the Text Feature Index, we annotate the phrases produced by the NLQ Parser (Stanford parser or NLTK parser) with ontology classes. Finally, the detected ontology classes are composed into SPARQL using the RDF Graph Index. Our annotator achieves an F-measure of 0.756, outperforming the annotation service provided by BioPortal, and our SPARQL composer can find all possible SPARQL queries in the collection based on the annotation results. The interface is currently implemented in the Epithelial Modelling Platform tool. Availability: https://github.com/napakalas/NLIMED |
Database: | OpenAIRE |
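
The pipeline outlined in the abstract (annotate query phrases with ontology classes through a text index, then compose SPARQL from indexed entity/path/class triples) can be illustrated with a minimal Python sketch. It is not the NLIMED implementation. The `http://example.org/` namespace, the class IRIs, the predicate names, the entity `model.cellml#J_Na`, and the functions `annotate` and `compose_sparql` are all hypothetical. Phrase chunking, which the abstract assigns to the NLQ Parser, is done by hand here, and annotation is reduced to simple token overlap.

```python
# Hypothetical namespace and identifiers, invented for this sketch only.
EX = "http://example.org/"

# Toy "Text Feature Index": ontology class IRI -> descriptive terms.
TEXT_FEATURE_INDEX = {
    EX + "class/flux": {"flux", "flow"},
    EX + "class/sodium": {"sodium", "na"},
    EX + "class/apical_plasma_membrane": {"apical", "plasma", "membrane"},
}

# Toy "RDF Graph Index": (model entity, predicate path, ontology class).
RDF_GRAPH_INDEX = [
    ("model.cellml#J_Na", ("ex:isVersionOf",), EX + "class/flux"),
    ("model.cellml#J_Na", ("ex:isPropertyOf", "ex:is"), EX + "class/sodium"),
    ("model.cellml#J_Na", ("ex:isPropertyOf", "ex:isPartOf"),
     EX + "class/apical_plasma_membrane"),
]


def annotate(phrase):
    """Return the class whose terms overlap most with the phrase, or None."""
    tokens = set(phrase.lower().split())
    scores = {cls: len(tokens & terms) for cls, terms in TEXT_FEATURE_INDEX.items()}
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else None


def compose_sparql(classes):
    """Build one SPARQL query from the indexed paths of each detected class."""
    lines = [f"PREFIX ex: <{EX}pred/>", "SELECT DISTINCT ?entity WHERE {"]
    for cls in classes:
        # Collect every indexed path leading to this class; alternatives
        # are joined with the SPARQL 1.1 property-path operator "|".
        paths = sorted({"/".join(path) for _, path, c in RDF_GRAPH_INDEX if c == cls})
        if not paths:
            return None  # class (or failed annotation) absent from the index
        lines.append(f"  ?entity {'|'.join(paths)} <{cls}> .")
    lines.append("}")
    return "\n".join(lines)


# Phrases as a parser might chunk "flux of sodium across apical plasma membrane".
phrases = ["flux", "sodium", "apical plasma membrane"]
classes = [annotate(p) for p in phrases]
query = compose_sparql(classes)
print(query)
```

Storing the predicate path alongside each class is what lets the composer emit a property-path pattern without the user knowing how deep the annotation tree is. The abstract describes a composer that finds all possible SPARQL queries in the collection; this sketch simplifies that to a single query with path alternatives.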