Enabling semantic queries across federated bioinformatics databases
Authors: | Manuel Gil, Marc Robinson-Rechavi, Maria Anisimova, Christophe Dessimoz, Erich Zbinden, Heinz Stockinger, Ana Claudia Sima, Tarcisio Mendes de Farias, Kurt Stockinger |
---|---|
Language: | English |
Year of publication: | 2019 |
Subject: |
Query processing
Databases, Factual; Computer science; Relational database; Natural language interface; Interoperability; Biological database; 02 engineering and technology; 005: Computerprogrammierung, Programme und Daten; Ontology (information science); Semantic data model; computer.software_genre; Biological Ontologies; Computational Biology; Semantic Web; 03 medical and health sciences; 0202 electrical engineering, electronic engineering, information engineering; SPARQL; RDF; 030304 developmental biology; 0303 health sciences; Biological data; Semantic query; Information retrieval; Semantic web technology; Original Articles; computer.file_format; Ontology; Federated database; 020201 artificial intelligence & image processing; Data integration; computer |
Source: | Database: The Journal of Biological Databases and Curation Database, vol. 2019 |
Description: | Motivation: Data integration promises to be one of the main catalysts in enabling new insights to be drawn from the wealth of publicly available biological data. However, the heterogeneity of the different data sources, at both the syntactic and the semantic level, still poses significant challenges for achieving interoperability among biological databases. Results: We introduce an ontology-based federated approach for data integration. We applied this approach to three heterogeneous data stores that span different areas of biological knowledge: 1) Bgee, a gene expression relational database; 2) OMA, a Hierarchical Data Format 5 (HDF5) orthology data store; and 3) UniProtKB, a Resource Description Framework (RDF) store containing protein sequence and functional information. To enable federated queries across these sources, we first defined a new semantic model for gene expression called GenEx. We then show how the relational data in Bgee can be expressed as a virtual RDF graph, instantiating GenEx, through dedicated relational-to-RDF mappings. By applying these mappings, Bgee data are now accessible through a public SPARQL endpoint. Similarly, the materialised RDF data of OMA, expressed in terms of the Orthology ontology, are made available through a public SPARQL endpoint. We identified and formally described intersection points (i.e. virtual links) among the three data sources. These links make it possible to run joint queries across the data stores. Finally, we lay the groundwork to enable nontechnical users to benefit from the integrated data, by providing a natural language template-based search interface. Project URL: http://biosoda.expasy.org, https://github.com/biosoda/bioquery |
Database: | OpenAIRE |
External link: |
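
The federated querying described in the abstract can be illustrated with a minimal sketch. It only assembles the text of a SPARQL 1.1 query with one `SERVICE` block per remote endpoint and does not contact any server. The helper `build_federated_query`, the two endpoint URLs, and the predicates `lscr:xrefUniprot` and `up:Protein` are assumptions chosen for illustration; none of them is stated in this record, and they are not necessarily the mappings or virtual links the authors defined.

```python
from typing import Dict, List, Tuple

# Assumed endpoint URLs (not given in the record above).
OMA_ENDPOINT = "https://sparql.omabrowser.org/sparql"
UNIPROT_ENDPOINT = "https://sparql.uniprot.org/sparql"


def build_federated_query(
    prefixes: Dict[str, str],
    variables: List[str],
    services: List[Tuple[str, str]],
    limit: int = 10,
) -> str:
    """Assemble a SPARQL 1.1 query with one SERVICE block per endpoint."""
    prefix_lines = [f"PREFIX {name}: <{iri}>" for name, iri in prefixes.items()]
    select_line = "SELECT " + " ".join(f"?{v}" for v in variables) + " WHERE {"
    service_blocks = [
        f"  SERVICE <{endpoint}> {{\n    {pattern}\n  }}"
        for endpoint, pattern in services
    ]
    return "\n".join(
        prefix_lines + [select_line] + service_blocks + ["}", f"LIMIT {limit}"]
    )


query = build_federated_query(
    prefixes={
        # Assumed vocabularies, for illustration only.
        "up": "http://purl.uniprot.org/core/",
        "lscr": "http://purl.org/lscr#",
    },
    variables=["omaProtein", "uniprotEntry"],
    services=[
        # The shared ?uniprotEntry variable acts as the join point between
        # the two sources, in the spirit of a "virtual link".
        (OMA_ENDPOINT, "?omaProtein lscr:xrefUniprot ?uniprotEntry ."),
        (UNIPROT_ENDPOINT, "?uniprotEntry a up:Protein ."),
    ],
    limit=5,
)
print(query)
```

The resulting string could be submitted to any SPARQL 1.1 endpoint that supports federation. The join happens because both `SERVICE` patterns bind the same variable.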