Authors: |
Martin Glauer, Adel Memariani, Fabian Neuhaus, Till Mossakowski, Janna Hastings |
Year of publication: |
2022 |
DOI: |
10.5281/zenodo.6023496 |
Description: |
Supplementary data for our submission "Interpretable Ontology Extension in Chemistry". We present an approach to ontology extension that uses structural information to train a transformer-based model that predicts new subsumption relations. The ELECTRA model was pre-trained on a combination of molecules from the ChEBI ontology and a selection of molecules from the PubChem database (chebai/data/SWJpre/raw/smiles.txt). The resulting model was then fine-tuned on a selection of ChEBI classes and applied to a set of previously unseen chemicals from PubChem (hazardous.txt). The resulting predictions were used to extend the ChEBI ontology. The extended ontology is available as an OWL file ('chebi-slim-extended.owl.gz') and as a plot ('classif-hazardous.png'). This extended ontology was inconsistent because some of the predicted subsumption relations violated disjointness axioms. Those subsumption relations have been removed in 'chebi-slim-extended-fixed.owl.gz'. The README.md file describes how to reproduce our results. |
Database: |
OpenAIRE |
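The consistency repair mentioned in the description (removing predicted subsumption relations that violate disjointness axioms) can be illustrated with a minimal sketch. This is not the authors' code: the function names, the toy class names, and the policy of dropping both predictions of a clashing pair are assumptions made for illustration only. The actual procedure is documented in the README.md of the dataset.

```python
from itertools import combinations


def ancestors(cls, parents):
    """Return cls together with all of its transitive superclasses."""
    seen = set()
    stack = [cls]
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(parents.get(current, ()))
    return seen


def filter_predictions(predictions, parents, disjoint):
    """Split predicted superclasses into (kept, removed) per molecule.

    predictions: dict mapping a molecule to its set of predicted superclasses
    parents:     dict mapping a class to its direct superclasses
    disjoint:    set of frozenset({a, b}) pairs declared disjoint

    Assumed policy: if two predicted classes have ancestors that are
    declared disjoint, both predictions are removed.
    """
    kept, removed = {}, {}
    for molecule, classes in predictions.items():
        conflicting = set()
        for a, b in combinations(sorted(classes), 2):
            clash = any(
                frozenset((x, y)) in disjoint
                for x in ancestors(a, parents)
                for y in ancestors(b, parents)
            )
            if clash:
                conflicting.update((a, b))
        kept[molecule] = set(classes) - conflicting
        removed[molecule] = conflicting
    return kept, removed


# Hypothetical toy hierarchy; these are not real ChEBI classes or axioms.
parents = {
    "toy_ester": ["toy_organic"],
    "toy_alcohol": ["toy_organic"],
    "toy_salt": ["toy_inorganic"],
}
disjoint = {frozenset(("toy_organic", "toy_inorganic"))}
predictions = {
    "mol_1": {"toy_ester", "toy_alcohol"},  # consistent
    "mol_2": {"toy_ester", "toy_salt"},     # inherits two disjoint classes
}

kept, removed = filter_predictions(predictions, parents, disjoint)
print(kept)
print(removed)
```

In practice an OWL reasoner would detect such clashes over the full set of ontology axioms. The sketch only covers pairwise disjointness inherited through the class hierarchy.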