A step towards quantifying, modelling and exploring uncertainty in biomedical knowledge graphs.

Authors: Bahaj A; International University of Rabat, TICLab, Sala el Jadida 11103, Morocco. Electronic address: adil.bahaj@uir.ac.ma. Ghogho M; International University of Rabat, TICLab, Sala el Jadida 11103, Morocco; University of Leeds, Faculty of Engineering, University of Leeds, Leeds LS2 9JT, UK. Electronic address: mounir.ghogho@uir.ac.ma.
Language: English
Source: Computers in biology and medicine [Comput Biol Med] 2024 Nov 13; Vol. 184, pp. 109355. Date of Electronic Publication: 2024 Nov 13.
DOI: 10.1016/j.compbiomed.2024.109355
Abstract: Objective: This study aims to automatically quantify and model the uncertainty of facts in biomedical knowledge graphs (BKGs) from their supporting textual evidence, using deep learning techniques.
Materials and Methods: A sentence transformer extracts deep features from sentences, and a naive Bayes classifier uses these features to classify sentence factuality. For each fact and its supporting evidence in a source KG, the feature extractor and the classifier quantify the factuality of each supporting sentence; these sentence-level outputs are then transformed to numerical values in [0,1] and averaged to obtain the confidence score of the fact.
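The aggregation step described above can be illustrated with a minimal Python sketch. It covers only the final mapping and averaging, not the sentence transformer or the naive Bayes classifier. The factuality class names and their numeric values in `FACTUALITY_TO_SCORE` are assumptions made for illustration; the abstract states only that sentence-level outputs are mapped into [0,1] and averaged. The function name `fact_confidence` is likewise hypothetical.

```python
from statistics import mean

# Hypothetical factuality classes and numeric values (assumption): the
# abstract only says sentence-level outputs are mapped into [0, 1].
FACTUALITY_TO_SCORE = {
    "fact": 1.0,
    "probable": 0.75,
    "possible": 0.5,
    "doubtful": 0.25,
    "counterfact": 0.0,
}


def fact_confidence(sentence_labels):
    """Average the numeric factuality of a fact's supporting sentences."""
    if not sentence_labels:
        raise ValueError("a fact needs at least one supporting sentence")
    return mean(FACTUALITY_TO_SCORE[label] for label in sentence_labels)


# Predicted labels for three supporting sentences of a single fact.
labels = ["fact", "probable", "possible"]
confidence = fact_confidence(labels)
```

In this sketch a fact supported mostly by hedged sentences receives a lower score than one supported by assertive sentences, which is the behaviour the averaging step is meant to capture.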
Results: The fact classification feature extractor enhances the separability of classes in the embedding space. This helped the fact classification model outperform existing factuality classifiers that rely on hand-crafted features. Uncertainty quantification and modelling were demonstrated on SemMedDB by creating USemMedDB, showing KGB2U's ability to process large BKGs. A subset of USemMedDB facts was modelled to demonstrate the correlation between the structure of the uncertain BKG and the confidence scores. The best-trained model was used to predict confidence scores of existing and unseen facts. The top-ranked unseen facts were grounded in scientific evidence, showing KGB2U's ability to discover new knowledge.
Conclusion: The supporting literature of BKG facts can be used to automatically quantify their uncertainty. Additionally, the resulting uncertain biomedical KGs can be used for knowledge discovery. The BKG2U interface and source code are available at http://biofunk.datanets.org/ and https://github.com/BahajAdil/KBG2U, respectively.
Competing Interests: The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.
(Copyright © 2024 Elsevier Ltd. All rights reserved.)
Database: MEDLINE