RNMFMDA: A Microbe-Disease Association Identification Method Based on Reliable Negative Sample Selection and Logistic Matrix Factorization With Neighborhood Regularization
Authors: | Guangyi Liu, Ling Shen, Liqian Zhou, Longjie Liao, Lihong Peng |
---|---|
Language: | English |
Year of publication: | 2020 |
Subject: |
Microbiology (medical)
Computer science; reliable negative samples; lcsh:QR1-502; Negative sample; Disease Association; random walk with restart; computer.software_genre; Microbiology; Regularization (mathematics); lcsh:Microbiology; Matrix decomposition; 03 medical and health sciences; Prediction methods; Original Research; 030304 developmental biology; 0303 health sciences; 030306 microbiology; Random walk; logistic matrix factorization with neighborhood regularization; positive-unlabeled learning; microbe-disease associations; Data mining; Experimental methods; computer; Information integration |
Source: | Frontiers in Microbiology, Vol 11 (2020) |
DOI: | 10.3389/fmicb.2020.592430 |
Description: | Abnormal microbe levels have important impacts on the formation and development of various complex diseases. Identifying possible Microbe-Disease Associations (MDAs) helps to understand the mechanisms of complex diseases. However, experimental methods for MDA identification are costly and time-consuming. In this study, a new computational model, RNMFMDA, was developed to find possible MDAs. RNMFMDA consists of two main steps. First, reliable negative MDA samples were selected based on Positive-Unlabeled (PU) learning and random walk with restart on the heterogeneous microbe-disease network. Second, Logistic Matrix Factorization with Neighborhood Regularization (LMFNR) was developed to compute the association probabilities for all microbe-disease pairs. To evaluate the performance of RNMFMDA, we compared it with five state-of-the-art MDA prediction methods using five-fold cross-validation on microbes, diseases, and MDAs. RNMFMDA obtained the best AUCs of 0.6332, 0.8669, and 0.9081, respectively, for the three five-fold cross-validations, significantly outperforming the other models. The promising prediction performance may be attributed to three features: high-quality negative MDA sample selection, the LMFNR-based MDA prediction model, and the integration of various biological information. In addition, a few predicted microbe-disease pairs with high association scores are worthy of further experimental validation. |
Database: | OpenAIRE |
External link: |
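
The description mentions random walk with restart on the heterogeneous microbe-disease network as part of reliable negative sample selection. As a rough illustration of that building block only, here is a minimal Python sketch of a generic random walk with restart on a small graph. The function name, the toy adjacency matrix, and the restart probability of 0.7 are assumptions made for this example; the record does not state how the paper builds its network or which parameters it uses.

```python
import numpy as np

def random_walk_with_restart(adjacency, seed, restart_prob=0.7,
                             tol=1e-10, max_iter=1000):
    """Return steady-state visiting probabilities for a walk that
    restarts at node `seed` with probability `restart_prob`."""
    A = np.asarray(adjacency, dtype=float)
    col_sums = A.sum(axis=0)
    col_sums[col_sums == 0] = 1.0      # guard against isolated nodes
    W = A / col_sums                   # column-normalised transition matrix
    p0 = np.zeros(A.shape[0])
    p0[seed] = 1.0                     # all restart mass on the seed node
    p = p0.copy()
    for _ in range(max_iter):
        p_next = (1 - restart_prob) * (W @ p) + restart_prob * p0
        if np.abs(p_next - p).sum() < tol:
            return p_next
        p = p_next
    return p

# Hypothetical toy network: nodes 0 and 1 are microbes, 2 and 3 are diseases.
# Edges form the chain 0 - 2 - 1 - 3.
adjacency = np.array([
    [0, 0, 1, 0],
    [0, 0, 1, 1],
    [1, 1, 0, 0],
    [0, 1, 0, 0],
])
scores = random_walk_with_restart(adjacency, seed=0)
print(scores)
```

In this toy graph the score decreases with distance from the seed microbe. One plausible reading of the description is that unlabeled microbe-disease pairs with very low scores are candidates for reliable negatives, but the exact selection rule is not given in this record.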
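
The second step in the description computes association probabilities by logistic matrix factorization. The sketch below shows plain logistic matrix factorization trained by gradient ascent, written in Python as an illustration under stated assumptions. It deliberately omits the neighborhood regularization term and the reliable negative sampling that define LMFNR, since this record does not give their form, and it treats every unobserved pair as a negative. The function name, the toy matrix `Y`, and all hyperparameters are hypothetical.

```python
import numpy as np

def logistic_mf(Y, rank=2, lr=0.1, reg=0.01, epochs=500, seed=0):
    """Fit P = sigmoid(U @ V.T) to a binary association matrix Y by
    gradient ascent on the L2-regularised Bernoulli log-likelihood."""
    rng = np.random.default_rng(seed)
    n_rows, n_cols = Y.shape
    U = 0.1 * rng.standard_normal((n_rows, rank))   # row (microbe) factors
    V = 0.1 * rng.standard_normal((n_cols, rank))   # column (disease) factors
    for _ in range(epochs):
        P = 1.0 / (1.0 + np.exp(-(U @ V.T)))        # predicted probabilities
        G = Y - P                                   # log-likelihood gradient w.r.t. logits
        U_new = U + lr * (G @ V - reg * U)
        V = V + lr * (G.T @ U - reg * V)            # uses the pre-update U
        U = U_new
    return 1.0 / (1.0 + np.exp(-(U @ V.T)))

# Hypothetical toy association matrix: rows are microbes, columns are diseases.
Y = np.array([
    [1, 1, 0, 0],
    [1, 1, 0, 0],
    [0, 0, 1, 1],
], dtype=float)
probs = logistic_mf(Y)
print(np.round(probs, 2))
```

Known associations should receive higher predicted probabilities than the remaining pairs. A neighborhood-regularized variant would add a penalty that pulls the factors of similar microbes, and of similar diseases, towards each other.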