Machine learning-based identification of genetic interactions from heterogeneous gene expression profiles

Authors: Sanghyun Park, Jeongwoo Kim, Chihyun Park, Jung Rim Kim
Language: English
Publication year: 2018
Subject:
0301 basic medicine
Decision Analysis
Computer science
Gene regulatory network
Gene Identification and Analysis
lcsh:Medicine
Gene Expression
Genetic Networks
computer.software_genre
Alzheimer's Disease
Interactome
Machine Learning
0302 clinical medicine
Databases, Genetic
Feature (machine learning)
Medicine and Health Sciences
Gene Regulatory Networks
lcsh:Science
Multidisciplinary
Applied Mathematics
Simulation and Modeling
Neurodegenerative Diseases
Random forest
Identification (information)
Neurology
Physical Sciences
Engineering and Technology
Management Engineering
Algorithms
Network Analysis
Research Article
Computer and Information Sciences
Machine learning
Research and Analysis Methods
03 medical and health sciences
Machine Learning Algorithms
Artificial Intelligence
Mental Health and Psychiatry
Genetics
Gene
business.industry
Gene Expression Profiling
lcsh:R
Decision Trees
Biology and Life Sciences
Computational Biology
Epistasis, Genetic
Decision Tree Learning
Gene expression profiling
030104 developmental biology
Epistasis
lcsh:Q
Dementia
Artificial intelligence
business
computer
030217 neurology & neurosurgery
Mathematics
Source: PLoS ONE
PLoS ONE, Vol 13, Iss 7, p e0201056 (2018)
ISSN: 1932-6203
Description: The identification of disease-related genes and disease mechanisms is an important research goal; many studies have approached this problem by analysing genetic networks based on gene expression profiles and interaction datasets. To construct a gene network, correlations or associations among pairs of genes must be obtained. However, when gene expression data are heterogeneous, with high levels of noise among samples assigned to the same condition, it is difficult to determine accurately whether a gene pair represents a significant gene-gene interaction (GGI). To solve this problem, we proposed a random forest-based method to classify significant GGIs from gene expression data. To train the model, we defined novel feature sets and utilised various high-confidence interactome datasets to deduce the correct answer set from known disease-specific genes. Using Alzheimer's disease data, the proposed method showed remarkable accuracy, and the GGIs established in the analysis can be used to build a meaningful genetic network that can explain the mechanisms underlying Alzheimer's disease.
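The description notes that building a gene network starts from correlations among pairs of genes. The sketch below illustrates only that baseline step, not the random forest method of the article: it computes Pearson correlations for every gene pair and keeps pairs above a cutoff. The gene names, the expression values, the function names and the 0.8 threshold are all hypothetical and chosen purely for illustration.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant profile carries no correlation signal
    return cov / sqrt(var_x * var_y)


def correlation_edges(expression, threshold=0.8):
    """Return (gene_a, gene_b, r) for gene pairs with |r| >= threshold.

    `expression` maps a gene name to its expression values across samples.
    The threshold is an arbitrary illustrative cutoff (assumption).
    """
    edges = []
    for gene_a, gene_b in combinations(sorted(expression), 2):
        r = pearson(expression[gene_a], expression[gene_b])
        if abs(r) >= threshold:
            edges.append((gene_a, gene_b, r))
    return edges


# Toy expression profiles (made-up values, one value per sample).
toy_expression = {
    "GENE_A": [1.0, 2.0, 3.0, 4.0, 5.0],
    "GENE_B": [2.0, 4.0, 6.0, 8.0, 10.0],  # perfectly correlated with GENE_A
    "GENE_C": [5.0, 1.0, 4.0, 2.0, 3.0],   # weakly related to both
}

edges = correlation_edges(toy_expression)
```

With noisy, heterogeneous samples a fixed correlation cutoff like this becomes unreliable, which is the difficulty the description gives as motivation for a trained classifier.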
Database: OpenAIRE