TopoFun: a machine learning method to improve the functional similarity of gene co-expression modules
Authors: | Robert Sabatier, Hassan Zeineddine, Christelle Reynes, Ali Janbain, Zainab Assaghir, Laurent Journot |
---|---|
Contributors: | Guerineau, Nathalie C., Institut de Génomique Fonctionnelle (IGF), Institut National de la Santé et de la Recherche Médicale (INSERM)-Université de Montpellier (UM)-Centre National de la Recherche Scientifique (CNRS), Lebanese University [Beirut] (LU), Université de Montpellier (UM)-Université Montpellier 1 (UM1)-Institut National de la Santé et de la Recherche Médicale (INSERM)-Université Montpellier 2 - Sciences et Techniques (UM2)-Centre National de la Recherche Scientifique (CNRS), Institut de Génomique Fonctionnelle - Montpellier GenomiX (IGF MGX), Université de Montpellier (UM)-Université Montpellier 1 (UM1)-Institut National de la Santé et de la Recherche Médicale (INSERM)-Université Montpellier 2 - Sciences et Techniques (UM2)-Centre National de la Recherche Scientifique (CNRS) |
Language: | English |
Publication year: | 2021 |
Subject: | Computer science; Machine learning; Artificial intelligence; Lasso (statistics); Linear discriminant analysis; Genetic algorithm; Fitness function; Gene; Gene Annotation; Expression (computer science); Set (abstract data type); Methart; [SDV] Life Sciences [q-bio]; [SDV.BBM.GTP] Life Sciences [q-bio]/Biochemistry, Molecular Biology/Genomics [q-bio.GN]; [SDV.BIBS] Life Sciences [q-bio]/Quantitative Methods [q-bio.QM]; ComputingMethodologies_PATTERNRECOGNITION; ComputingMethodologies_GENERAL; AcademicSubjects/SCI00030; AcademicSubjects/SCI00980; AcademicSubjects/SCI01060; AcademicSubjects/SCI01140; AcademicSubjects/SCI01180; 03 medical and health sciences; 0302 clinical medicine; 030217 neurology & neurosurgery; 0303 health sciences; 030304 developmental biology |
Source: | NAR Genomics and Bioinformatics, Oxford University Press, 2021, 3 (4), pp.lqab103. ⟨10.1093/nargab/lqab103⟩ |
ISSN: | 2631-9268 |
Description: | A comprehensive, accurate functional annotation of genes is key to systems-level approaches. As functionally related genes tend to be co-expressed, one possible approach to identify functional modules or supplement existing gene annotations is to analyse gene co-expression. We describe TopoFun, a machine learning method that combines topological and functional information to improve the functional similarity of gene co-expression modules. Using LASSO, we selected topological descriptors that discriminated between modules made of functionally related genes and random modules. Using the selected topological descriptors, we performed linear discriminant analysis to construct a topological score that predicted the type of a module, random-like or functional-like. We combined the topological score with a functional similarity score in a fitness function that we used in a genetic algorithm to explore the co-expression network. To illustrate the use of TopoFun, we started from a subset of the Gene Ontology Biological Processes (GO-BPs) and showed that TopoFun efficiently retrieved genes that we had omitted and aggregated a number of novel genes to the initial GO-BP while improving module topology and functional similarity. Using an independent protein-protein interaction database, we confirmed that the novel genes gathered by TopoFun were functionally related to the original gene set. |
Database: | OpenAIRE |
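
The description outlines a fitness function that combines a topological score with a functional similarity score and drives a genetic algorithm over the co-expression network. The sketch below illustrates that general idea on a toy network and is not the TopoFun implementation. Everything in it is an assumption made for illustration: edge density stands in for the LDA-derived topological score, the fraction of annotated genes stands in for the functional similarity score, the weight `alpha` and the equal-weight combination are arbitrary, and all function names and gene labels are hypothetical.

```python
import random

def topo_score(module, edges):
    """Stand-in topological score (assumption): edge density of the module."""
    genes = sorted(module)
    n = len(genes)
    if n < 2:
        return 0.0
    linked = sum(
        1
        for i in range(n)
        for j in range(i + 1, n)
        if frozenset((genes[i], genes[j])) in edges
    )
    return linked / (n * (n - 1) / 2)

def func_score(module, annotated):
    """Stand-in functional score (assumption): share of annotated genes."""
    return len(module & annotated) / len(module) if module else 0.0

def fitness(module, edges, annotated, alpha=0.5):
    """Weighted sum of the two scores; alpha is a hypothetical weight."""
    return alpha * topo_score(module, edges) + (1 - alpha) * func_score(module, annotated)

def crossover(a, b, rng):
    """Keep shared genes, inherit each non-shared gene with probability 0.5."""
    child = set(a & b)
    for gene in sorted(a ^ b):
        if rng.random() < 0.5:
            child.add(gene)
    return frozenset(child) if len(child) >= 2 else a

def mutate(module, universe, rng):
    """Add or drop one random gene, never shrinking below two genes."""
    child = set(module)
    outside = sorted(universe - child)
    if outside and (len(child) <= 2 or rng.random() < 0.5):
        child.add(rng.choice(outside))
    elif len(child) > 2:
        child.remove(rng.choice(sorted(child)))
    return frozenset(child)

def evolve(seed, universe, edges, annotated, generations=40, pop_size=12, rng_seed=0):
    """Elitist genetic search: the best modules always survive a generation."""
    rng = random.Random(rng_seed)
    population = [frozenset(seed)]
    for _generation in range(generations):
        offspring = []
        for _child in range(pop_size):
            a, b = rng.choice(population), rng.choice(population)
            offspring.append(mutate(crossover(a, b, rng), universe, rng))
        pool = set(population) | set(offspring)
        population = sorted(
            pool, key=lambda m: (-fitness(m, edges, annotated), sorted(m))
        )[:pop_size]
    return population[0]

# Toy co-expression network: g1..g4 form a fully connected, annotated group.
universe = {f"g{i}" for i in range(1, 9)}
clique = ["g1", "g2", "g3", "g4"]
edges = {frozenset((a, b)) for i, a in enumerate(clique) for b in clique[i + 1:]}
edges.add(frozenset(("g5", "g6")))
annotated = {"g1", "g2", "g3", "g4"}
seed = frozenset({"g1", "g2", "g7"})

best = evolve(seed, universe, edges, annotated)
print(sorted(best), round(fitness(best, edges, annotated), 3))
```

Because selection is elitist and the seed module is part of the first population, the returned module can never score below the seed under this toy fitness. A real application would replace both stand-in scores with the descriptors and similarity measure defined in the article.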