TopoFun: a machine learning method to improve the functional similarity of gene co-expression modules

Authors: Robert Sabatier, Hassan Zeineddine, Christelle Reynes, Ali Janbain, Zainab Assaghir, Laurent Journot
Contributors: Guerineau, Nathalie C., Institut de Génomique Fonctionnelle (IGF), Institut National de la Santé et de la Recherche Médicale (INSERM)-Université de Montpellier (UM)-Centre National de la Recherche Scientifique (CNRS), Lebanese University [Beirut] (LU), Université de Montpellier (UM)-Université Montpellier 1 (UM1)-Institut National de la Santé et de la Recherche Médicale (INSERM)-Université Montpellier 2 - Sciences et Techniques (UM2)-Centre National de la Recherche Scientifique (CNRS), Institut de Génomique Fonctionnelle - Montpellier GenomiX (IGF MGX), Université de Montpellier (UM)-Université Montpellier 1 (UM1)-Institut National de la Santé et de la Recherche Médicale (INSERM)-Université Montpellier 2 - Sciences et Techniques (UM2)-Centre National de la Recherche Scientifique (CNRS)
Language: English
Year of publication: 2021
Subject:
AcademicSubjects/SCI01140
AcademicSubjects/SCI01060
Computer science
[SDV]Life Sciences [q-bio]
AcademicSubjects/SCI00030
Machine learning
computer.software_genre
AcademicSubjects/SCI01180
Set (abstract data type)
03 medical and health sciences
0302 clinical medicine
Lasso (statistics)
[SDV.BBM.GTP]Life Sciences [q-bio]/Biochemistry, Molecular Biology/Genomics [q-bio.GN]
Genetic algorithm
Gene
030304 developmental biology
0303 health sciences
Fitness function
[SDV.BIBS] Life Sciences [q-bio]/Quantitative Methods [q-bio.QM]
business.industry
Gene Annotation
Expression (computer science)
Linear discriminant analysis
Methart
ComputingMethodologies_PATTERNRECOGNITION
Artificial intelligence
AcademicSubjects/SCI00980
ComputingMethodologies_GENERAL
business
computer
030217 neurology & neurosurgery
Source: NAR Genomics and Bioinformatics
NAR Genomics and Bioinformatics, Oxford University Press, 2021, 3 (4), pp.lqab103. ⟨10.1093/nargab/lqab103⟩
ISSN: 2631-9268
Description: A comprehensive, accurate functional annotation of genes is key to systems-level approaches. As functionally related genes tend to be co-expressed, one possible approach to identify functional modules or supplement existing gene annotations is to analyse gene co-expression. We describe TopoFun, a machine learning method that combines topological and functional information to improve the functional similarity of gene co-expression modules. Using LASSO, we selected topological descriptors that discriminated modules made of functionally related genes and random modules. Using the selected topological descriptors, we performed linear discriminant analysis to construct a topological score that predicted the type of a module, random-like or functional-like. We combined the topological score with a functional similarity score in a fitness function that we used in a genetic algorithm to explore the co-expression network. To illustrate the use of TopoFun, we started from a subset of the Gene Ontology Biological Processes (GO-BPs) and showed that TopoFun efficiently retrieved genes that we omitted, and aggregated a number of novel genes to the initial GO-BP while improving module topology and functional similarity. Using an independent protein-protein interaction database, we confirmed that the novel genes gathered by TopoFun were functionally related to the original gene set.
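Illustration: the idea of combining a topological score with a functional similarity score in a fitness function, and using that fitness in a genetic algorithm, can be sketched on toy data as follows. This is a minimal sketch and not the TopoFun implementation. Everything in it is an assumption made for illustration: edge density stands in for the LDA-based topological score, mean pairwise Jaccard similarity of annotation sets stands in for the functional similarity score, the two are combined by a weighted sum with a hypothetical weight `alpha`, and the gene names, edges and annotations are invented.

```python
import random
from itertools import combinations

def edge_density(module, edges):
    """Stand-in topological score (assumption): share of gene pairs that are co-expressed."""
    pairs = list(combinations(sorted(module), 2))
    if not pairs:
        return 0.0
    return sum((a, b) in edges for a, b in pairs) / len(pairs)

def mean_jaccard(module, annotations):
    """Stand-in functional score (assumption): mean pairwise Jaccard of annotation sets."""
    pairs = list(combinations(sorted(module), 2))
    if not pairs:
        return 0.0
    total = 0.0
    for a, b in pairs:
        union = annotations[a] | annotations[b]
        if union:
            total += len(annotations[a] & annotations[b]) / len(union)
    return total / len(pairs)

def fitness(module, edges, annotations, alpha=0.5):
    """Weighted sum of the two scores; the weighting scheme is an assumption."""
    return (alpha * edge_density(module, edges)
            + (1 - alpha) * mean_jaccard(module, annotations))

def mutate(module, seed_genes, all_genes, rng):
    """Toggle one non-seed gene in or out of the module; seed genes are never removed."""
    gene = rng.choice(sorted(all_genes - seed_genes))
    return frozenset(set(module) ^ {gene})

def evolve(seed_genes, all_genes, edges, annotations,
           generations=50, pop_size=20, rng_seed=0):
    """Toy elitist genetic algorithm: mutate every module, keep the fittest."""
    rng = random.Random(rng_seed)
    population = [frozenset(seed_genes)] * pop_size
    for _ in range(generations):
        children = [mutate(m, seed_genes, all_genes, rng) for m in population]
        candidates = set(population) | set(children)
        # Sort by descending fitness; break ties deterministically by gene names.
        ranked = sorted(
            candidates,
            key=lambda m: (-fitness(m, edges, annotations), sorted(m)),
        )
        population = ranked[:pop_size]
    return population[0]

# Invented toy co-expression network; edges are stored as sorted (a, b) tuples.
all_genes = {"g1", "g2", "g3", "g4", "g5", "g6"}
seed_genes = {"g1", "g2"}
edges = {("g1", "g2"), ("g1", "g3"), ("g2", "g3"), ("g4", "g5")}
annotations = {
    "g1": {"A", "B"}, "g2": {"A", "C"}, "g3": {"A", "B", "C"},
    "g4": {"D"}, "g5": {"D"}, "g6": {"E"},
}

best = evolve(seed_genes, all_genes, edges, annotations)
print(sorted(best), round(fitness(best, edges, annotations), 3))
```

On this toy network, adding `g3` to the seed module keeps the module fully connected and raises the mean annotation overlap, while adding `g4`, `g5` or `g6` lowers both scores, so the search has a reason to extend the seed set with `g3` only.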
Database: OpenAIRE