Deep Learning Architecture Optimization with Metaheuristic Algorithms for Predicting BRCA1/BRCA2 Pathogenicity NGS Analysis

Authors: Eric Pellegrino, Theo Brunet, Christel Pissier, Clara Camilla, Norman Abbou, Nathalie Beaufils, Isabelle Nanni-Metellus, Philippe Métellus, L’Houcine Ouafik
Contributors: Service d'OncoBiologie, Assistance Publique - Hôpitaux de Marseille (APHM)- Hôpital Nord [CHU - APHM], Hôpital Nord [CHU - APHM], Institut de neurophysiopathologie (INP), Aix Marseille Université (AMU)-Centre National de la Recherche Scientifique (CNRS), Hôpital Privé Clairval [Marseille]
Publication year: 2022
Subject:
Source: BioMedInformatics
BioMedInformatics, 2022, 2 (2), pp.244-267. ⟨10.3390/biomedinformatics2020016⟩
BioMedInformatics; Volume 2; Issue 2; Pages: 244-267
ISSN: 2673-7426
Description: BRCA1 and BRCA2 are genes with tumor suppressor activity, and they are involved in a considerable number of biological processes that regulate the cell replication cycle. A mutation in one of these two genes has a significant probability of causing cancer. We have set up within the platform a machine learning algorithm based on the random forest to predict pathogenicity in colorectal, melanoma, lung, and glioma cancer, but this algorithm has shown its limits when predicting on more complex genes such as BRCA1 and BRCA2. To help the biologist in the classification of tumors, we decided to develop a deep learning algorithm. The question that arises when constructing a neural network is how many hidden layers and neurons to use. While the number of inputs and outputs is defined by the problem to be solved, the number of hidden layers and neurons is difficult to define because there is no pre-established rule. The number of hidden layers and the number of neurons in each layer influence the predictive performance of the network. There are different methods for finding the optimal architecture, such as grid search or empirical equations. All these techniques can be very time-consuming. In this paper, we present the two packages that we have developed, the genetic algorithm (GA) and the particle swarm optimization (PSO), to optimize the parameters of the neural network for the prediction of the pathogenicity of the BRCA1 and BRCA2 genes. We compare the results obtained by the two algorithms. We used datasets collected from our NGS analysis of BRCA1 and BRCA2 genes to train deep learning models. This represents a data collection of 11,875 BRCA1 and BRCA2 variants (BRCA1 benign 2,632, BRCA1 pathogenic 2,660, BRCA2 benign 3,446, BRCA2 pathogenic 3,137).
Our preliminary results show that the PSO provided the most significant architecture in terms of hidden layers and the number of neurons compared to grid search and GA. The optimal architecture found by the PSO algorithm is composed of 6 hidden layers with 275 hidden nodes with an accuracy of 0.98, precision 0.99, recall 0.98, and a specificity of 0.99.
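
The architecture search described in the abstract can be illustrated with a minimal particle swarm sketch. This is not the authors' package: the search bounds, the swarm parameters (`w`, `c1`, `c2`), and the names `decode`, `toy_fitness`, and `pso` are all assumptions made for illustration. In particular, `toy_fitness` is a placeholder surrogate; in a real setting it would train a network with the given number of hidden layers and neurons and return its validation accuracy.

```python
import random

# Search bounds, assumed for illustration (not taken from the paper):
# (number of hidden layers, neurons per hidden layer).
BOUNDS = [(1, 10), (8, 128)]


def decode(position):
    """Round a continuous particle position to an integer architecture within bounds."""
    return tuple(
        int(min(max(round(x), lo), hi)) for x, (lo, hi) in zip(position, BOUNDS)
    )


def toy_fitness(arch):
    """Hypothetical surrogate score that peaks at (4, 64).

    Replace with the validation accuracy of a network trained with `arch`.
    """
    layers, neurons = arch
    return -((layers - 4) ** 2) - ((neurons - 64) / 8) ** 2


def pso(fitness, n_particles=12, n_iters=40, w=0.7, c1=1.5, c2=1.5, seed=0):
    """Maximize `fitness` over integer architectures with a basic global-best PSO."""
    rng = random.Random(seed)
    pos = [[rng.uniform(lo, hi) for lo, hi in BOUNDS] for _ in range(n_particles)]
    vel = [[0.0] * len(BOUNDS) for _ in range(n_particles)]

    # Personal bests and the global best seen so far.
    best_pos = [p[:] for p in pos]
    best_fit = [fitness(decode(p)) for p in pos]
    g = max(range(n_particles), key=best_fit.__getitem__)
    g_pos, g_fit = best_pos[g][:], best_fit[g]

    for _ in range(n_iters):
        for i in range(n_particles):
            for d, (lo, hi) in enumerate(BOUNDS):
                r1, r2 = rng.random(), rng.random()
                # Inertia + pull toward personal best + pull toward global best.
                vel[i][d] = (
                    w * vel[i][d]
                    + c1 * r1 * (best_pos[i][d] - pos[i][d])
                    + c2 * r2 * (g_pos[d] - pos[i][d])
                )
                # Move the particle and clamp it to the search bounds.
                pos[i][d] = min(max(pos[i][d] + vel[i][d], lo), hi)
            f = fitness(decode(pos[i]))
            if f > best_fit[i]:
                best_pos[i], best_fit[i] = pos[i][:], f
                if f > g_fit:
                    g_pos, g_fit = pos[i][:], f
    return decode(g_pos), g_fit


best_arch, best_score = pso(toy_fitness)
print("best architecture (layers, neurons):", best_arch, "score:", best_score)
```

Positions are kept continuous and rounded only when evaluated, which is one common way to apply PSO to integer hyperparameters. Because each fitness call would correspond to a full training run in practice, the swarm size and iteration count directly set the cost of the search.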
Database: OpenAIRE