A Comparison of Apache Spark Supervised Machine Learning Algorithms for DNA Splicing Site Prediction
Authors: | Salvatore Rampone, Emanuel Weitschek, Valerio Morfino |
---|---|
Year of publication: | 2019 |
Subject: | |
Source: | Neural Approaches to Dynamics of Signal Exchanges, ISBN: 9789811389498 |
DOI: | 10.1007/978-981-13-8950-4_13 |
Description: | Thanks to next-generation sequencing techniques, a very large amount of genomic data is available, and biomedical databases have grown steadily in recent years. Analyzing these data with bioinformatics and big data techniques could lead to the discovery of new knowledge for the treatment of serious diseases. In this work, we address the splicing site prediction problem in DNA sequences using the supervised machine learning algorithms included in MLlib, the machine learning library of Apache Spark, a fast and general engine for big data processing. We present the implementation details and the performance of those algorithms on two publicly available datasets in both local and cloud environments, emphasizing the importance of the cloud environment for its scalability and elasticity of use. We compare the performance of the algorithms with U-BRAIN, a general-purpose learning algorithm originally designed for the prediction of DNA splicing sites. Results show that all the Spark algorithms achieve good prediction accuracy (>0.9), comparable with that of U-BRAIN, and a much lower execution time. Therefore, we can state that Apache Spark machine learning algorithms are promising candidates for dealing with the DNA splicing site prediction problem. |
Database: | OpenAIRE |
External link: |