RFcaller: a machine learning approach combined with read-level features to detect somatic mutations

Authors: Ander Díaz-Navarro, Pablo Bousquets-Muñoz, Ferran Nadeu, Sara López-Tamargo, Silvia Beà, Elias Campo, Xose S. Puente
Year of publication: 2022
Description:
Motivation: The cost reduction in sequencing and the extensive genomic characterization of a wide variety of cancers are expanding the use of tumor sequencing approaches to a large number of research groups and to clinical practice. Although specific pipelines have been generated for the identification of somatic mutations, their results usually differ considerably, and a common approach in many projects is to use several callers to achieve a more reliable set of mutations. This procedure is computationally very expensive and time-consuming, and it suffers from the same limitations in sensitivity and specificity as other approaches. Expert review of mutation calls is therefore required to verify calls that might be used for clinical diagnosis. Machine learning techniques provide a useful approach to incorporate expert-reviewed information for the identification of somatic mutations.
Results: We have developed RFcaller, a pipeline based on machine learning algorithms, for the detection of somatic mutations in tumor-normal paired samples. RFcaller shows high accuracy for the detection of substitutions and indels from whole genome or exome data. It allows the detection of mutations in driver genes missed by other approaches, and has been validated by comparison to deep sequencing and Sanger sequencing. The pipeline is able to analyze a whole genome in a short time and with a small computational footprint.
Availability and implementation: RFcaller is available from its GitHub repository (https://github.com/xa-lab/RFcaller) and from DockerHub (https://hub.docker.com/repository/docker/labxa/rfcaller).
Contact: xspuente@uniovi.es
Supplementary information: Supplementary data are available online.
Database: OpenAIRE