Unsupervised ensemble learning for genome sequencing

Autor: Alba Pagès-Zamora, Idoia Ochoa, Gonzalo Ruiz Cavero, Pol Villalvilla-Ornat
Přispěvatelé: Universitat Politècnica de Catalunya. Departament de Teoria del Senyal i Comunicacions, Universitat Politècnica de Catalunya. SPCOM - Grup de Recerca de Processament del Senyal i Comunicacions
Jazyk: angličtina
Rok vydání: 2022
Předmět:
Zdroj: UPCommons. Portal del coneixement obert de la UPC
Universitat Politècnica de Catalunya (UPC)
Pattern Recognition, 129
ISSN: 0031-3203
1873-5142
Popis: Unsupervised ensemble learning refers to methods devised for a particular task that combine data provided by decision learners taking into account their reliability, which is usually inferred from the data. Here, the variant calling step of the next generation sequencing technologies is formulated as an unsupervised ensemble classification problem. A variant calling algorithm based on the expectation-maximization algorithm is further proposed that estimates the maximum-a-posteriori decision among a number of classes larger than the number of different labels provided by the learners. Experimental results with real human DNA sequencing data show that the proposed algorithm is competitive compared to state-of-the-art variant callers as GATK, HTSLIB, and Platypus.
Pattern Recognition, 129
ISSN:0031-3203
ISSN:1873-5142
Databáze: OpenAIRE