DeepCNV: a deep learning approach for authenticating copy number variations

Authors:	Cheng Zhong, Fabian Brand, Xiurui Hou, Peter Krawitz, Patrick M. A. Sleiman, Zhi Wei, Joseph T. Glessner, Jie Zhang, Hakon Hakonarson, Munir Khan
Year of publication:	2021
Subjects:	DNA Copy Number Variations; Computer science; Datasets as Topic; Machine learning; 03 medical and health sciences; Deep Learning; 0302 clinical medicine; False positive paradox; Humans; Disease; False Positive Reactions; Copy-number variation; Molecular Biology; 030304 developmental biology; 0303 health sciences; Artificial neural network; Receiver operating characteristic; Genome, Human; Replicate; Experimental validation; Benchmarking; ROC Curve; Area Under Curve; Problem Solving Protocol; Artificial intelligence; False positive rate; 030217 neurology & neurosurgery; Information Systems
Source:	Brief Bioinform
ISSN:	1477-4054, 1467-5463
Description:	Copy number variations (CNVs) are an important class of variations contributing to the pathogenesis of many disease phenotypes. Detecting CNVs from genomic data remains difficult, and most currently applied methods suffer from an unacceptably high false positive rate. A common practice is to have human experts manually review the original CNV calls to filter out false positives before further downstream analysis or experimental validation. Here, we propose DeepCNV, a deep learning-based tool intended to replace human experts in validating CNV calls, focusing on calls made by PennCNV, one of the most accurate CNV callers. The deep neural network is trained and evaluated on over 10 000 expert-scored samples split into training and testing sets. Variant confidence, especially for CNVs, is a major roadblock to linking CNVs with disease. We show that DeepCNV increases confidence in CNV calls, achieving an optimal area under the receiver operating characteristic curve of 0.909 and exceeding other machine learning methods. The superiority of DeepCNV was also benchmarked and confirmed using an experimental wet-lab validation dataset. We conclude that the improvement obtained by DeepCNV results in significantly fewer false positive results and fewer failures to replicate CNV association results.
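The abstract reports performance as the area under the receiver operating characteristic curve (AUC). As a rough illustration of what that number measures, and not DeepCNV's own code, the sketch below computes AUC for any binary classifier using the rank-based (Mann-Whitney) interpretation: the probability that a randomly chosen true CNV call receives a higher confidence score than a randomly chosen false positive. The function name `roc_auc`, the label encoding (1 = call judged real by an expert, 0 = false positive), and the toy scores are hypothetical assumptions for illustration only.

```python
def roc_auc(labels, scores):
    """Hypothetical helper: ROC AUC as P(score of a positive > score of a negative).

    labels: 1 = expert-confirmed CNV call, 0 = false positive (assumed encoding).
    scores: classifier confidence for each call (higher = more likely real).
    Ties between a positive and a negative count as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = 0.0
    # Compare every positive/negative pair (O(n*m); fine for a small example).
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Toy, made-up data: six CNV calls with expert labels and model scores.
labels = [1, 1, 0, 0, 1, 0]
scores = [0.9, 0.8, 0.3, 0.6, 0.4, 0.2]
print(round(roc_auc(labels, scores), 3))  # 0.889
```

An AUC of 0.5 corresponds to random ranking and 1.0 to perfectly separating real calls from false positives; the pairwise loop is chosen for readability, while library implementations sort the scores instead.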
Database:	OpenAIRE
External links:	https://explore.openaire.eu/search/publication?articleId=doi_dedup___::7c3b21cddc2fe7d121c109545e58e492 https://doi.org/10.1093/bib/bbaa381