DeepVariant-on-Spark: Small-Scale Genome Analysis Using a Cloud-Based Computing Framework

Authors:	Hou-Hsien Lin, Po-Jung Huang, Ming-Tai Chang, Jui-Huan Chang, Chi-Ching Lee, Cheng-Hsun Chiu, Yu-Xuan Li, Sid Weng, Petrus Tang, Yun-Lung Li, Wei-Hung Cheng, Chung-Tsai Su
Language:	English
Year of publication:	2020
Subject:	Article Subject; Computer science; Cost-Benefit Analysis; Population; Computer applications to medicine. Medical informatics; R858-859.7; Cloud computing; General Biochemistry, Genetics and Molecular Biology; 03 medical and health sciences; Deep Learning; Spark (mathematics); Resource allocation (computer); Humans; education; 030304 developmental biology; 0303 health sciences; education.field_of_study; General Immunology and Microbiology; Whole Genome Sequencing; business.industry; Genome, Human; Applied Mathematics; Deep learning; 030305 genetics & heredity; Computational Biology; Genetic Variation; High-Throughput Nucleotide Sequencing; General Medicine; Genome project; Cloud Computing; Data science; Pipeline (software); Modeling and Simulation; Scalability; Artificial intelligence; Neural Networks, Computer; business; Software; Research Article
Source:	Computational and Mathematical Methods in Medicine, Vol. 2020 (2020)
ISSN:	1748-670X
DOI:	10.1155/2020/7231205
Description:	Although sequencing a human genome has become affordable, identifying genetic variants from whole-genome sequencing data remains a hurdle for researchers without adequate computing equipment or bioinformatics support. GATK is a gold-standard method for identifying genetic variants and has been widely used in genome projects and population genetics studies for many years. It remained the standard until the Google Brain team developed DeepVariant, a new method that uses deep neural networks to build an image classification model for identifying genetic variants. However, DeepVariant's superior accuracy comes at the cost of computational intensity, which largely constrains its applications. Accordingly, we present DeepVariant-on-Spark, which optimizes resource allocation, enables multi-GPU support, and accelerates the DeepVariant pipeline. To make DeepVariant-on-Spark accessible to everyone, we have deployed it on the Google Cloud Platform (GCP). Following our instructions, users can deploy DeepVariant-on-Spark on GCP within 20 minutes and analyze at least ten whole-genome sequencing datasets using the free credits provided by GCP. DeepVariant-on-Spark is freely available for small-scale genome analysis on a cloud-based computing framework. This makes it suitable for pilot testing or preliminary studies while retaining the flexibility and scalability needed for large-scale sequencing projects.
Database:	OpenAIRE
External link:	https://explore.openaire.eu/search/publication?articleId=doi_dedup___::136aa1df34349994dfded65e82ae1c21