Transcriptomics and epigenetic data integration learning module on Google Cloud.

Authors: Ruprecht NA; Department of Biomedical Engineering, University of North Dakota, 501 N. Columbia Road Stop 8380, Grand Forks, ND 58202, United States., Kennedy JD; Department of Biomedical Engineering, University of North Dakota, 501 N. Columbia Road Stop 8380, Grand Forks, ND 58202, United States.; Department of Chemistry and Physics, Drury University, 900 N. Benton Avenue, Springfield, MO 65802, United States., Bansal B; Department of Biomedical Engineering, University of North Dakota, 501 N. Columbia Road Stop 8380, Grand Forks, ND 58202, United States., Singhal S; Department of Pathology, University of North Dakota, 1301 N. Columbia Road Stop 9037, Grand Forks, ND 58202, United States., Sens D; Department of Pathology, University of North Dakota, 1301 N. Columbia Road Stop 9037, Grand Forks, ND 58202, United States., Maggio A; Deloitte, Health Data and AI, Deloitte Consulting LLP, 1919 N. Lynn Street, Suite 1500, Arlington, VA 22209, United States., Doe V; Google, Google Cloud, 1900 Reston Metro Plaza, Reston, VA 20190, United States., Hawkins D; Google, Google Cloud, 1900 Reston Metro Plaza, Reston, VA 20190, United States., Campbell R; NIH Center for Information Technology (CIT), 6555 Rock Spring Drive, Bethesda, MD 20892, United States., O'Connell K; NIH Center for Information Technology (CIT), 6555 Rock Spring Drive, Bethesda, MD 20892, United States., Gill JS; Department of Biomedical Engineering, University of North Dakota, 501 N. Columbia Road Stop 8380, Grand Forks, ND 58202, United States., Schaefer K; Department of Biomedical Engineering, University of North Dakota, 501 N. Columbia Road Stop 8380, Grand Forks, ND 58202, United States., Singhal SK; Department of Biomedical Engineering, University of North Dakota, 501 N. Columbia Road Stop 8380, Grand Forks, ND 58202, United States.; Department of Pathology, University of North Dakota, 1301 N. Columbia Road Stop 9037, Grand Forks, ND 58202, United States.
Language: English
Source: Briefings in bioinformatics [Brief Bioinform] 2024 Jul 23; Vol. 25 (Supplement_1).
DOI: 10.1093/bib/bbae352
Abstract: Multi-omics (genomics, transcriptomics, epigenomics, proteomics, metabolomics, etc.) research approaches are vital for understanding the hierarchical complexity of human biology and have proven to be extremely valuable in cancer research and precision medicine. Emerging scientific advances in recent years have made high-throughput genome-wide sequencing a central focus in molecular research by allowing for the collective analysis of various kinds of molecular biological data from different types of specimens in a single tissue or even at the level of a single cell. Additionally, with the help of improved computational resources and data mining, researchers are able to integrate data from different multi-omics regimes to identify new prognostic, diagnostic, or predictive biomarkers, uncover novel therapeutic targets, and develop more personalized treatment protocols for patients. For the research community to parse the scientifically and clinically meaningful information out of all the biological data being generated each day more efficiently and with fewer wasted resources, being familiar with and comfortable using advanced analytical tools, such as Google Cloud Platform, becomes imperative. This project is an interdisciplinary, cross-organizational effort to provide a guided learning module for integrating transcriptomics and epigenetics data analysis protocols into a comprehensive analysis pipeline for users to implement in their own work, utilizing the cloud computing infrastructure on Google Cloud. The learning module consists of three submodules that guide the user through tutorial examples that illustrate the analysis of RNA-sequence and Reduced-Representation Bisulfite Sequencing data. The examples are in the form of breast cancer case studies, and the data sets were procured from the public repository Gene Expression Omnibus.
The first submodule is devoted to transcriptomics analysis with the RNA sequencing data, the second submodule focuses on epigenetics analysis using the DNA methylation data, and the third submodule integrates the two methods for a deeper biological understanding. The modules begin with data collection and preprocessing, with further downstream analysis performed in a Vertex AI Jupyter notebook instance with an R kernel. Analysis results are returned to Google Cloud buckets for storage and visualization, removing the computational strain from local resources. The final product is a start-to-finish tutorial for researchers with limited experience in multi-omics to integrate transcriptomics and epigenetics data analysis into a comprehensive pipeline to perform their own biological research. This manuscript describes the development of a resource module that is part of a learning platform named "NIGMS Sandbox for Cloud-based Learning" (https://github.com/NIGMS/NIGMS-Sandbox). The overall genesis of the Sandbox is described in the editorial NIGMS Sandbox [16] at the beginning of this Supplement. This module delivers learning materials on the analysis of bulk and single-cell ATAC-seq data in an interactive format that uses appropriate cloud resources for data access and analyses.
(© The Author(s) 2024. Published by Oxford University Press.)
Database: MEDLINE