Scalable in-memory processing of omics workflows

Authors: Vadim Elisseev, Laura-Jayne Gardiner, Ritesh Krishna
Language: English
Publication year: 2022
Subject:
Source: Computational and Structural Biotechnology Journal, Vol 20, Iss , Pp 1914-1924 (2022)
Document type: article
ISSN: 2001-0370
DOI: 10.1016/j.csbj.2022.04.014
Description: We present a proof-of-concept implementation of the in-memory computing paradigm that we use to facilitate the analysis of metagenomic sequencing reads. In doing so, we compare the performance of POSIX™ file systems and key-value storage for omics data, and we show the potential for integrating high-performance computing (HPC) and cloud-native technologies. We show that in-memory key-value storage offers possibilities for improved handling of omics data through more flexible and faster data processing. We envision fully containerized workflows and their deployment in portable micro-pipelines, with multiple instances working concurrently with the same distributed in-memory storage. To highlight the potential usage of this technology for event-driven and real-time data processing, we use a biological case study focused on the growing threat of antimicrobial resistance (AMR). We develop a workflow encompassing bioinformatics and explainable machine learning (ML) to predict the life expectancy of a population based on the microbiome of its sewage, while providing a description of the AMR contribution to the prediction. We propose that, in the future, performing such analyses in "real time" would allow us to assess the potential risk to the population based on changes in the AMR profile of the community.
Database: Directory of Open Access Journals