scATAC-seq preprocessing and imputation evaluation system for visualization, clustering and digital footprinting.

Authors: Akhtyamov P; Department of Biomedical Physics, Moscow Institute of Physics and Technology (National Research University), 9 Institutskiy per., 141701, Moscow Region, Russian Federation.; The National Medical Research Center for Endocrinology, Dm. Ulyanova, 11, 117036, Moscow, Russian Federation., Shaheen L; Department of Biomedical Physics, Moscow Institute of Physics and Technology (National Research University), 9 Institutskiy per., 141701, Moscow Region, Russian Federation.; The National Medical Research Center for Endocrinology, Dm. Ulyanova, 11, 117036, Moscow, Russian Federation., Raevskiy M; Department, École Polytechnique Fédérale de Lausanne, Rte Cantonale, 1015, Lausanne, Vaud, Switzerland., Stupnikov A; Department of Biomedical Physics, Moscow Institute of Physics and Technology (National Research University), 9 Institutskiy per., 141701, Moscow Region, Russian Federation.; The National Medical Research Center for Endocrinology, Dm. Ulyanova, 11, 117036, Moscow, Russian Federation.; Institute of Bioengineering, Research Center of Biotechnology, Russian Academy of Science, Leninsky prospect, 33, build. 2, 119071, Moscow, Russian Federation., Medvedeva YA; Department of Biomedical Physics, Moscow Institute of Physics and Technology (National Research University), 9 Institutskiy per., 141701, Moscow Region, Russian Federation.; The National Medical Research Center for Endocrinology, Dm. Ulyanova, 11, 117036, Moscow, Russian Federation.; Institute of Bioengineering, Research Center of Biotechnology, Russian Academy of Science, Leninsky prospect, 33, build. 2, 119071, Moscow, Russian Federation.
Language: English
Source: Briefings in bioinformatics [Brief Bioinform] 2023 Nov 22; Vol. 25 (1).
DOI: 10.1093/bib/bbad447
Abstract: Single-cell ATAC-seq (scATAC-seq) is a recently developed approach for investigating open chromatin at the single-cell level and for assessing epigenetic regulation and transcription factor binding landscapes. The sparsity of scATAC-seq data calls for imputation. Similarly, preprocessing (filtering) may be required to reduce the computational load caused by the large number of open regions. However, optimal strategies for imputation and preprocessing have not yet been evaluated together. We present SAPIEnS (scATAC-seq Preprocessing and Imputation Evaluation System), a benchmark for scATAC-seq imputation frameworks that combines state-of-the-art imputation methods with commonly used preprocessing techniques. We assess different types of scATAC-seq analysis, i.e. clustering, visualization and digital genomic footprinting, and identify optimal preprocessing-imputation strategies. We discuss the benefits of the imputation framework depending on the task and the number of dataset features (peaks). We conclude that preprocessing with the Boruta method is beneficial for the majority of tasks, while imputation is helpful mostly for small datasets. We also implement a SAPIEnS database with pre-computed transcription factor footprints based on imputed data, together with their activity scores in a specific cell type. SAPIEnS is published at: https://github.com/lab-medvedeva/SAPIEnS. The SAPIEnS database is available at: https://sapiensdb.com.
(© The Author(s) 2023. Published by Oxford University Press.)
Database: MEDLINE