Description: |
Single-cell Hi-C techniques make it possible to study cell-to-cell variability in genomic features. However, single-cell Hi-C (scHi-C) data suffer from sparsity, which complicates downstream analyses such as clustering and structural analysis. The observed zeros in scHi-C data are a mixture of two types of events: structural zeros (SZ), due to underlying properties, and dropouts (DO), due to low sequencing depth. Although a great deal of progress has been made in imputing dropout events for single-cell RNA-sequencing (RNA-seq) data, little has been done to identify structural zeros and impute dropouts in scHi-C data. To fill this gap, we propose two models that not only enhance scHi-C data but also identify structural zeros among the observed zeros.
The first model, HiCImpute, enhances scHi-C data through a Bayesian hierarchical model. It distinguishes SZ from DO by defining an indicator variable. Unlike existing methods that treat each single cell separately, it takes the spatial dependencies of the 2D scHi-C data structure into account while also borrowing information from similar single cells and bulk data.
The second model, scHiCSRS, enhances scHi-C data through a self-representation smoothing model. It also takes the spatial dependencies of the 2D scHi-C data structure and similar single cells into consideration. To identify SZ and DO with less sensitivity to sequencing depth, a Gaussian mixture model is further developed to estimate the probability that a pair is an SZ.
Through an extensive set of simulations and real data analyses, we demonstrate the ability of HiCImpute and scHiCSRS to identify structural zeros with high sensitivity and to accurately impute dropout values in sampling zeros. Downstream analyses using data improved by HiCImpute and scHiCSRS yielded much more accurate clustering of cell types than analyses using the observed data or data improved by several comparison methods.
Most significantly, HiCImpute-improved data have led to the identification of subtypes within each of the excitatory neuronal cell types of L4 and L5 in the prefrontal cortex. |