SelfE: Gene Selection via Self-Expression for Single-Cell Data
Authors: | Debarka Sengupta, Angshul Majumdar, Priyadarshini Rai |
---|---|
Year of publication: | 2022 |
Subject: | Feature engineering; Sequence Analysis, RNA; Computer science; Gene Expression Profiling; Applied Mathematics; Feature vector; Feature extraction; Feature selection; Computational biology; Missing data; Expression (mathematics); Principal component analysis; Genetics; Cluster Analysis; Single-Cell Analysis; Algorithms; Biotechnology |
Source: | IEEE/ACM Transactions on Computational Biology and Bioinformatics. 19:624-632 |
ISSN: | 2374-0043, 1545-5963 |
DOI: | 10.1109/tcbb.2020.2997326 |
Description: | Single-cell RNA sequencing has proved advantageous for discerning molecular heterogeneity among seemingly similar cells in a tissue. Because of the paucity of starting RNA, a large fraction of transcripts fail to amplify during the polymerase chain reaction (PCR) cycles. This is compounded by trivial biological noise, such as variability in cell-cycle-specific genes. As a result, the expression matrix obtained from a single-cell study is highly sparse, with a large number of missing values, which hinders downstream analysis of single-cell expression data. Feature engineering has been observed to improve analysis outcomes significantly. Feature extraction methods such as principal component analysis and zero-inflated factor analysis have been shown to be useful for subsequent data-analysis steps, including clustering. However, little or no visible effort has gone into developing feature selection techniques, which offer transparency to the analyst. We propose SelfE, a novel L2,0-minimization algorithm that determines an optimal subset of feature vectors that preserves the subspace structures observed in the data. We compared SelfE with feature selection methods commonly used for single-cell expression data analysis. |
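The description names an L2,0-minimization but does not state the objective. A minimal sketch, assuming the usual row-sparse self-expression formulation: minimize ||X − XC||_F² subject to ||C||_{2,0} ≤ k, where X is a cells × genes matrix, the L2,0 "norm" counts the nonzero rows of C, and the genes whose rows of C are nonzero are the selected features (every gene is then expressed as a combination of the selected ones). The solver below is plain iterative hard thresholding, swapped in for illustration; it is not necessarily the authors' algorithm. The function names `selfe_iht` and `row_hard_threshold` are hypothetical, and the sketch ignores the missing-value handling a real scRNA-seq pipeline would need.

```python
import numpy as np


def row_hard_threshold(C, k):
    """Project onto matrices with at most k nonzero rows: keep the k rows
    with the largest L2 norms and zero out the rest."""
    norms = np.linalg.norm(C, axis=1)
    keep = np.argsort(norms)[-k:]
    out = np.zeros_like(C)
    out[keep] = C[keep]
    return out


def selfe_iht(X, k, n_iter=200):
    """Hypothetical sketch: min ||X - X C||_F^2 s.t. C has <= k nonzero rows.

    X is cells x genes. A nonzero row j of C means gene j is used to
    reconstruct the other genes, i.e. gene j is selected.
    """
    n_genes = X.shape[1]
    # Step 1/L, with L = ||X||_2^2 the Lipschitz constant of the gradient;
    # this step size makes each iteration non-increasing in the objective.
    step = 1.0 / (np.linalg.norm(X, 2) ** 2)
    C = np.zeros((n_genes, n_genes))
    for _ in range(n_iter):
        grad = X.T @ (X @ C - X)  # gradient of 0.5 * ||X - X C||_F^2
        C = row_hard_threshold(C - step * grad, k)
    selected = np.flatnonzero(np.linalg.norm(C, axis=1) > 0)
    return selected, C


# Toy data (synthetic, for illustration only): 20 genes lying in a
# 5-dimensional subspace spanned by 5 random "driver" profiles.
rng = np.random.default_rng(0)
drivers = rng.normal(size=(100, 5))
X = drivers @ rng.normal(size=(5, 20))
selected, C = selfe_iht(X, k=5)
rel_err = np.linalg.norm(X - X @ C) / np.linalg.norm(X)
print("selected genes:", selected, "relative residual:", rel_err)
```

Hard thresholding on whole rows, rather than on individual entries, is what makes this a feature (gene) selector: a gene is either used to express all the others or not used at all.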
Database: | OpenAIRE |
External link: |