Ensemble Adaptive Total Variation Graph Regularized NMF for Single-cell RNA-seq Data Analysis

Authors:	Tahir Ansari Mohammad, B. Kirmizibayrak Petek, El-Baz Ahmed, Shaldam Moataz, Mansour Basem, Yahya Galal, K. Kaymakcioglu Bedia, Tudu Praveen, Keuper Kristina, Gopalakrishnan Nair Gopakumar, Madhavan Manu, Bhuvanendran Saatheeyavaane, Kong Xiang-Zhen, Liu Jin-Xing, Othman Iekhsan, Zhu Ya-Li, Gao Ying-Lian, Sriramulu Sushmitha, Tok Fatih, Chaudhuri Punarbasu, Ning Chen Win, Pathak Surajit, Mahanty Shouvik, A. Sahar Esra, Farooq Shaikh Mohd., Zhu Rong, Kutty Radhakrishnan Ammu, Date Abhijit, Yilmaz Sinem
Year of publication:	2021
Subject:	Computational Mathematics; Variation (linguistics); Genetics; Graph (abstract data type); RNA-Seq; Computational biology; Molecular Biology; Biochemistry; Mathematics; Non-negative matrix factorization
Source:	Current Bioinformatics. 16:1014-1023
ISSN:	1574-8936
Description:	Background: Single-cell RNA sequencing techniques have emerged as effective approaches for identifying heterogeneity between cells and discovering differentiation stages. Adaptive total variation graph regularized nonnegative matrix factorization (ATV-NMF) has been proposed to capture the inner geometric structure of the data and to decide adaptively whether to retain feature details or to denoise, which makes it suitable for analyzing single-cell data. However, the rank of the matrix factorization strongly affects clustering performance, and determining the optimal rank remains challenging. Objective: To address this problem, we propose an ensemble clustering method, ANMFCE, that integrates several base clustering results obtained with different rank values. Methods: First, we use the ATV-NMF algorithm to obtain clustering results at different dimension-reduction ranks. Second, a consensus function based on connected-triple-based similarity is applied to obtain a similarity matrix. Finally, spectral clustering is used to find the final partition. Results: Clustering results on six single-cell sequencing datasets show that our method outperforms the individual ATV-NMF method and the other comparison methods, which indicates that it is effective at finding heterogeneity in single-cell datasets. Moreover, the method also identifies gene markers accurately. Conclusion: In summary, our method is effective for analyzing single-cell RNA sequencing datasets.
Database:	OpenAIRE
External link:	https://explore.openaire.eu/search/publication?articleId=doi_________::d565a059c04cf93e99bb60fb5a3bae99 https://doi.org/10.2174/1574893616666210528164302
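
Illustrative sketch:	The three steps listed under Methods in the description (base clusterings at several ranks, a consensus similarity matrix, spectral clustering) can be sketched roughly as below. This is not the authors' ANMFCE implementation. It rests on loud assumptions: plain multiplicative-update NMF stands in for ATV-NMF, a simple co-association matrix stands in for the connected-triple-based similarity, and all function names (`nmf_labels`, `co_association`, `spectral_partition`, `ensemble_cluster`) are hypothetical. The input `X` is assumed to be a nonnegative genes-by-cells matrix.

```python
import numpy as np

def nmf_labels(X, rank, n_iter=200, seed=0):
    """Base clustering: plain NMF (assumed stand-in for ATV-NMF), X is genes x cells."""
    rng = np.random.default_rng(seed)
    W = rng.random((X.shape[0], rank)) + 1e-3
    H = rng.random((rank, X.shape[1])) + 1e-3
    eps = 1e-10
    for _ in range(n_iter):
        # Standard multiplicative updates for the Frobenius-norm objective.
        H *= (W.T @ X) / (W.T @ W @ H + eps)
        W *= (X @ H.T) / (W @ H @ H.T + eps)
    return H.argmax(axis=0)  # each cell joins its dominant factor

def co_association(label_sets):
    """Consensus: fraction of base clusterings in which two cells share a cluster.
    (Assumed simplification of the connected-triple-based similarity.)"""
    n = len(label_sets[0])
    S = np.zeros((n, n))
    for labels in label_sets:
        S += labels[:, None] == labels[None, :]
    return S / len(label_sets)

def kmeans(points, k, n_iter=50, seed=0):
    """Tiny Lloyd's k-means, used only to discretize the spectral embedding."""
    rng = np.random.default_rng(seed)
    centers = points[rng.choice(len(points), size=k, replace=False)]
    for _ in range(n_iter):
        dist = ((points[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = dist.argmin(axis=1)
        for j in range(k):
            if np.any(labels == j):
                centers[j] = points[labels == j].mean(axis=0)
    return labels

def spectral_partition(S, k, seed=0):
    """Final step: normalized spectral clustering on the similarity matrix S."""
    deg = S.sum(axis=1)  # >= 1 because the diagonal of S is 1
    d_inv_sqrt = 1.0 / np.sqrt(deg)
    L = np.eye(len(S)) - d_inv_sqrt[:, None] * S * d_inv_sqrt[None, :]
    _, vecs = np.linalg.eigh(L)  # eigenvalues in ascending order
    emb = vecs[:, :k]
    emb = emb / (np.linalg.norm(emb, axis=1, keepdims=True) + 1e-12)
    return kmeans(emb, k, seed=seed)

def ensemble_cluster(X, ranks, k, seed=0):
    """Run one base clustering per rank, build the consensus, partition it."""
    base = [nmf_labels(X, r, seed=seed + i) for i, r in enumerate(ranks)]
    S = co_association(base)
    return spectral_partition(S, k, seed=seed), S
```

A call such as `labels, S = ensemble_cluster(X, ranks=[2, 3, 4], k=2)` would return one cluster label per cell together with the consensus matrix. The ranks and `k` shown here are arbitrary placeholders, not values taken from the paper.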