Detecting Rare Cell Populations in Flow Cytometry Data Using UMAP
Authors: | Markus Diem, Lisa Weijler, Michael Reiter, Margarita Maurer-Granofszky |
---|---|
Year of publication: | 2021 |
Subject: | 0301 basic medicine; 03 medical and health sciences; 030104 developmental biology; 0302 clinical medicine; 030220 oncology & carcinogenesis; Dimensionality reduction; Population; Pattern recognition; Mixture model; Random forest; Classifier (linguistics); Unsupervised learning; Artificial intelligence; Cluster analysis; Projection (set theory) |
Source: | ICPR |
Description: | We present an approach to detecting small cell populations in flow cytometry (FCM) samples based on the combination of unsupervised manifold embedding and supervised random forest classification. Each sample consists of hundreds of thousands to a million cells, where each cell typically corresponds to a measurement vector with 10 to 50 dimensions. The difficulty of the task is that clusters of measurement vectors formed in the data space according to standard clustering criteria often do not correspond to biologically meaningful sub-populations of cells, due to strong variations in the shape and size of their distributions. In many cases the relevant population consists of fewer than 100 scattered events out of millions of events, and in such cases supervised approaches perform better than unsupervised clustering. The aim of this paper is to demonstrate that the performance of the standard supervised classifier can be improved significantly by combining it with a preceding unsupervised learning step involving Uniform Manifold Approximation and Projection (UMAP). We present an experimental evaluation on FCM data from children suffering from Acute Lymphoblastic Leukemia (ALL), showing that the improvement occurs particularly in difficult samples where the relevant population of leukemic cells is small in relation to other sub-populations. We show that the positive effect of UMAP becomes more noticeable for smaller training sets. Further, the experiments indicate that in this situation the algorithm also outperforms other baseline methods based on Gaussian Mixture Models. |
Database: | OpenAIRE |