Description: |
The t-distributed stochastic neighbour embedding (t-SNE) method has emerged as one of the leading methods for visualising high-dimensional (HD) data in a wide variety of fields, especially for revealing cluster structure in HD single-cell transcriptomics data. However, several shortcomings of the algorithm have been identified. Specifically, t-SNE is often unable to correctly represent hierarchical relationships between clusters, and spurious patterns may arise in the embedding due to incorrect parameter settings, which could lead to misinterpretation of the data. Here we combine t-SNE with shape-aware graph distances, in a method termed shape-aware stochastic neighbour embedding (SASNE), to mitigate these limitations of t-SNE. The merits of SASNE are first demonstrated on synthetic data sets, where we see a significant improvement in embedding imbalanced and nonlinear clusters, as well as in preserving hierarchical structure, based on quantitative validation measures for clustering and dimensionality reduction. Moreover, we propose a data-driven parameter setting that we find to be optimal in all test cases. Lastly, we demonstrate the superior performance of SASNE in embedding MNIST image data and single-cell transcriptomics gene expression data. |