dSNE: a visualization approach for use with decentralized data

Authors: Saha, Debbrata K.; Calhoun, V. D.; Du, Yuhui; Fu, Zening; Panta, Sandeep R.; Plis, S. M.
Language: English
Year of publication: 2019
DOI: 10.1101/826974
Description: Visualizing high-dimensional, large-scale datasets by embedding them into a 2D map is a powerful exploratory tool for assessing latent structure in the data and detecting outliers. It plays a vital role in neuroimaging, where it is sometimes the only practical way to perform quality control on large datasets. Many methods have been developed for this task, but most rely on the assumption that all samples are locally available for the computation. Specifically, one needs access to all samples in order to compute the distance directly between every pair of points as a measure of similarity. However, not all samples may be available at a single site, for various reasons (e.g., privacy concerns for rare-disease data, or institutional or IRB policies). This is quite common for biomedical data such as neuroimaging and genetic data, where privacy preservation is a major concern. In this scenario, a quality control tool that visualizes a decentralized dataset in its entirety, via global aggregation of local computations, is especially important, as it allows screening of samples that could not be evaluated otherwise. We introduce an algorithm to solve this problem: decentralized data stochastic neighbor embedding (dSNE). In our approach, data samples (i.e., brain images) located at different sites are simultaneously mapped into the same space according to their similarities. Yet the data never leave the individual sites, and no pairwise metric is ever directly computed between any two samples that are not co-located. Using the Modified National Institute of Standards and Technology (MNIST) database and the Columbia Object Image Library (COIL-20) dataset, we introduce metrics for measuring embedding quality and use them to compare dSNE with its centralized counterpart. We also apply dSNE to various multi-site neuroimaging datasets and obtain promising results that highlight the potential of our decentralized visualization approach.
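The key constraint described above (no input-space distance is ever computed between samples held at different sites) can be illustrated with a minimal, centralized simulation in NumPy. This is a sketch under stated assumptions, not the authors' dSNE algorithm: it assumes a small shared reference set that every site may access (an assumption of this sketch, not stated in the description), so that samples from different sites are linked only through that reference set and through the shared low-dimensional map. The helper names conditional_affinities, decentralized_joint_p and tsne_embed, as well as the perplexity, learning rate, momentum and exaggeration values, are hypothetical choices for illustration. Each site computes standard SNE Gaussian affinities over its own samples plus the reference set. Cross-site affinity blocks stay zero, and a plain t-SNE gradient descent then places all points in one 2D map.

```python
import numpy as np


def conditional_affinities(X, perplexity=10.0, tol=1e-5, max_iter=50):
    """Row-normalized Gaussian affinities p_{j|i} among the rows of X only.

    Each bandwidth is found by bisection so that the row entropy matches
    log(perplexity), as in standard SNE/t-SNE.
    """
    n = X.shape[0]
    sq = np.sum(X**2, axis=1)
    D = np.maximum(sq[:, None] + sq[None, :] - 2.0 * X @ X.T, 0.0)
    P = np.zeros((n, n))
    target = np.log(perplexity)
    for i in range(n):
        d = np.delete(D[i], i)
        lo, hi, beta = 0.0, np.inf, 1.0  # beta = 1 / (2 * sigma_i^2)
        for _ in range(max_iter):
            w = np.exp(-(d - d.min()) * beta)  # shift by min for stability
            p = w / w.sum()
            H = -np.sum(p * np.log(p + 1e-12))
            if abs(H - target) < tol:
                break
            if H > target:  # distribution too flat: sharpen (raise beta)
                lo = beta
                beta = beta * 2.0 if hi == np.inf else (beta + hi) / 2.0
            else:  # distribution too peaked: flatten (lower beta)
                hi = beta
                beta = (beta + lo) / 2.0
        P[i, np.arange(n) != i] = p
    return P


def decentralized_joint_p(reference, sites, perplexity=10.0):
    """Joint affinities over [reference, site_1, ..., site_K] (hypothetical helper).

    Each site uses only its own samples stacked with the shared reference set;
    blocks pairing samples from two different sites are never computed and
    stay zero. Reference rows accumulate one contribution per site.
    """
    r = reference.shape[0]
    n = r + sum(X.shape[0] for X in sites)
    P = np.zeros((n, n))
    start = r
    for X in sites:
        m = X.shape[0]
        local = conditional_affinities(np.vstack([reference, X]), perplexity)
        idx = np.r_[0:r, start:start + m]  # reference indices + this site's
        P[np.ix_(idx, idx)] += local
        start += m
    P = P + P.T  # symmetrize
    return P / P.sum()  # normalize to a joint distribution


def tsne_embed(P, dim=2, n_iter=500, lr=100.0, seed=0):
    """Plain t-SNE gradient descent (Student-t kernel) on a given joint P."""
    rng = np.random.default_rng(seed)
    n = P.shape[0]
    Y = 1e-4 * rng.standard_normal((n, dim))
    velocity = np.zeros_like(Y)
    for it in range(n_iter):
        exag = 4.0 if it < 100 else 1.0  # early exaggeration
        sq = np.sum(Y**2, axis=1)
        num = 1.0 / (1.0 + sq[:, None] + sq[None, :] - 2.0 * Y @ Y.T)
        np.fill_diagonal(num, 0.0)
        Q = np.maximum(num / num.sum(), 1e-12)
        W = (exag * P - Q) * num
        # grad_i = 4 * sum_j W_ij (y_i - y_j)
        grad = 4.0 * (np.diag(W.sum(axis=1)) - W) @ Y
        momentum = 0.5 if it < 250 else 0.8
        velocity = momentum * velocity - lr * grad
        Y = Y + velocity
    return Y


rng = np.random.default_rng(1)
reference = rng.normal(size=(20, 10))           # assumed shared reference set
site_a = rng.normal(loc=3.0, size=(30, 10))     # stays at site A
site_b = rng.normal(loc=-3.0, size=(30, 10))    # stays at site B
P = decentralized_joint_p(reference, [site_a, site_b], perplexity=10.0)
Y = tsne_embed(P)
print(Y.shape)  # (80, 2)
```

The zero block between site A and site B rows of P makes the "no cross-site distance" property explicit. Note that this simulation still gathers all embedding coordinates in one place. A genuinely decentralized implementation would instead exchange only the quantities its aggregation protocol allows.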
Database: OpenAIRE