An analysis of classical multidimensional scaling with applications to clustering.
Authors: | Little A; Department of Mathematics, Utah Center for Data Science, University of Utah, Salt Lake City, UT 84112, USA., Xie Y; Department of Computational Mathematics, Science and Engineering, Department of Statistics, Michigan State University, East Lansing, MI 48824, USA., Sun Q; Department of Statistical Sciences, University of Toronto, Toronto, ON M5G 1Z5, Canada. |
---|---|
Language: | English |
Source: | Information and inference : a journal of the IMA [Inf inference] 2022 Apr 23; Vol. 12 (1), pp. 72-112. Date of Electronic Publication: 2022 Apr 23 (Print Publication: 2023). |
DOI: | 10.1093/imaiai/iaac004 |
Abstract: | Classical multidimensional scaling is a widely used dimension reduction technique. Yet few theoretical results characterizing its statistical performance exist. This paper provides a theoretical framework for analyzing the quality of embedded samples produced by classical multidimensional scaling. This lays a foundation for various downstream statistical analyses, and we focus on clustering noisy data. Our results provide scaling conditions on the signal-to-noise ratio under which classical multidimensional scaling followed by a distance-based clustering algorithm can recover the cluster labels of all samples. Simulation studies confirm these scaling conditions are sharp. Applications to the cancer gene-expression data, the single-cell RNA sequencing data and the natural language data lend strong support to the methodology and theory. (© The Author(s) 2022. Published by Oxford University Press on behalf of the Institute of Mathematics and its Applications. All rights reserved.) |
Database: | MEDLINE |
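
The pipeline named in the abstract, classical multidimensional scaling followed by a distance-based clustering step, can be illustrated with a minimal sketch. This is the standard textbook construction (double centering of squared distances, then an eigendecomposition), not the authors' code. The function names `classical_mds` and `two_means`, the simple 2-means stand-in for "a distance-based clustering algorithm", and the toy data with its parameters are all assumptions made for illustration.

```python
import numpy as np

def classical_mds(D, k):
    """Embed n points in R^k from an n x n matrix of pairwise distances D."""
    n = D.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n        # centering matrix
    B = -0.5 * J @ (D ** 2) @ J                # double-centered Gram matrix
    eigvals, eigvecs = np.linalg.eigh(B)       # eigenvalues in ascending order
    idx = np.argsort(eigvals)[::-1][:k]        # indices of the k largest
    lam = np.clip(eigvals[idx], 0.0, None)     # guard against tiny negatives
    return eigvecs[:, idx] * np.sqrt(lam)

def two_means(Y, n_iter=20):
    """Hypothetical stand-in clustering step: Lloyd's algorithm with 2 centers."""
    d = np.linalg.norm(Y[:, None, :] - Y[None, :, :], axis=2)
    i, j = np.unravel_index(np.argmax(d), d.shape)
    centers = Y[[i, j]].copy()                 # start from the two farthest points
    for _ in range(n_iter):
        dist = np.linalg.norm(Y[:, None, :] - centers[None, :, :], axis=2)
        labels = np.argmin(dist, axis=1)
        for c in range(2):
            if np.any(labels == c):
                centers[c] = Y[labels == c].mean(axis=0)
    return labels

# Toy data (assumed): two well-separated Gaussian clusters in 50 dimensions.
rng = np.random.default_rng(0)
n, p = 40, 50
truth = np.repeat([0, 1], n // 2)
means = np.zeros((2, p))
means[1, 0] = 20.0                             # signal lives in one coordinate
X = means[truth] + rng.normal(size=(n, p))

D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
Y = classical_mds(D, k=2)
labels = two_means(Y)

# Cluster labels are only defined up to a swap of the two label names.
agreement = max(np.mean(labels == truth), np.mean(labels != truth))
print(f"fraction of samples correctly clustered: {agreement:.2f}")
```

In this toy setting the separation between cluster means is large relative to the noise, so the sketch only illustrates the mechanics. It says nothing about the signal-to-noise scaling conditions the paper establishes.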