On the stability of canonical correlation analysis and partial least squares with application to brain-behavior associations.

Authors: Helmer M; Department of Psychiatry, Yale School of Medicine, New Haven, CT, 06511, USA.; Manifest Technologies, New Haven, CT, 06510, USA., Warrington S; Sir Peter Mansfield Imaging Centre, Mental Health and Clinical Neurosciences, School of Medicine, University of Nottingham, Nottingham, NG7 2UH, United Kingdom., Mohammadi-Nejad AR; Sir Peter Mansfield Imaging Centre, Mental Health and Clinical Neurosciences, School of Medicine, University of Nottingham, Nottingham, NG7 2UH, United Kingdom.; National Institute for Health Research (NIHR) Nottingham Biomedical Research Ctr, Queens Medical Ctr, Nottingham, United Kingdom., Ji JL; Department of Psychiatry, Yale School of Medicine, New Haven, CT, 06511, USA.; Manifest Technologies, New Haven, CT, 06510, USA.; Interdepartmental Neuroscience Program, Yale University School of Medicine, New Haven, CT, 06511, USA., Howell A; Department of Psychiatry, Yale School of Medicine, New Haven, CT, 06511, USA.; Interdepartmental Neuroscience Program, Yale University School of Medicine, New Haven, CT, 06511, USA., Rosand B; Department of Physics, Yale University, New Haven, CT, 06511, USA., Anticevic A; Department of Psychiatry, Yale School of Medicine, New Haven, CT, 06511, USA.; Manifest Technologies, New Haven, CT, 06510, USA.; Interdepartmental Neuroscience Program, Yale University School of Medicine, New Haven, CT, 06511, USA.; Department of Psychology, Yale University, New Haven, CT, 06511, USA., Sotiropoulos SN; Sir Peter Mansfield Imaging Centre, Mental Health and Clinical Neurosciences, School of Medicine, University of Nottingham, Nottingham, NG7 2UH, United Kingdom. stamatios.sotiropoulos@nottingham.ac.uk.; National Institute for Health Research (NIHR) Nottingham Biomedical Research Ctr, Queens Medical Ctr, Nottingham, United Kingdom. stamatios.sotiropoulos@nottingham.ac.uk., Murray JD; Department of Psychiatry, Yale School of Medicine, New Haven, CT, 06511, USA. john.d.murray@dartmouth.edu.; Manifest Technologies, New Haven, CT, 06510, USA. john.d.murray@dartmouth.edu.; Department of Physics, Yale University, New Haven, CT, 06511, USA. john.d.murray@dartmouth.edu.; Department of Psychological and Brain Sciences, Dartmouth College, Hanover, NH, 03755, USA. john.d.murray@dartmouth.edu.
Language: English
Source: Communications biology [Commun Biol] 2024 Feb 21; Vol. 7 (1), pp. 217. Date of Electronic Publication: 2024 Feb 21.
DOI: 10.1038/s42003-024-05869-4
Abstract: Associations between datasets can be discovered through multivariate methods like Canonical Correlation Analysis (CCA) or Partial Least Squares (PLS). A requisite property for interpretability and generalizability of CCA/PLS associations is stability of their feature patterns. However, stability of CCA/PLS in high-dimensional datasets is questionable, as found in empirical characterizations. To study these issues systematically, we developed a generative modeling framework to simulate synthetic datasets. We found that when sample size is relatively small, but comparable to typical studies, CCA/PLS associations are highly unstable and inaccurate, both in their magnitude and, importantly, in the feature pattern underlying the association. We confirmed these trends across two neuroimaging modalities and in independent datasets with n ≈ 1000 and n = 20,000, and found that only the latter comprised sufficient observations for stable mappings between imaging-derived and behavioral features. We further developed a power calculator to provide sample sizes required for stability and reliability of multivariate analyses. Collectively, we characterize how to limit detrimental effects of overfitting on CCA/PLS stability, and provide recommendations for future studies.
(© 2024. The Author(s).)
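Illustrative note: the small-sample instability described in the abstract can be reproduced qualitatively with a toy simulation. The sketch below is not the authors' generative modeling framework or their power calculator; it is a minimal NumPy example under assumed settings (10 features per dataset, a single true association of 0.3 between the first feature of each dataset, and sample sizes of 50 and 20,000). The function names `simulate`, `first_canonical`, and `summarize` are hypothetical and chosen only for this illustration.

```python
import numpy as np


def simulate(n, px, py, r_true, rng):
    """Draw n samples; only X[:, 0] and Y[:, 0] are truly correlated (at r_true)."""
    X = rng.standard_normal((n, px))
    Y = rng.standard_normal((n, py))
    # Mix X[:, 0] into Y[:, 0] so that their population correlation is r_true
    # while Y[:, 0] keeps unit variance.
    Y[:, 0] = r_true * X[:, 0] + np.sqrt(1.0 - r_true**2) * Y[:, 0]
    return X, Y


def first_canonical(X, Y):
    """In-sample first canonical correlation and weights via QR + SVD (needs n > px, py)."""
    Xc = X - X.mean(axis=0)
    Yc = Y - Y.mean(axis=0)
    Qx, Rx = np.linalg.qr(Xc)  # orthonormal basis for the column space of Xc
    Qy, Ry = np.linalg.qr(Yc)
    U, s, Vt = np.linalg.svd(Qx.T @ Qy)  # singular values = canonical correlations
    wx = np.linalg.solve(Rx, U[:, 0])  # map back to weights on the original features
    wy = np.linalg.solve(Ry, Vt[0])
    return s[0], wx, wy


def summarize(n, px=10, py=10, r_true=0.3, reps=20, seed=0):
    """Average the estimated correlation and weight accuracy over repeated draws."""
    rng = np.random.default_rng(seed)
    r_hat, cos = [], []
    for _ in range(reps):
        X, Y = simulate(n, px, py, r_true, rng)
        r, wx, _ = first_canonical(X, Y)
        r_hat.append(r)
        # The true X weight vector is the first unit vector, so the absolute
        # cosine similarity reduces to |wx[0]| / ||wx||.
        cos.append(abs(wx[0]) / np.linalg.norm(wx))
    return {"r": float(np.mean(r_hat)), "cos": float(np.mean(cos))}


# Assumed sample sizes: 50 is an arbitrary small value, 20000 echoes the abstract.
results = {n: summarize(n) for n in (50, 20000)}
for n, res in results.items():
    print(f"n={n}: mean in-sample r = {res['r']:.2f}, "
          f"mean weight similarity = {res['cos']:.2f}")
```

In this toy setting, the small-sample estimate of the association strength should come out well above the true value of 0.3, and its weight vector should align poorly with the true one, whereas the large-sample estimates should be close on both counts. The exact numbers depend on the assumed dimensions and seed.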
Database: MEDLINE