Distributed Statistical Analyses: A Scoping Review and Examples of Operational Frameworks Adapted to Health Analytics.

Authors: Camirand Lemyre F; GRIIS, Université de Sherbrooke, 2500, boul. de l'Université, Sherbrooke, CA.; Département de mathématiques, Faculté des sciences, Université de Sherbrooke, Sherbrooke, CA., Lévesque S; GRIIS, Université de Sherbrooke, 2500, boul. de l'Université, Sherbrooke, CA.; Département de mathématiques, Faculté des sciences, Université de Sherbrooke, Sherbrooke, CA.; Health Data Research Network Canada, Vancouver, CA., Domingue MP; GRIIS, Université de Sherbrooke, 2500, boul. de l'Université, Sherbrooke, CA.; Chaire MEIE Québec - Le numérique au service des systèmes de santé apprenants, Université de Sherbrooke, Sherbrooke, CA.; Département de mathématiques, Faculté des sciences, Université de Sherbrooke, Sherbrooke, CA., Herrmann K; Département de mathématiques, Faculté des sciences, Université de Sherbrooke, Sherbrooke, CA., Ethier JF; GRIIS, Université de Sherbrooke, 2500, boul. de l'Université, Sherbrooke, CA.; Département de médecine, Faculté de médecine et des sciences de la santé, Université de Sherbrooke, Sherbrooke, CA.; Health Data Research Network Canada, Vancouver, CA.
Language: English
Source: JMIR medical informatics [JMIR Med Inform] 2024 Jul 19. Date of Electronic Publication: 2024 Jul 19.
DOI: 10.2196/53622
Abstract: Background: Data from multiple organizations are crucial for advancing learning health systems. However, ethical, legal, and social concerns may restrict the use of standard statistical methods that rely on pooling data. Although distributed algorithms offer alternatives, they may not always be suitable for health frameworks.
Objective: This paper aims to support researchers and data custodians in three ways: (1) providing a concise overview of the literature on statistical inference methods for horizontally partitioned data; (2) describing the methods applicable to generalized linear models (GLM) and assessing their underlying distributional assumptions; (3) adapting existing methods to make them fully usable in health settings.
Methods: The literature was mapped using a scoping review methodology. From the mapped articles, methods providing a methodological framework for GLM analyses of horizontally partitioned data were identified and assessed for their applicability in health settings. Statistical theory was used to adapt these methods and to derive the properties of the resulting estimators.
Results: From the review, 41 articles were selected, and six approaches for conducting standard GLM-based statistical analyses were extracted. However, these approaches assumed that data were evenly distributed (equal sample sizes) and identically distributed across nodes. Consequently, statistical procedures were derived to accommodate uneven node sample sizes and heterogeneous data distributions across nodes. Workflows and detailed algorithms were developed to highlight information-sharing requirements and operational complexity.
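To illustrate what "quantities exchanged" can mean for a GLM with horizontally partitioned data, here is a minimal sketch of one well-known approach, distributed Newton-Raphson for logistic regression. It is not presented as the authors' algorithm. At each iteration, every node returns only its local score vector and Fisher information matrix at the current coefficients, and a coordinator sums them and takes a Newton step. Because the log-likelihood is a sum over records, the summed quantities equal the pooled ones, so the estimate matches a pooled fit whatever the node sample sizes, assuming a common coefficient vector across nodes. All function names (`node_summaries`, `solve`, `distributed_newton`, `simulate_node`) and the simulated data are hypothetical.

```python
import math
import random


def node_summaries(X, y, beta):
    """Run locally at one node (hypothetical helper).

    Returns the score U = sum_i (y_i - mu_i) x_i and the Fisher information
    I = sum_i mu_i (1 - mu_i) x_i x_i^T of the logistic log-likelihood at
    beta. Only these aggregates leave the node, never individual records."""
    p = len(beta)
    score = [0.0] * p
    info = [[0.0] * p for _ in range(p)]
    for xi, yi in zip(X, y):
        eta = sum(b * v for b, v in zip(beta, xi))
        mu = 1.0 / (1.0 + math.exp(-eta))
        w = mu * (1.0 - mu)
        for j in range(p):
            score[j] += (yi - mu) * xi[j]
            for k in range(p):
                info[j][k] += w * xi[j] * xi[k]
    return score, info


def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for c in range(n):
        piv = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[piv] = M[piv], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= f * M[c][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][k] * x[k] for k in range(r + 1, n))) / M[r][r]
    return x


def distributed_newton(nodes, p, tol=1e-10, max_iter=50):
    """Coordinator loop: sum node summaries, then beta <- beta + I^{-1} U."""
    beta = [0.0] * p
    for _ in range(max_iter):
        score = [0.0] * p
        info = [[0.0] * p for _ in range(p)]
        for X, y in nodes:  # in practice, one round trip per node
            u, h = node_summaries(X, y, beta)
            for j in range(p):
                score[j] += u[j]
                for k in range(p):
                    info[j][k] += h[j][k]
        step = solve(info, score)
        beta = [b + s for b, s in zip(beta, step)]
        if max(abs(s) for s in step) < tol:
            break
    return beta


def simulate_node(n, x_mean, rng):
    """Hypothetical node data: true model logit(p) = -0.5 + 1.0 * x."""
    X, y = [], []
    for _ in range(n):
        x = rng.gauss(x_mean, 1.0)
        prob = 1.0 / (1.0 + math.exp(-(-0.5 + 1.0 * x)))
        X.append([1.0, x])  # intercept column plus one covariate
        y.append(1 if rng.random() < prob else 0)
    return X, y


# Uneven node sizes and different covariate means across nodes.
rng = random.Random(2024)
nodes = [simulate_node(n, m, rng) for n, m in [(50, -1.0), (300, 0.0), (1200, 0.5)]]
beta_dist = distributed_newton(nodes, p=2)

# Reference fit on the pooled data (not possible when pooling is restricted).
pooled_X = [x for X, _ in nodes for x in X]
pooled_y = [v for _, y in nodes for v in y]
beta_pool = distributed_newton([(pooled_X, pooled_y)], p=2)
```

The nodes differ in covariate distribution but share one coefficient vector; this sketch does not address heterogeneity in the coefficients themselves. The per-iteration score vectors and information matrices are themselves summary statistics, which is exactly the kind of shared quantity whose confidentiality risk the abstract flags as needing further analysis.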
Conclusions: This paper contributes to the field of health analytics by providing an overview of the methods that can be used with horizontally partitioned data, by adapting these methods to the context of heterogeneous health data, and by clarifying the workflows and quantities exchanged by the methods discussed. Further analysis of the confidentiality preserved by these methods is needed to fully understand the risks associated with sharing summary statistics.
Database: MEDLINE