PROSurvival: A Technical Case Report on Creating and Publishing a Dataset for Federated Learning on Survival Prediction of Prostate Cancer Patients.

Authors: Xu T; R&D Division Health, OFFIS - Institute for Information Technology, Germany., Wolters T; R&D Division Health, OFFIS - Institute for Information Technology, Germany., Lotz J; Fraunhofer Institute for Digital Medicine MEVIS, Germany., Bisson T; Charité - Universitätsmedizin Berlin, corporate member of Freie Universität Berlin and Humboldt-Universität zu Berlin, Institut für Pathologie, Berlin, Germany., Kiehl TR; Charité - Universitätsmedizin Berlin, corporate member of Freie Universität Berlin and Humboldt-Universität zu Berlin, Institut für Pathologie, Berlin, Germany., Flinner N; Goethe University Frankfurt, Universitätsklinikum, Dr. Senckenbergisches Institut für Pathologie, Frankfurt am Main, Germany., Zerbe N; Charité - Universitätsmedizin Berlin, corporate member of Freie Universität Berlin and Humboldt-Universität zu Berlin, Institut für Pathologie, Berlin, Germany.; Charité - Universitätsmedizin Berlin, corporate member of Freie Universität Berlin and Humboldt-Universität zu Berlin, Institut für Medizinische Informatik, Berlin, Germany., Eichelberg M; R&D Division Health, OFFIS - Institute for Information Technology, Germany.
Language: English
Source: Studies in health technology and informatics [Stud Health Technol Inform] 2024 Nov 22; Vol. 321, pp. 220-224.
DOI: 10.3233/SHTI241096
Abstract: The PROSurvival project aims to improve the prediction of recurrence-free survival in prostate cancer by applying federated machine learning to whole slide images combined with selected clinical data. Both the image and clinical data will be aggregated into an anonymized dataset compliant with the General Data Protection Regulation and published under the principles of findable, accessible, interoperable, and reusable data. The DICOM standard will be used for the image data. For the accompanying clinical data, a human-readable, compact and flexible standard is yet to be defined. From the set of existing standards, mostly extendable with varying degrees of modifications, we chose oBDS as a starting point and modified it to include missing data points and to remove mandatory items not applicable to our dataset. Clinical and survival data from clinic-specific spreadsheets were converted into this modified standard, ensuring on-site data privacy during processing. For publication of the dataset, both image and clinical data are anonymized using established methods. The key challenges arose during the clinical data anonymization and in identifying research repositories meeting all of our requirements. Each clinic had to coordinate the publication with their responsible data protection officers, requiring different approval processes due to the individual states' differing interpretations of the legal regulations. The newly established German Health Data Utilization Act is expected to simplify future data sharing in a responsible and powerful way.
Database: MEDLINE