Description: |
Research in modern healthcare requires vast volumes of data from various healthcare centers across the globe. It is not always feasible to centralize clinical data without compromising privacy. A tool that addresses these issues and facilitates the reuse of clinical data is urgently needed. The Federated Learning approach, governed by a set of agreements such as the Personal Health Train (PHT), tackles these concerns by distributing models to the data centers instead of centralizing datasets, as in the traditional approach. One of the prerequisites of PHT is that datasets be semantically interoperable so that the models can find them. The FAIR (Findable, Accessible, Interoperable, Reusable) principles help in building interoperable and reusable data by adding knowledge representation and providing descriptive metadata. However, the process of making data FAIR is neither easy nor straightforward. Our main objective is to disentangle this process by using domain and technical expertise and to prepare data for federated learning. This paper introduces applications that are easily deployable as Docker containers, which automate parts of the aforementioned process and significantly simplify the task of creating FAIR clinical data. Our method bypasses the need for clinical researchers to have a high degree of technical skill. We demonstrate the FAIR-ification process by applying it to five Head and Neck cancer datasets (four public and one private). The PHT paradigm is explored by building a distributed visualization dashboard from the aggregated summaries of the FAIR-ified datasets. Using the PHT infrastructure to exchange only statistical summaries or model coefficients allows researchers to explore data from multiple centers without breaching privacy. |