Distributing Deep Learning Hyperparameter Tuning for 3D Medical Image Segmentation

Author:	Berral García, Josep Lluís, Aranda Llorens, Oriol, Domínguez Bermúdez, Juan Luis, Torres Viñals, Jordi
Contributors:	Universitat Politècnica de Catalunya. Departament d'Arquitectura de Computadors, Barcelona Supercomputing Center, Universitat Politècnica de Catalunya. CAP - Grup de Computació d'Altes Prestacions
Publication year:	2022
Subject:	FOS: Computer and information sciences; Computer Science - Machine Learning; I.2.11; J.3; Computer Science - Artificial Intelligence; I.4.6; GPU; Scalability; Parallelism; Deep learning; Three dimensional imaging in medicine; Enginyeria de la telecomunicació::Processament del senyal::Processament de la imatge i del senyal vídeo [Àrees temàtiques de la UPC]; Distributed computing; Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Imatgeria tridimensional en medicina; Computer Science - Distributed Parallel and Cluster Computing; Distributed deep learning; Distributed Parallel and Cluster Computing (cs.DC); Aprenentatge profund
Source:	UPCommons. Portal del coneixement obert de la UPC; Universitat Politècnica de Catalunya (UPC); IPDPS 36th IEEE International Parallel & Distributed Processing Symposium, Workshop on Scalable Deep Learning (ScaDL); IEEE Xplore
DOI:	10.1109/ipdpsw55747.2022.00172
Description:	Most research on novel techniques for 3D Medical Image Segmentation (MIS) is currently done using Deep Learning with GPU accelerators. The principal challenge of such techniques is that a single input can easily exhaust computing resources and require prohibitive amounts of time to process. Distributing deep learning and scaling it over computing devices is a real need for progress in this research field. Conventional distribution of neural networks consists of data parallelism, where data is scattered over resources (e.g., GPUs) to parallelize the training of the model. However, experiment parallelism is also an option, where different training processes are parallelized across resources. While the first option is much more common in 3D image segmentation, the second provides a pipeline design with less dependence among parallelized processes, allowing overhead reduction and more potential scalability. In this work we present a design for distributed deep learning training pipelines, focusing on multi-node and multi-GPU environments, where the two distribution approaches are deployed and benchmarked. As a proof of concept we take the 3D U-Net architecture and the MSD Brain Tumor Segmentation dataset, a state-of-the-art problem in medical image segmentation with high computing and space requirements. Using the BSC MareNostrum supercomputer as the benchmarking environment, we use TensorFlow and Ray as the neural network training and experiment distribution platforms. We evaluate the experiment speed-up, showing the potential for scaling out on GPUs and nodes, and compare the two parallelism techniques, showing how experiment distribution makes better use of such resources as it scales. Finally, we provide the implementation of the design open to the community, along with the non-trivial steps and methodology for adapting and deploying a MIS case such as the one presented here.
7 pages, 4 figures, scientific report, official code: https://github.com/HiEST/DistMIS
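The difference between the two distribution schemes named in the description can be illustrated with a toy cost model. The sketch below is not taken from the paper or from the DistMIS code; the function names, the linear synchronisation-overhead term, and all numbers are hypothetical assumptions chosen only to make the arithmetic concrete.

```python
import math


def data_parallel_makespan(n_trials, n_gpus, trial_time, sync_overhead):
    """Total time when trials run one after another, each split across all GPUs.

    sync_overhead is an ASSUMED toy parameter: the fraction of extra time
    added per additional GPU for gradient synchronisation.
    """
    per_trial = trial_time / n_gpus * (1.0 + sync_overhead * (n_gpus - 1))
    return n_trials * per_trial


def experiment_parallel_makespan(n_trials, n_gpus, trial_time):
    """Total time when each trial owns one GPU and trials run in waves."""
    waves = math.ceil(n_trials / n_gpus)
    return waves * trial_time


if __name__ == "__main__":
    # Hypothetical workload: 8 hyperparameter trials, 4 GPUs,
    # 100 time units per trial on a single GPU, 5% sync cost per extra GPU.
    dp = data_parallel_makespan(8, 4, 100.0, 0.05)
    ep = experiment_parallel_makespan(8, 4, 100.0)
    print(f"data parallelism:       {dp:.1f}")
    print(f"experiment parallelism: {ep:.1f}")
```

Under these assumed numbers the independent trials finish sooner because they pay no synchronisation cost. The same model also shows a limitation: when the number of trials is not a multiple of the number of GPUs, the last wave leaves some GPUs idle.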
Database:	OpenAIRE
External link:	https://explore.openaire.eu/search/publication?articleId=doi_dedup___::753af3ca2bd72edcb0a9ae97cb48593e https://doi.org/10.1109/ipdpsw55747.2022.00172