Towards Developing a Repository of Logical Errors Observed in Parallel Code for Teaching Code Correctness

Authors:	Trung Nguyen Ba, Ritu Arora
Year of publication:	2018
Subject:	020203 distributed computing; Correctness; Computer science; Programming language; 020207 software engineering; 02 engineering and technology; Task (project management); CUDA; Documentation; Debugging; Software bug; 0202 electrical engineering, electronic engineering, information engineering; Code (cryptography)
Source:	EduHPC@SC
DOI:	10.1109/eduhpc.2018.00011
Abstract:	Debugging parallel programs can be a challenging task, especially for beginners. While debuggers such as DDT and TotalView can be extremely useful in tracking down the program statements connected to bugs, the onus is often on programmers to reason about the logic of those statements in order to fix the bugs. These debuggers may neither precisely indicate the logical errors in parallel programs nor provide information on how to fix them. Therefore, there is a need for tools and educational content that teach the pitfalls of parallel programming and how to write correct code. Such content can guide beginners in avoiding commonly observed logical errors and in verifying the correctness of their parallel programs. In this paper, we 1) enumerate some of the logical errors that we have seen in parallel programs (OpenMP, MPI, and CUDA) written by beginners working with us, and 2) discuss ways to fix those errors. The errors are mainly related to data distribution, exiting distributed for-loops, and workload imbalance. Documentation of these logical errors can contribute to enhancing the productivity of beginners and can potentially help them in their debugging efforts. We have added the code samples containing the logical errors, together with their solutions, to a GitHub repository so that others in the community can reproduce the errors on their systems and learn from them. The content presented in this paper may also be useful to those developing high-level tools for detecting and removing logical errors in parallel programs.
Database:	OpenAIRE
External link:	https://explore.openaire.eu/search/publication?articleId=doi_________::040f73f3c54094964061b5f4c6da0378 https://doi.org/10.1109/eduhpc.2018.00011