ML-Augmented Automation for Recovering Links Between Pull-Requests and Issues on GitHub

Authors: Zakarea Alshara, Hamzeh Eyal Salman, Anas Shatnawi, Abdelhak-Djamel Seriai
Language: English
Year of publication: 2023
Subject:
Source: IEEE Access, Vol 11, Pp 5596-5608 (2023)
Document type: article
ISSN: 2169-3536
DOI: 10.1109/ACCESS.2023.3236392
Description: GitHub provides a distributed and collaborative platform for developing and maintaining open-source projects. This social coding platform supports collaborative development, with or without coordination, through pull-request and issue artefacts. When the number of issues submitted daily grows rapidly, especially in popular repositories, managing issues becomes more complicated. To help repository developers process issues, external contributors fix issues by submitting pull-requests. On GitHub, a pull-request is frequently linked to a submitted issue to show that a solution is in progress. Unfortunately, contributors may be lazy or forget to link pull-requests to their corresponding issues. Only a very small share of these links is established, whereas a large portion is missing from the development history. Moreover, even for senior developers, manually recovering the links between pull-requests and issues from the evolutionary development history is a time-consuming, challenging, and error-prone task. In this article, we propose to build ML models that recover links between pull-requests and their issues using two machine learning algorithms (KMeans and BIRCH) based on lexical and semantic weighting measurements. These models are evaluated on the PI-Link ground-truth dataset. The results show that pull-request and issue links can be recovered with an accuracy of 91.5% using the BIRCH clustering algorithm.
Database: Directory of Open Access Journals
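To make the pipeline named in the abstract a little more concrete, here is a minimal, hypothetical sketch; it is not the authors' implementation. It assumes TF-IDF as the lexical weighting, uses only K-Means (BIRCH, which the abstract reports as the best performer, is not reproduced), leaves out the semantic weighting entirely, and assumes a simple linking rule: each pull-request is linked to the most cosine-similar issue within its own cluster. The function names (`tfidf`, `kmeans`, `recover_links`) and the sample texts are invented for illustration.

```python
import math
from collections import Counter

# Hypothetical sketch: lexical TF-IDF weighting + K-Means clustering, then
# linking each pull-request to the most similar issue in the same cluster.
# NOT the authors' implementation; BIRCH and semantic weighting are omitted.

Vector = dict[str, float]  # sparse vector: term -> weight


def tokenize(text: str) -> list[str]:
    """Lower-case and split on any non-alphanumeric character."""
    cleaned = "".join(c.lower() if c.isalnum() else " " for c in text)
    return cleaned.split()


def tfidf(docs: list[str]) -> list[Vector]:
    """L2-normalised TF-IDF vectors with a smoothed IDF (an assumed variant)."""
    tokenized = [tokenize(d) for d in docs]
    n = len(docs)
    df = Counter(term for toks in tokenized for term in set(toks))
    vectors = []
    for toks in tokenized:
        counts = Counter(toks)
        vec = {t: (c / len(toks)) * (math.log((1 + n) / (1 + df[t])) + 1)
               for t, c in counts.items()}
        norm = math.sqrt(sum(w * w for w in vec.values())) or 1.0
        vectors.append({t: w / norm for t, w in vec.items()})
    return vectors


def dot(a: Vector, b: Vector) -> float:
    """Sparse dot product (equals cosine similarity for normalised inputs)."""
    if len(a) > len(b):
        a, b = b, a
    return sum(w * b.get(t, 0.0) for t, w in a.items())


def sq_dist(a: Vector, b: Vector) -> float:
    """Squared Euclidean distance between sparse vectors."""
    return sum((a.get(k, 0.0) - b.get(k, 0.0)) ** 2 for k in a.keys() | b.keys())


def mean(vectors: list[Vector]) -> Vector:
    """Component-wise mean of sparse vectors (a K-Means centroid)."""
    acc: Vector = {}
    for v in vectors:
        for t, w in v.items():
            acc[t] = acc.get(t, 0.0) + w
    return {t: w / len(vectors) for t, w in acc.items()}


def kmeans(vectors: list[Vector], k: int, max_iter: int = 20) -> list[int]:
    """Lloyd's K-Means; deterministic init with the first k vectors (an assumption)."""
    centroids = [dict(v) for v in vectors[:k]]
    labels = None
    for _ in range(max_iter):
        # Assign every vector to its nearest centroid.
        new_labels = [min(range(k), key=lambda j: sq_dist(v, centroids[j]))
                      for v in vectors]
        if new_labels == labels:  # assignments stable: converged
            break
        labels = new_labels
        for j in range(k):
            members = [v for v, lab in zip(vectors, labels) if lab == j]
            if members:  # keep the old centroid if a cluster empties
                centroids[j] = mean(members)
    return labels


def recover_links(prs: list[str], issues: list[str], k: int) -> dict[int, int]:
    """Map each PR index to the index of the most similar issue in its cluster."""
    vectors = tfidf(issues + prs)  # issues first, so initial centroids are issues
    labels = kmeans(vectors, k)
    n_issues = len(issues)
    links = {}
    for p, pr_vec in enumerate(vectors[n_issues:]):
        cluster = labels[n_issues + p]
        candidates = [i for i in range(n_issues) if labels[i] == cluster]
        if candidates:  # a PR may land in a cluster containing no issue
            links[p] = max(candidates, key=lambda i: dot(pr_vec, vectors[i]))
    return links


issues = ["Crash when parsing empty config file", "Add dark mode to settings page"]
prs = ["Fix crash on empty config file parsing",
       "Implement dark mode toggle in settings page"]
print(recover_links(prs, issues, k=2))  # → {0: 0, 1: 1}
```

On these toy inputs each pull-request is linked to its matching issue. The first-k initialisation only keeps the sketch reproducible; an evaluation against a ground truth such as PI-Link would call for a proper initialisation scheme and a tuned number of clusters.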