Popis: |
A considerable corpus of research on software evolution focuses on mining changes in software repositories, but omits their pre-integration history. We present a novel method for tracking this otherwise invisible evolution of software changes on mailing lists by connecting all early revisions of changes to their final version in repositories. Since artefact modifications on mailing lists are communicated by updates to fragments (i.e., patches) only, identifying semantically similar changes is a non-trivial task that our approach solves in a language-independent way. We evaluate our method on high-profile open source software (OSS) projects like the Linux kernel, and validate its high accuracy using an elaborately created ground truth. Our approach can be used to quantify properties of OSS development processes, which is an essential requirement for using OSS in reliable or safety-critical industrial products, where certifiability and conformance to processes are crucial. The high accuracy of our technique allows, to the best of our knowledge, for the first time to quantitatively determine if an open development process effectively aligns with given formal process requirements. |