BugSwarm: Mining and Continuously Growing a Dataset of Reproducible Failures and Fixes
Authors: | Tomassi, David A., Dmeiri, Naji, Wang, Yichen, Bhowmick, Antara, Liu, Yen-Chuan, Devanbu, Premkumar, Vasilescu, Bogdan, Rubio-González, Cindy |
---|---|
Publication year: | 2019 |
Subject: | |
Document type: | Working Paper |
Description: | Fault-detection, localization, and repair methods are vital to software quality; but it is difficult to evaluate their generality, applicability, and current effectiveness. Large, diverse, realistic datasets of durably-reproducible faults and fixes are vital to good experimental evaluation of approaches to software quality, but they are difficult and expensive to assemble and keep current. Modern continuous-integration (CI) approaches, like Travis-CI, which are widely used, fully configurable, and executed within custom-built containers, promise a path toward much larger defect datasets. If we can identify and archive failing and subsequent passing runs, the containers will provide a substantial assurance of durable future reproducibility of build and test. Several obstacles, however, must be overcome to make this a practical reality. We describe BugSwarm, a toolset that navigates these obstacles to enable the creation of a scalable, diverse, realistic, continuously growing set of durably reproducible failing and passing versions of real-world, open-source systems. The BugSwarm toolkit has already gathered 3,091 fail-pass pairs, in Java and Python, all packaged within fully reproducible containers. Furthermore, the toolkit can be run periodically to detect fail-pass activities, thus growing the dataset continually. Comment: In Proceedings of the 41st ACM/IEEE International Conference on Software Engineering (ICSE'19) |
Database: | arXiv |
External link: |