An External Replication on the Effects of Test-driven Development Using a Multi-site Blind Analysis Approach
Authors: | Davide Fucci, Markku Oivo, Burak Turhan, Giuseppe Scanniello, Boyce Sigweni, Natalia Juristo, Fernando Uyaguari Uyaguari, Martin Shepperd, Simone Romano |
---|---|
Language: | English |
Year of publication: | 2016 |
Subject: | Operations research; Computer science; Context (language use); Engineering and technology; Machine learning; External experiment replication; Randomized controlled trial; Replication (statistics); Electrical engineering, electronic engineering, information engineering; Quality (business); Blind analysis; Test-driven development; Computer Science Applications; Computer Vision and Pattern Recognition; Software; Baseline (configuration management); Software engineering; Software quality; Artificial intelligence & image processing; Artificial intelligence; Agile software development |
Source: | ESEM |
Description: | Context: Test-driven development (TDD) is an agile practice claimed to improve the quality of a software product, as well as the productivity of its developers. A previous study (i.e., the baseline experiment) at the University of Oulu (Finland) compared TDD to a test-last development (TLD) approach through a randomized controlled trial. The results failed to support the claims. Goal: We want to validate the results of the original study by replicating it at the University of Basilicata (Italy), using a different design. Method: We replicated the baseline experiment, using a crossover design, with 21 graduate students. We kept the settings and context as close as possible to the baseline experiment. In order to limit researchers' bias, we involved two other sites (UPM, Spain, and Brunel, UK) to conduct a blind analysis of the data. Results: The Kruskal-Wallis tests did not show any significant difference between TDD and TLD in terms of testing effort (p-value = .27), external code quality (p-value = .82), and developers' productivity (p-value = .83). Nevertheless, our data revealed a difference based on the order in which TDD and TLD were applied, though no carryover effect. Conclusions: We verify the baseline study results, yet our results raise concerns regarding the selection of experimental objects, particularly with respect to their interaction with the order in which treatments are applied. We recommend that future studies survey the tasks used in experiments evaluating TDD. Finally, to lower the cost of replication studies and reduce researchers' bias, we encourage other research groups to adopt a multi-site blind analysis approach similar to the one described in this paper. This research is supported in part by the Academy of Finland Project 278354. |
Database: | OpenAIRE |
External link: |