Description: |
AI tools for code generation, such as ChatGPT and GitHub Copilot, are growing in popularity. These tools could reduce development time and costs for developers and companies; however, ensuring the correctness and quality of AI-generated code is crucial for their adoption. This study conducted a quantitative controlled experiment to evaluate the code generation capabilities of Copilot and ChatGPT in terms of code correctness and code quality, addressing research questions about how well each tool performs on these two dimensions. The results indicate that both ChatGPT and Copilot can generate correct code from given instructions, though there is room for improvement: ChatGPT achieved a correctness rate of 87.33%, while Copilot performed slightly better at 89%. Statistical analysis revealed no significant difference in code correctness between the two tools. Regarding code quality, ChatGPT performed strongly: 98.52% of its generated lines were free from quality rule violations, and 80.7% of its generated algorithms contained no violations at all. For Copilot, 94.07% of generated lines were free from quality rule violations, but only 64.7% of its algorithms contained none. Statistical analysis showed a statistically significant difference in code quality between the two tools, indicating that ChatGPT generally produces higher-quality code. This research contributes to understanding the capabilities of AI code generation tools and highlights their potential to produce correct, high-quality code.
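
As an illustration of the kind of significance test such a comparison involves, below is a minimal Python sketch of a two-proportion z-test on the reported correctness rates. The abstract does not state which test, sample sizes, or raw counts the study actually used; the n = 300 solutions per tool assumed here is purely illustrative, chosen to be consistent with the reported percentages (262/300 ≈ 87.33%, 267/300 = 89%).

    # Illustrative sketch only: sample sizes and counts below are assumptions,
    # not figures taken from the study itself.
    from statsmodels.stats.proportion import proportions_ztest

    correct = [262, 267]  # assumed counts of correct solutions (ChatGPT, Copilot)
    totals = [300, 300]   # assumed number of generated solutions per tool

    # Two-sided two-proportion z-test on the difference in correctness rates
    z_stat, p_value = proportions_ztest(count=correct, nobs=totals)
    print(f"z = {z_stat:.3f}, p = {p_value:.3f}")
    # With these assumed counts, p is well above 0.05, which would be
    # consistent with the reported finding of no statistically significant
    # difference in code correctness between the two tools.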