Revealing the Unseen: AI Chain on LLMs for Predicting Implicit Dataflows to Generate Dataflow Graphs in Dynamically Typed Code.

Autor: Huang, Qing, Luo, Zhiwen, Xing, Zhenchang, Zeng, Jinshan, Chen, Jieshan, Xu, Xiwei, Chen, Yong
Předmět:
Zdroj: ACM Transactions on Software Engineering & Methodology; Sep2024, Vol. 33 Issue 7, p1-29, 29p
Abstrakt: Dataflow graphs (DFGs) capture definitions (defs) and uses across program blocks, which is a fundamental program representation for program analysis, testing and maintenance. However, dynamically typed programming languages like Python present implicit dataflow issues that make it challenging to determine def-use flow information at compile time. Static analysis methods like Soot and WALA are inadequate for handling these issues, and manually enumerating comprehensive heuristic rules is impractical. Large pre-trained language models (LLMs) offer a potential solution, as they have powerful language understanding and pattern matching abilities, allowing them to predict implicit dataflow by analyzing code context and relationships between variables, functions, and statements in code. We propose leveraging LLMs' in-context learning ability to learn implicit rules and patterns from code representation and contextual information to solve implicit dataflow problems. To further enhance the accuracy of LLMs, we design a five-step chain of thought (CoT) and break it down into an Artificial Intelligence (AI) chain, with each step corresponding to a separate AI unit to generate accurate DFGs for Python code. Our approach's performance is thoroughly assessed, demonstrating the effectiveness of each AI unit in the AI Chain. Compared to static analysis, our method achieves 82% higher def coverage and 58% higher use coverage in DFG generation on implicit dataflow. We also prove the indispensability of each unit in the AI Chain. Overall, our approach offers a promising direction for building software engineering tools by utilizing foundation models, eliminating significant engineering and maintenance effort, but focusing on identifying problems for AI to solve. [ABSTRACT FROM AUTHOR]
Databáze: Complementary Index