Finding and correcting syntax errors using recurrent neural networks
Author: | Abram Hindle, Joshua Charles Campbell, Eddie Antonio Santos, José Nelson Amaral |
Language: | English |
Year of publication: | 2017 |
Subject: | Artificial neural network, Programming language, Computer science, LR parser, Deep learning, JavaScript, n-gram, Recurrent neural network, Language model, Artificial intelligence, Syntax error, Natural language processing |
DOI: | 10.7287/peerj.preprints.3123 |
Description: | Minor syntax errors are made by novice and experienced programmers alike; however, novice programmers lack the years of intuition that help them resolve these tiny errors. Standard LR parsers typically resolve syntax errors and their precise location poorly. We propose a methodology that not only helps locate where syntax errors occur, but also suggests possible changes to the token stream that can fix the identified error. This methodology finds syntax errors by checking whether two language models “agree” on each token. If the models disagree, it indicates a possible syntax error; the methodology then tries to suggest a fix by finding an alternative token sequence obtained from the models. We trained two LSTM (Long Short-Term Memory) language models on a large corpus of JavaScript code collected from GitHub. The dual LSTM neural network model predicts the correct location of the syntax error 54.74% of the time within its top four suggestions and produces an exact fix up to 35.50% of the time. These results show that this tool and methodology can locate and suggest corrections for syntax errors. Our methodology is of practical use to all programmers, but will be especially useful to novices frustrated with incomprehensible syntax errors. |
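The abstract above sketches the core idea: two language models score each token of the program, a position where both models assign the written token low probability is flagged as a likely syntax-error location, and the token the models jointly prefer is offered as a fix. Below is a minimal Python sketch of that agreement check, not the authors' implementation; the function name `rank_suspects`, the assumption that the two LSTMs read the token stream in opposite directions, and the toy probability tables are all illustrative assumptions.

```python
# Minimal sketch (not the paper's implementation): combine per-token
# probabilities from a forward and a backward language model to rank
# likely syntax-error locations and propose a one-token fix.
from math import log
from typing import Dict, List, Tuple

Dist = Dict[str, float]  # candidate token -> probability at a position

def rank_suspects(tokens: List[str],
                  forward_probs: List[Dist],
                  backward_probs: List[Dist],
                  eps: float = 1e-12) -> List[Tuple[int, float, str]]:
    """Return (position, suspicion_score, suggested_token), most suspicious
    first. A position is suspicious when both models assign the written
    token a low probability, i.e. they "disagree" with the code."""
    results = []
    for i, tok in enumerate(tokens):
        fwd, bwd = forward_probs[i], backward_probs[i]
        # Low combined likelihood of the written token => high suspicion.
        suspicion = -(log(fwd.get(tok, eps)) + log(bwd.get(tok, eps)))
        # Candidate fix: the token both models jointly prefer most.
        candidates = set(fwd) | set(bwd)
        suggestion = max(candidates,
                         key=lambda t: fwd.get(t, eps) * bwd.get(t, eps))
        results.append((i, suspicion, suggestion))
    return sorted(results, key=lambda r: r[1], reverse=True)

if __name__ == "__main__":
    # Toy example: `if ( x > 0 {` is missing the closing parenthesis.
    tokens = ["if", "(", "x", ">", "0", "{"]
    # Hard-coded, hypothetical model outputs; in the paper's setting these
    # would come from the two LSTMs trained on tokenized GitHub JavaScript.
    forward_probs = [{"if": 0.9}, {"(": 0.9}, {"x": 0.8}, {">": 0.7},
                     {"0": 0.8}, {")": 0.85, "{": 0.01}]
    backward_probs = [{"if": 0.9}, {"(": 0.9}, {"x": 0.8}, {">": 0.7},
                      {"0": 0.8}, {")": 0.80, "{": 0.02}]
    pos, score, fix = rank_suspects(tokens, forward_probs, backward_probs)[0]
    print(f"Most suspicious position: {pos} ({tokens[pos]!r}), suggest {fix!r}")
```

In this toy run the last position (the stray `{`) gets the highest suspicion score and `)` is suggested as the replacement, mirroring the "locate, then suggest a fix" behaviour the abstract describes; the distributions are hard-coded purely to keep the example runnable.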
Database: | OpenAIRE |
External link: |