Syntax errors just aren't natural: improving error reporting with language models
Authors: | José Nelson Amaral, Joshua Charles Campbell, Abram Hindle |
---|---|
Year of publication: | 2014 |
Subject: | Source code; Syntax (programming languages); Computer science; Programming language; Homoiconicity; Abstract syntax; Code generation; Artificial intelligence; Compiler; Syntax error; Abstract syntax tree; Natural language processing |
Source: | MSR |
DOI: | 10.1145/2597073.2597102 |
Description: | A frustrating aspect of software development is that compiler error messages often fail to locate the actual cause of a syntax error. An errant semicolon or brace can result in many errors reported throughout the file. We seek to find the actual source of these syntax errors by relying on the consistency of software: valid source code is usually repetitive and unsurprising. We exploit this consistency by constructing a simple N-gram language model of lexed source code tokens. We implemented an automatic Java syntax-error locator using the corpus of the project itself and evaluated its performance on mutated source code from several projects. Our tool, trained on the past versions of a project, can effectively augment the syntax error locations produced by the native compiler. Thus we provide a methodology and tool that exploits the naturalness of software source code to detect syntax errors alongside the parser. |
Database: | OpenAIRE |
External link: |
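
The description above outlines the idea of an N-gram language model over lexed source tokens that flags the least expected region of a file as the likely site of a syntax error. The following is a minimal sketch of that idea in Python, not the authors' tool. Everything in it is an assumption made for illustration: the regex-based `lex` function stands in for a real Java lexer, the names `NGramModel` and `most_surprising_window` are hypothetical, the model order (trigrams) and window width are arbitrary, and add-one smoothing is used only for brevity.

```python
import math
import re
from collections import Counter

# Crude stand-in for a real lexer (assumption): identifiers, integers,
# and any other single non-space character become tokens.
TOKEN_RE = re.compile(r"[A-Za-z_]\w*|\d+|\S")


def lex(source):
    """Split source text into a flat list of token strings."""
    return TOKEN_RE.findall(source)


class NGramModel:
    """Add-one smoothed N-gram model over token sequences (illustrative)."""

    def __init__(self, n=3):
        self.n = n
        self.ngrams = Counter()    # counts of (context..., token)
        self.contexts = Counter()  # counts of (context...)
        self.vocab = set()

    def _pad(self, tokens):
        # Start-of-file padding so the first tokens also have a context.
        return ["<s>"] * (self.n - 1) + list(tokens)

    def train(self, tokens):
        padded = self._pad(tokens)
        self.vocab.update(tokens)
        for i in range(self.n - 1, len(padded)):
            ctx = tuple(padded[i - self.n + 1:i])
            self.ngrams[ctx + (padded[i],)] += 1
            self.contexts[ctx] += 1

    def surprisal(self, tokens):
        """Return per-token surprisal in bits; higher means less expected."""
        padded = self._pad(tokens)
        v = len(self.vocab) + 1  # one extra slot for unseen tokens
        bits = []
        for i in range(self.n - 1, len(padded)):
            ctx = tuple(padded[i - self.n + 1:i])
            p = (self.ngrams[ctx + (padded[i],)] + 1) / (self.contexts[ctx] + v)
            bits.append(-math.log2(p))
        return bits


def most_surprising_window(model, tokens, width=4):
    """Return (start_index, window_tokens) for the highest-surprisal window."""
    bits = model.surprisal(tokens)
    best_start, best_score = 0, float("-inf")
    for start in range(max(1, len(bits) - width + 1)):
        score = sum(bits[start:start + width])
        if score > best_score:
            best_start, best_score = start, score
    return best_start, tokens[best_start:best_start + width]


# Train on known-good code, then inspect a mutated copy with a stray brace.
model = NGramModel(n=3)
model.train(lex("int x = 0 ; int y = 1 ; return x + y ;"))
mutated = lex("int x = 0 ; int y } = 1 ; return x + y ;")
start, window = most_surprising_window(model, mutated)
print(start, window)
```

In this sketch the reported window is a token range, not a line number; mapping it back to a source position would require the lexer to record offsets, which the regex stand-in does not do.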