Mining input grammars from dynamic control flow
Authors: | Rahul Gopinath, Björn Mathis, Andreas Zeller |
---|---|
Year of publication: | 2020 |
Subject: | Parsing, Computer science, Programming language, 020207 software engineering, 02 engineering and technology, Recursive descent parser, Fuzz testing, Context-free grammar, computer.software_genre, Control flow, Rule-based machine translation, Code refactoring, Parser combinator, 020204 information systems, 0202 electrical engineering electronic engineering information engineering, computer |
Source: | ESEC/SIGSOFT FSE |
DOI: | 10.1145/3368089.3409679 |
Description: | One of the key properties of a program is its input specification. Having a formal input specification can be critical in fields such as vulnerability analysis, reverse engineering, software testing, clone detection, or refactoring. Unfortunately, accurate input specifications for typical programs are often unavailable or out of date. In this paper, we present a general algorithm that takes a program and a small set of sample inputs and automatically infers a readable context-free grammar capturing the input language of the program. We infer the syntactic input structure only by observing accesses to input characters at different locations in the input parser. This works on all stack-based recursive descent input parsers, including parser combinators, and works entirely without program-specific heuristics. Our Mimid prototype produced accurate and readable grammars for a variety of evaluation subjects, including complex languages such as JSON, TinyC, and JavaScript. |
Database: | OpenAIRE |
External link: |
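The description summarizes the core idea: attribute each input character to the parser location that reads it, then read grammar rules off the resulting dynamic call structure. The Python sketch below illustrates that idea on a toy recursive descent parser. It is a minimal, hypothetical illustration, not the Mimid prototype. The names `TracingInput`, `traced`, `collect`, and `mine`, the toy `expr`/`term` parser, and the "last access wins" attribution rule are assumptions made for this example. The sketch also omits any generalization of the mined rules, such as turning repeated digits into a loop.

```python
from __future__ import annotations

import functools


class Node:
    """One dynamic call of a parser function, i.e. one node of the derivation tree."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.children: list[Node] = []


class TracingInput:
    """Wraps the input and records which active call last read each character."""

    def __init__(self, text: str) -> None:
        self.text = text
        self.pos = 0
        self.stack: list[Node] = [Node("start")]  # chain of active calls
        self.owner: dict[int, Node] = {}          # input index -> last reader

    def peek(self) -> str:
        if self.pos >= len(self.text):
            return ""
        # Assumed rule, "last access wins": if a callee only peeked (lookahead)
        # and the caller then consumes the character, the caller owns it.
        self.owner[self.pos] = self.stack[-1]
        return self.text[self.pos]

    def advance(self) -> None:
        self.pos += 1


def traced(fn):
    """Give every call of a parser function its own node in the tree."""

    @functools.wraps(fn)
    def wrapper(inp: TracingInput):
        node = Node(fn.__name__)
        inp.stack[-1].children.append(node)
        inp.stack.append(node)
        try:
            return fn(inp)
        finally:
            inp.stack.pop()

    return wrapper


# Hypothetical toy parser: expr := term ("+" term)*, term := digit+
@traced
def expr(inp: TracingInput) -> None:
    term(inp)
    while inp.peek() == "+":
        inp.advance()
        term(inp)


@traced
def term(inp: TracingInput) -> None:
    if not inp.peek().isdigit():
        raise SyntaxError(f"digit expected at index {inp.pos}")
    while inp.peek().isdigit():
        inp.advance()


def collect(node, owner, text, grammar):
    """Add node's expansion to grammar; return the first input index it covers."""
    items = [(i, f'"{text[i]}"') for i, n in owner.items() if n is node]
    for child in node.children:
        first = collect(child, owner, text, grammar)
        if first is not None:  # calls that consumed nothing are dropped
            items.append((first, f"<{child.name}>"))
    if not items:
        return None
    items.sort()  # order terminals and nonterminals by input position
    grammar.setdefault(f"<{node.name}>", set()).add(tuple(s for _, s in items))
    return items[0][0]


def mine(parser, samples):
    """Run the instrumented parser on each sample and merge the observed rules."""
    grammar: dict[str, set[tuple[str, ...]]] = {}
    for text in samples:
        inp = TracingInput(text)
        parser(inp)
        if inp.pos != len(text):
            raise SyntaxError(f"unconsumed input in {text!r}")
        collect(inp.stack[0], inp.owner, text, grammar)
    return grammar


if __name__ == "__main__":
    for name, alts in mine(expr, ["1+2", "3"]).items():
        print(name, "::=", " | ".join(" ".join(a) for a in sorted(alts)))
```

Run on the samples `1+2` and `3`, the sketch prints `<term> ::= "1" | "2" | "3"`, `<expr> ::= <term> | <term> "+" <term>`, and `<start> ::= <expr>`. The "+" is attributed to `expr` rather than `term` because `term` only peeks at it, while `expr` reads it last before consuming it.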