A Simple, Possibly Correct LR Parser for C11
Autor: | Jacques-Henri Jourdan, François Pottier |
---|---|
Přispěvatelé: | Langages de programmation, types, compilation et preuves (GALLIUM), Inria de Paris, Institut National de Recherche en Informatique et en Automatique (Inria)-Institut National de Recherche en Informatique et en Automatique (Inria), Max Planck Institute for Software Systems (MPI-SWS), ANR-11-INSE-0003,VERASCO,Vérification formelle d'analyseurs statiques et de compilateurs(2011) |
Rok vydání: | 2017 |
Předmět: |
Computer science
parsing Operator-precedence grammar C language 02 engineering and technology Recursive descent parser computer.software_genre Top-down parsing Canonical LR parser Parser combinator lexical feedback 0202 electrical engineering electronic engineering information engineering C99 C11 [INFO.INFO-PL]Computer Science [cs]/Programming Languages [cs.PL] Programming language business.industry 020207 software engineering Simple LR parser TheoryofComputation_MATHEMATICALLOGICANDFORMALLANGUAGES GLR parser ambiguity 020201 artificial intelligence & image processing Artificial intelligence LALR parser business computer Software Natural language processing C89 |
Zdroj: | ACM Transactions on Programming Languages and Systems (TOPLAS) ACM Transactions on Programming Languages and Systems (TOPLAS), ACM, 2017, 39 (4), pp.1-36. ⟨10.1145/3064848⟩ ACM Transactions on Programming Languages and Systems (TOPLAS), 2017, 39 (4), pp.1-36. ⟨10.1145/3064848⟩ |
ISSN: | 1558-4593 0164-0925 |
DOI: | 10.1145/3064848 |
Popis: | International audience; The syntax of the C programming language is described in the C11 standard by an ambiguous context-free grammar, accompanied with English prose that describes the concept of " scope " and indicates how certain ambiguous code fragments should be interpreted. Based on these elements, the problem of implementing a compliant C11 parser is not entirely trivial. We review the main sources of difficulty and describe a relatively simple solution to the problem. Our solution employs the well-known technique of combining an LALR(1) parser with a " lexical feedback " mechanism. It draws on folklore knowledge and adds several original aspects , including: a twist on lexical feedback that allows a smooth interaction with lookahead; a simplified and powerful treatment of scopes; and a few amendments in the grammar. Although not formally verified, our parser avoids several pitfalls that other implementations have fallen prey to. We believe that its simplicity, its mostly-declarative nature, and its high similarity with the C11 grammar are strong informal arguments in favor of its correctness. Our parser is accompanied with a small suite of " tricky " C11 programs. We hope that it may serve as a reference or a starting point in the implementation of compilers and analysis tools. |
Databáze: | OpenAIRE |
Externí odkaz: |