ANGHABENCH: A Suite with One Million Compilable C Benchmarks for Code-Size Reduction
Authors: | Fernando Magno Quintão Pereira, José Wesley de Souza Magalhães, Anderson Faustino da Silva, Jerônimo Nunes Rocha, Bruno Conde Kind, Breno Campos Ferreira Guimaraes |
---|---|
Year of publication: | 2021 |
Subject: |
Programming language
Computer science; Type inference; 020207 software engineering; 02 engineering and technology; Object (computer science); 020204 information systems; 0202 electrical engineering, electronic engineering, information engineering; Test suite; Benchmark (computing); Code (cryptography); Code generation; Executable; Compiler |
Source: | CGO |
DOI: | 10.1109/cgo51591.2021.9370322 |
Description: | A predictive compiler uses properties of a program to decide how to optimize it. The compiler is trained on a collection of programs to derive a model which determines its actions in face of unknown codes. One of the challenges of predictive compilation is how to find good training sets. Regardless of the programming language, the availability of human-made benchmarks is limited. Moreover, current synthesizers produce code that is very different from actual programs, and mining compilable code from open repositories is difficult, due to program dependencies. In this paper, we use a combination of web crawling and type inference to overcome these problems for the C programming language. We use a type reconstructor based on Hindley-Milner's algorithm to produce ANGHABENCH, a virtually unlimited collection of real-world compilable C programs. Although ANGHABENCH programs are not executable, they can be transformed into object files by any C compliant compiler. Therefore, they can be used to train compilers for code size reduction. We have used thousands of ANGHABENCH programs to train YACOS, a predictive compiler based on LLVM. The version of YACOS autotuned with ANGHABENCH generates binaries for the LLVM test suite over 10% smaller than clang -Oz. It compresses code impervious even to the state-of-the-art Function Sequence Alignment technique published in 2019, as it does not require large binaries to work well. |
Database: | OpenAIRE |
External link: |