ANGHABENCH: A Suite with One Million Compilable C Benchmarks for Code-Size Reduction

Authors: Fernando Magno Quintão Pereira, José Wesley de Souza Magalhães, Anderson Faustino da Silva, Jerônimo Nunes Rocha, Bruno Conde Kind, Breno Campos Ferreira Guimaraes
Year of publication: 2021
Subject:
Source: CGO
DOI: 10.1109/cgo51591.2021.9370322
Description: A predictive compiler uses properties of a program to decide how to optimize it. The compiler is trained on a collection of programs to derive a model that determines its actions when it faces unseen code. One of the challenges of predictive compilation is finding good training sets. Regardless of the programming language, the availability of human-made benchmarks is limited. Moreover, current synthesizers produce code that is very different from actual programs, and mining compilable code from open repositories is difficult because of program dependencies. In this paper, we use a combination of web crawling and type inference to overcome these problems for the C programming language. We use a type reconstructor based on the Hindley-Milner algorithm to produce ANGHABENCH, a virtually unlimited collection of real-world compilable C programs. Although ANGHABENCH programs are not executable, they can be transformed into object files by any C-compliant compiler. Therefore, they can be used to train compilers for code-size reduction. We have used thousands of ANGHABENCH programs to train YACOS, a predictive compiler based on LLVM. The version of YACOS autotuned with ANGHABENCH generates binaries for the LLVM test suite that are over 10% smaller than those produced by clang -Oz. It compresses even code that resists the state-of-the-art Function Sequence Alignment technique published in 2019, because it does not require large binaries to work well.
Database: OpenAIRE
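
Illustration: the description says that missing declarations are reconstructed by type inference so that a mined function compiles to an object file without being executable. The sketch below shows what such a compilation unit could look like. It is not taken from ANGHABENCH: the names `node`, `weight`, `next`, and `total_weight` are hypothetical, and the struct definition stands in for a declaration that a type reconstructor might synthesize from how the function uses its argument.

```c
#include <stddef.h>

/* Hypothetical reconstructed declaration: inferred only from the uses
   p->weight (added to an int) and p->next (assigned to a struct node *). */
struct node {
    int weight;
    struct node *next;
};

/* Hypothetical mined function. The file has no main, so it can be
   compiled to an object file but not linked into a program on its own. */
int total_weight(struct node *head) {
    int sum = 0;
    for (struct node *p = head; p != NULL; p = p->next) {
        sum += p->weight;
    }
    return sum;
}
```

Assuming the file is saved as `example.c`, a command such as `clang -Oz -c example.c` should produce `example.o`, whose size can then be compared across optimization settings.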