Popis: |
Modern, optimizing compilers rely on hundreds of heuristics to decide whether and how to apply optimizations during compilation. These heuristics are typically hand-crafted and designed to achieve good performance on a set of benchmark programs which are based on real-world applications. This manual process has multiple shortcomings: It requires experienced compiler engineers as well as continuous re-evaluation and tweaking of heuristics if compiler internals change. In addition, maintaining multiple, domain-specific heuristics is often considered infeasible, which results in the deployment of one-size-fits-all heuristics. Data-driven approaches, based on machine learning, have been shown to outperform hand-crafted compiler heuristics in that they can suggest optimization decisions which yield better performing programs after compilation. Nevertheless, only a single recent machine learning approach has made it into a production-level compiler. Existing research primarily focuses on using machine learning in static compilers, where the compilation process is stable and reproducible. In contrast, dynamic compilers run in parallel to the executed program where they compile frequently executed code. On top of that, the use of profiling-based speculative optimizations adds an additional level of non-determinism to dynamic compilation. This lack of consistency aggravates the use of data-driven approaches in such systems. In this thesis, we make multiple contributions to the use of machine learning in dynamic compilers. We address the problems of data stability and consistency by proposing compilation forking, a technique for extracting performance data from method-local optimization decisions in arbitrary programs. Based on this data, we train several machine learning models to replace optimization heuristics for loop optimizations, such as peeling, unrolling or vectorization. Furthermore, we present self-optimizing compiler heuristics, based on machine learning models which are continuously refined with new data at the user site. This approach also enables the creation of domain-specific heuristics which we showed to outperform one-size-fits-all heuristics significantly. Apart from deploying machine learning in production systems, we also outline how models can assist compiler engineers during heuristics design and evaluation. We implemented our approaches in the GraalVM compiler, which is among the most highly optimizing Java compilers on the market. Our learned models are either on par with the well-tuned heuristics in the GraalVM or outperform them significantly with only minor regressions. In addition to that, our assistive approach enabled improvements in existing heuristics without the actual deployment of machine learning models in production systems. submitted by DI Raphael Moaner, BSc Dissertation Universität Linz 2023 |