Compute Aorta

Author: Ewan Crawford, Alastair Murray
Year of publication: 2020
Subject:
Source: IWOCL
Description: The modern heterogeneous programming landscape has a wide variety of programming models targeting an equally diverse range of hardware. ComputeAorta from Codeplay Software Ltd is designed to provide implementations of heterogeneous APIs, such as OpenCL or Vulkan Compute, on hardware ranging from DSPs to large machine learning accelerators. This talk is about how the design of ComputeAorta has evolved over the years to enable engineers to quickly implement industry-standard APIs for such devices, exposing their full performance capabilities with minimal effort.
ComputeAorta exists within Codeplay's ComputeSuite stack along with the ComputeCpp implementation of SYCL and SYCL libraries such as SYCL-BLAS and SYCL-DNN. Thus, ComputeAorta is designed to provide a standards-compliant interface to custom, heterogeneous hardware that is used by applications further up the ComputeSuite stack. The key design goals of ComputeAorta are manifold. First, to provide mechanisms to support SPIR-V and similar technologies that enable high-level programming models. Second, to map the parallelism inherent in data-parallel programming models to hardware via compiler optimizations. Additionally, to expose irregular hardware features via language or API extensions so that programmers can reliably achieve top performance. Finally, to minimize the amount of effort required to create a correct implementation of a heterogeneous API on a new hardware device.
These design goals are achieved using an internal specification for a very low-level programming model called "Mux". Standardized programming models, such as OpenCL and Vulkan Compute, are implemented in terms of the Mux specification, and Mux is implemented for each unique hardware device. This separation of concerns allows a dedicated customer team to focus on each new device, implementing the specification in whatever way is necessary to achieve top performance on that hardware.
To aid customer teams, the ComputeAorta toolkit contains a reference CPU implementation of Mux, example OpenCL extensions, a set of compiler passes, a math library, and carefully maintained build and test infrastructure.
In this talk we will cover the challenges of supporting multiple heterogeneous APIs in a single codebase and the implications of implementing public APIs in terms of another abstraction layer. The talk will reference related or alternative approaches, such as Intel's Level Zero [1], clvk [3], and POCL [2]. We will also cover how separating different aspects of implementation via a specification allows the project to scale to many varied customer hardware devices and continuously adapt to the ever-evolving fields of heterogeneous compute architectures and programming models, all the while retaining centralized test suites that enforce correctness across all API implementations. We will use our experience implementing multiple heterogeneous programming models to comment on future-proofing for upcoming APIs, such as OpenCL Next, and our experience implementing on a variety of hardware to explain the rationale behind our design decisions. This will help the audience understand the key concerns in implementing heterogeneous programming models, and what to consider should they too embark on such a project.
Database: OpenAIRE