Description: |
FPGA-SoCs, which combine general-purpose CPUs with FPGA-like programmable logic, are promising platforms for accelerating applications in many domains, such as high-performance computing (HPC) and deep learning. Last-generation FPGA-SoCs, however, suffer from a severe memory bandwidth bottleneck, which limits their deployability in many of these domains. FPGA vendors have reacted to this issue when developing the next generation of FPGA-SoCs by increasing the achievable overall memory bandwidth and improving support for cache-coherent memory accesses. We thoroughly evaluated the memory architecture of a representative next-generation FPGA-SoC to independently verify these claims. We present the achievable memory bandwidth for various configurations and access patterns, and discuss the strengths and weaknesses of the different memory interfaces. Our results show that the achievable overall memory bandwidth is at least 4x that of previous FPGA-SoCs. Even more importantly, cache coherency support has improved significantly, easing shared-memory-based communication between FPGA and CPU. Based on our evaluation, we provide researchers and developers with advice and insights on how to achieve maximum bandwidth for their specific application. |