SPIN
Author: | Shai Bergman, Mark Silberstein, Tzachi Cohen, Tanya Brokhman |
---|---|
Publication year: | 2018 |
Subject: |
010302 applied physics
Source lines of code; General Computer Science; Computer science; NVM Express; 0211 other engineering and technologies; 02 engineering and technology; computer.software_genre; 01 natural sciences; Rendering (computer graphics); Software portability; POSIX; 0103 physical sciences; Operating system; Page cache; Central processing unit; Direct memory access; computer; 021106 design practice & management |
Source: | ACM Transactions on Computer Systems. 36:1-26 |
ISSN: | 1557-7333 0734-2071 |
DOI: | 10.1145/3309987 |
Description: | Recent GPUs enable Peer-to-Peer Direct Memory Access (p2p) from fast peripheral devices such as NVMe SSDs, which removes the CPU from the data path between them for efficiency. Unfortunately, using p2p to access files is challenging because of the subtleties of low-level, non-standard interfaces, which bypass the OS file I/O layers and may hurt system performance. Developers must possess intimate knowledge of these interfaces to manually handle data consistency and misaligned accesses. We present SPIN, which integrates p2p into the standard OS file I/O stack, dynamically activating p2p where appropriate, transparently to the user. It combines p2p with page cache accesses and re-enables read-ahead for sequential reads, all while maintaining standard POSIX FS consistency, portability across GPUs and SSDs, and compatibility with virtual block devices such as software RAID. We evaluate SPIN on NVIDIA and AMD GPUs using standard file I/O benchmarks, application traces, and end-to-end experiments. SPIN achieves significant speedups across a wide range of workloads, exceeding p2p throughput by up to an order of magnitude. It also boosts the performance of an aerial imagery rendering application by 2.6× by dynamically adapting to its input-dependent file access pattern, enables 3.3× higher throughput for a GPU-accelerated log server, and enables 29% faster execution for a highly optimized GPU-accelerated image collage application with only 30 changed lines of code. |
Database: | OpenAIRE |