Merlin HugeCTR: GPU-accelerated Recommender System Training and Inference

Author: Wang, Joey, Wei, Yingcan, Lee, Minseok, Langer, Matthias, Yu, Fan, Liu, Jie, Liu, Alex, Abel, Daniel, Guo, Gems, Dong, Jianbing, Shi, Jerry, Li, Kunlun
Publication year: 2022
Source: Proceedings of the 16th ACM Conference on Recommender Systems, 2022
Document type: Working Paper
DOI: 10.1145/3523227.3547405
Description: In this talk, we introduce Merlin HugeCTR. Merlin HugeCTR is an open-source, GPU-accelerated integration framework for click-through rate estimation. It optimizes both training and inference, while enabling model training at scale with model-parallel embeddings and data-parallel neural networks. In particular, Merlin HugeCTR combines a high-performance GPU embedding cache with a hierarchical storage architecture to realize low-latency retrieval of embeddings for online model inference tasks. In the MLPerf v1.0 DLRM model training benchmark, Merlin HugeCTR achieves a speedup of up to 24.6x on a single DGX A100 (8x A100) over PyTorch on 4x 4-socket CPU nodes (4x4x28 cores). Merlin HugeCTR can also take advantage of multi-node environments to accelerate training even further. Since late 2021, Merlin HugeCTR additionally features a hierarchical parameter server (HPS) and supports deployment via the NVIDIA Triton server framework to leverage the computational capabilities of GPUs for high-speed recommendation model inference. Using this HPS, Merlin HugeCTR users can achieve a 5x to 62x speedup (depending on batch size) for popular recommendation models over CPU baseline implementations, and dramatically reduce their end-to-end inference latency.
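To make the model-parallel/data-parallel split described above concrete, here is a minimal sketch in plain NumPy that simulates devices in-process. It is a conceptual illustration only; every name in it (NUM_GPUS, shards, lookup, W) is a hypothetical stand-in, not part of HugeCTR's actual API, and the real system exchanges embeddings between GPUs with an all-to-all collective rather than a local gather.

# Hypothetical sketch of HugeCTR-style hybrid parallelism, using NumPy to
# simulate devices. All names here are illustrative, not HugeCTR's API.
import numpy as np

NUM_GPUS = 4          # simulated devices
VOCAB_SIZE = 1000     # rows in the full embedding table
EMBED_DIM = 16
BATCH_PER_GPU = 8

rng = np.random.default_rng(0)

# Model parallelism: the embedding table is sharded by row across "GPUs",
# so each device holds only VOCAB_SIZE / NUM_GPUS rows.
shards = [rng.normal(size=(VOCAB_SIZE // NUM_GPUS, EMBED_DIM))
          for _ in range(NUM_GPUS)]

def lookup(key: int) -> np.ndarray:
    """Route a key to the shard that owns it (key % NUM_GPUS here)."""
    owner = key % NUM_GPUS
    return shards[owner][key // NUM_GPUS]

# Data parallelism: every device runs the same dense network on its own
# slice of the batch; the embedding exchange that is an all-to-all in the
# real system is modeled here as a simple gather.
W = rng.normal(size=(EMBED_DIM, 1))   # dense weights, replicated per device

for gpu in range(NUM_GPUS):
    keys = rng.integers(0, VOCAB_SIZE, size=BATCH_PER_GPU)
    emb = np.stack([lookup(int(k)) for k in keys])  # gathered embeddings
    logits = emb @ W                                # data-parallel dense pass
    print(f"gpu {gpu}: logits shape {logits.shape}")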
Comment: 4 pages
Database: arXiv
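As a rough illustration of the hierarchical parameter server idea from the description (a small, fast GPU embedding cache backed by progressively larger and slower tiers), the following sketch models the lookup path with plain Python containers. The tier layout, the LRU eviction policy, and all names (ssd, cpu, gpu, GPU_CAPACITY, lookup) are illustrative assumptions, not the actual HPS implementation.

# Hypothetical sketch of a hierarchical embedding lookup in the spirit of
# HugeCTR's HPS. Tier names and the eviction policy are assumptions.
from collections import OrderedDict
import numpy as np

EMBED_DIM = 16
rng = np.random.default_rng(0)

# Slowest tier: full table on "SSD" (a dict stands in for persistent storage).
ssd = {k: rng.normal(size=EMBED_DIM) for k in range(10_000)}
# Middle tier: partial copy held in CPU memory.
cpu = {k: ssd[k] for k in range(1_000)}
# Fastest tier: a tiny LRU cache standing in for the GPU embedding cache.
gpu = OrderedDict()
GPU_CAPACITY = 100

def lookup(key: int) -> np.ndarray:
    # 1. GPU cache hit: cheapest path, served without host traffic.
    if key in gpu:
        gpu.move_to_end(key)
        return gpu[key]
    # 2. Miss: fall back to CPU memory, then to SSD.
    vec = cpu.get(key)
    if vec is None:
        vec = ssd[key]
        cpu[key] = vec          # promote the key to the CPU tier
    # 3. Insert into the GPU cache, evicting the least recently used entry.
    gpu[key] = vec
    if len(gpu) > GPU_CAPACITY:
        gpu.popitem(last=False)
    return vec

print(lookup(42)[:4])   # first lookup: miss, filled from the CPU tier
print(lookup(42)[:4])   # second lookup: GPU cache hit

The point of this layout is that hot embeddings are served directly from GPU memory, while cold keys fall through to cheaper, larger storage, which is what enables the low-latency online inference the description refers to.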