Showing 1 - 10 of 25,863 results for search: '"CPU GPU"'
Due to the high resource demands of Large Language Models (LLMs), achieving widespread deployment on consumer-grade devices presents significant challenges. Typically, personal or consumer-grade devices, including servers configured prior to the era
External link:
http://arxiv.org/abs/2412.18934
Author:
Hao, Jiao, Zhang, Zongbao, He, Zonglin, Liu, Zhengyuan, Tan, Zhengdong, Song, Yankan (sykfmlrc@163.com)
Published in:
Energies (ISSN 1996-1073). Dec 2024, Vol. 17, Issue 24, p. 6269. 14 pp.
Author:
Ichimura, Tsuyoshi, Fujita, Kohei, Hori, Muneo, Maddegedara, Lalith, Wells, Jack, Gray, Alan, Karlin, Ian, Linford, John
We propose a CPU-GPU heterogeneous computing method for solving time-evolution partial differential equation problems many times with guaranteed accuracy, in short time-to-solution and low energy-to-solution. On a single-GH200 node, the proposed meth
External link:
http://arxiv.org/abs/2409.20380
Author:
Tian, Bing, Liu, Haikun, Tang, Yuhang, Xiao, Shihai, Duan, Zhuohui, Liao, Xiaofei, Zhang, Xuecang, Zhu, Junhua, Zhang, Yu
Approximate nearest neighbor search (ANNS) has emerged as a crucial component of database and AI infrastructure. Ever-increasing vector datasets pose significant challenges in terms of performance, cost, and accuracy for ANNS services. None of modern
External link:
http://arxiv.org/abs/2409.16576
Author:
Yi, Xinyao
Parallel computing is a standard approach to achieving high-performance computing (HPC). Three commonly used methods to implement parallel computing include: 1) applying multithreading technology on single-core or multi-core CPUs; 2) incorporating po
External link:
http://arxiv.org/abs/2409.10661
Author:
Wang, Qifan, Oswald, David
In recent years, the widespread informatization and rapid data explosion have increased the demand for high-performance heterogeneous systems that integrate multiple computing cores such as CPUs, Graphics Processing Units (GPUs), Application Specific
External link:
http://arxiv.org/abs/2408.11601
Memory management across discrete CPU and GPU physical memory is traditionally achieved through explicit GPU allocations and data copy or unified virtual memory. The Grace Hopper Superchip, for the first time, supports an integrated CPU-GPU system pa
External link:
http://arxiv.org/abs/2407.07850
Transformers and LLMs have seen rapid adoption in all domains. Their sizes have exploded to hundreds of billions of parameters and keep increasing. Under these circumstances, the training of transformers is slow and often takes on the order of weeks
External link:
http://arxiv.org/abs/2406.10728
Published in:
Architecture of Computing Systems. ARCS 2022. Lecture Notes in Computer Science, vol 13642. Springer, Cham
CPU-GPU heterogeneous architectures are now commonly used in a wide variety of computing systems from mobile devices to supercomputers. Maximizing the throughput for multi-programmed workloads on such systems is indispensable as one single program ty
External link:
http://arxiv.org/abs/2405.03831
Finding the most sparse solution to the underdetermined system $\mathbf{y}=\mathbf{Ax}$, given a tolerance, is known to be NP-hard. A popular way to approximate a sparse solution is by using Greedy Pursuit algorithms, and Orthogonal Matching Pursuit
External link:
http://arxiv.org/abs/2407.06434