Showing 1 - 10 of 335 for search: '"C.1.3"'
Author:
Koeplinger, David, Gandhi, Darshan, Nandkar, Pushkar, Sheeley, Nathan, Musaddiq, Matheen, Zhang, Leon, Goodbar, Reid, Shaffer, Matthew, Wang, Han, Wang, Angela, Wang, Mingran, Prabhakar, Raghu
Token generation speed is critical to power the next wave of AI inference applications. GPUs significantly underperform during token generation due to synchronization overheads at kernel boundaries, utilizing only 21% of their peak memory bandwidth.
External link:
http://arxiv.org/abs/2410.23668
Author:
Wang, Zihan, Yang, Daniel W., Liu, Zerui, Yan, Evan, Sun, Heming, Ge, Ning, Hu, Miao, Wu, Wei
This study presents the first implementation of multilayer neural networks on a memristor/CMOS integrated system on chip (SoC) to simultaneously detect multiple diseases. To overcome limitations in medical data, generative AI techniques are used to e…
External link:
http://arxiv.org/abs/2410.14882
Considering the high-performance and low-power requirements of edge AI, this study designs a specialized instruction set processor for edge AI based on the RISC-V instruction set architecture, addressing practical issues in digital signal processing…
External link:
http://arxiv.org/abs/2409.00661
Author:
Aldinucci, Marco, Danelutto, Marco
Published in:
Proc. of PDCS: Intl. Conference on Parallel and Distributed Computing and Systems, pages 955-962, Cambridge, Massachusetts, USA, Nov. 1999. IASTED, ACTA press
We discuss the properties of the composition of stream parallel skeletons such as pipelines and farms. By looking at the ideal performance figures assumed to hold for these skeletons, we show that any stream parallel skeleton composition can always b…
External link:
http://arxiv.org/abs/2408.12394
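The "ideal performance figures" mentioned above can be illustrated with a common simplified cost model for stream skeletons. The following is a minimal sketch, assuming (our assumption, not necessarily the paper's exact model) that a pipeline's service time is that of its slowest stage and that an ideal farm divides its worker's service time by the number of replicas, with communication, emitter and collector overheads ignored. The class names `Seq`, `Pipe`, `Farm` and the function `service_time` are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Union


@dataclass
class Seq:
    """A sequential stage; `time` is its (assumed) time to process one stream item."""
    time: float


@dataclass
class Pipe:
    """A pipeline whose stages are themselves skeletons."""
    stages: List["Skeleton"]


@dataclass
class Farm:
    """A farm replicating `worker` over `n_workers` independent copies."""
    worker: "Skeleton"
    n_workers: int


Skeleton = Union[Seq, Pipe, Farm]


def service_time(s: Skeleton) -> float:
    """Ideal service time (time between consecutive outputs) under a simplified
    model that ignores communication, emitter and collector overheads."""
    if isinstance(s, Seq):
        return s.time
    if isinstance(s, Pipe):
        # A pipeline can be no faster than its slowest stage.
        return max(service_time(stage) for stage in s.stages)
    if isinstance(s, Farm):
        # Ideal farm: the worker's service time is shared among the replicas.
        return service_time(s.worker) / s.n_workers
    raise TypeError(f"unknown skeleton: {s!r}")


two_stage = Pipe([Seq(4.0), Seq(2.0)])
print(service_time(two_stage))           # 4.0
print(service_time(Farm(Seq(6.0), 3)))   # 2.0
print(service_time(Farm(two_stage, 2)))  # 2.0
```

Representing compositions as a tree and evaluating the cost recursively makes it easy to compare alternative nestings of the same stages under this idealized model.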
Author:
Moser, Bernhard A., Lunglmayr, Michael
The linearly inseparable XOR problem and the related problem of representing binary logical gates are revisited from the point of view of temporal encoding and its solvability by spiking neural networks with minimal configurations of leaky integrate-a…
External link:
http://arxiv.org/abs/2408.05845
Author:
Rao, Ravipudi Venkata, Shah, Ravikumar
Two simple yet powerful optimization algorithms, named the Best-Mean-Random (BMR) and Best-Worst-Random (BWR) algorithms, are developed and presented in this paper to handle both constrained and unconstrained optimization problems. These algorithms a…
External link:
http://arxiv.org/abs/2407.11149
Matrix multiplication is a cornerstone operation in a wide array of scientific fields, including machine learning and computer graphics. The standard algorithm for matrix multiplication has a complexity of $\mathcal{O}(n^3)$ for $n\times n$ matrices.
External link:
http://arxiv.org/abs/2406.02088
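To make the $\mathcal{O}(n^3)$ figure concrete, here is a minimal Python sketch of the standard (schoolbook) algorithm: three nested loops over rows, columns and the inner dot product. The function name `matmul` and the multiplication counter are illustrative additions, not taken from the paper.

```python
def matmul(a, b):
    """Schoolbook product of two n x n matrices given as lists of lists.

    Returns the product and the number of scalar multiplications performed,
    which is exactly n**3 for this algorithm.
    """
    n = len(a)
    c = [[0] * n for _ in range(n)]
    mults = 0
    for i in range(n):          # row of a
        for j in range(n):      # column of b
            acc = 0
            for k in range(n):  # inner dot product of row i and column j
                acc += a[i][k] * b[k][j]
                mults += 1
            c[i][j] = acc
    return c, mults


product, count = matmul([[1, 2], [3, 4]], [[5, 6], [7, 8]])
print(product)  # [[19, 22], [43, 50]]
print(count)    # 8, i.e. 2**3
```

Each of the $n^2$ output entries needs an $n$-term dot product, which is where the cubic cost comes from.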
Author:
Prabhakar, Raghu, Sivaramakrishnan, Ram, Gandhi, Darshan, Du, Yun, Wang, Mingran, Song, Xiangyu, Zhang, Kejie, Gao, Tianren, Wang, Angela, Li, Karen, Sheng, Yongning, Brot, Joshua, Sokolov, Denis, Vivek, Apurv, Leung, Calvin, Sabnis, Arjun, Bai, Jiayu, Zhao, Tuowen, Gottscho, Mark, Jackson, David, Luttrell, Mark, Shah, Manish K., Chen, Edison, Liang, Kaizhao, Jain, Swayambhoo, Thakker, Urmish, Huang, Dawei, Jairath, Sumti, Brown, Kevin J., Olukotun, Kunle
Monolithic large language models (LLMs) like GPT-4 have paved the way for modern generative AI applications. Training, serving, and maintaining monolithic LLMs at scale, however, remains prohibitively expensive and challenging. The disproportionate i…
External link:
http://arxiv.org/abs/2405.07518
Energy efficiency of electronic digital processors is primarily limited by the energy consumption of electronic communication and interconnects. The industry is almost unanimously pushing towards replacing both long-haul, as well as local chip interc…
External link:
http://arxiv.org/abs/2403.00045
We propose a novel computing runtime that exposes remote compute devices via the cross-vendor open heterogeneous computing standard OpenCL and can execute compute tasks on the MEC cluster side across multiple servers in a scalable manner. Intermitten…
External link:
http://arxiv.org/abs/2309.00407