Generative AI on the Edge: Architecture and Performance Evaluation

Author:	Nezami, Zeinab; Hafeez, Maryam; Djemame, Karim; Zaidi, Syed Ali Raza
Year of publication:	2024
Subject:	Computer Science - Distributed, Parallel, and Cluster Computing; Computer Science - Artificial Intelligence; Computer Science - Networking and Internet Architecture; Computer Science - Performance
Document type:	Working Paper
Description:	6G's AI-native vision of embedding advanced intelligence in the network while bringing it closer to the user requires a systematic evaluation of Generative AI (GenAI) models on edge devices. Rapidly emerging solutions based on Open RAN (ORAN) and Network-in-a-Box strongly advocate the use of low-cost, off-the-shelf components for simpler and more efficient deployment, e.g., in provisioning rural connectivity. In this context, conceptual architectures, hardware testbeds, and precise performance quantification of Large Language Models (LLMs) on off-the-shelf edge devices remain largely unexplored. This research investigates computationally demanding LLM inference on a single commodity Raspberry Pi serving as an edge testbed for ORAN. We investigate various LLMs, including small, medium, and large models, on a Raspberry Pi 5 cluster using a lightweight Kubernetes distribution (K3s) with a modular prompting implementation. We study its feasibility and limitations by analyzing throughput, latency, accuracy, and efficiency. Our findings indicate that CPU-only deployment of lightweight models, such as Yi, Phi, and Llama3, can effectively support edge applications, achieving a generation throughput of 5 to 12 tokens per second with less than 50% CPU and RAM usage. We conclude that GenAI on the edge offers localized inference in remote or bandwidth-constrained environments in 6G networks without reliance on cloud infrastructure.
Database:	arXiv
External link:	http://arxiv.org/abs/2411.17712
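The description reports generation throughput in tokens per second alongside latency. The record does not define how these metrics are computed. The following is therefore a minimal sketch that assumes a common convention: generation throughput is the number of generated tokens divided by the decode (generation) time, excluding prompt evaluation, and end-to-end latency is prompt-evaluation time plus generation time. The `GenerationTiming` type, its field names, and the sample numbers are hypothetical illustrations. They are not the paper's measurement code or data.

```python
from dataclasses import dataclass


@dataclass
class GenerationTiming:
    """Hypothetical timing record for one LLM request (not from the paper)."""
    prompt_tokens: int          # tokens in the input prompt
    generated_tokens: int       # tokens produced during decoding
    prompt_eval_seconds: float  # time spent processing the prompt
    generation_seconds: float   # time spent generating output tokens


def generation_throughput(t: GenerationTiming) -> float:
    """Tokens generated per second of decode time (assumed definition)."""
    if t.generation_seconds <= 0:
        raise ValueError("generation_seconds must be positive")
    return t.generated_tokens / t.generation_seconds


def end_to_end_latency(t: GenerationTiming) -> float:
    """Total time from submitting the prompt to the last generated token (assumed)."""
    return t.prompt_eval_seconds + t.generation_seconds


if __name__ == "__main__":
    # Illustrative numbers only: 120 tokens generated in 15 s of decoding.
    sample = GenerationTiming(prompt_tokens=20, generated_tokens=120,
                              prompt_eval_seconds=2.0, generation_seconds=15.0)
    print(generation_throughput(sample))  # 8.0 tokens/s
    print(end_to_end_latency(sample))     # 17.0 s
```

Under this assumed definition, the illustrative 8.0 tokens/s would fall inside the 5 to 12 tokens per second range reported in the description. Separating prompt evaluation from decoding matters because prompt processing and token generation load a CPU-only device differently.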