Showing 1 - 4 of 4 for search: '"Zhong, Shuzhang"'
PrivQuant: Communication-Efficient Private Inference with Quantized Network/Protocol Co-Optimization
Private deep neural network (DNN) inference based on secure two-party computation (2PC) enables secure privacy protection for both the server and the client. However, existing secure 2PC frameworks suffer from a high inference latency due to enormous
External link:
http://arxiv.org/abs/2410.09531
Mixture-of-Experts (MoE) models are designed to enhance the efficiency of large language models (LLMs) without proportionally increasing the computational demands. However, their deployment on edge devices still faces significant challenges due to hi
External link:
http://arxiv.org/abs/2408.10284
Recent advancements in generative large language models (LLMs) have significantly boosted the performance in natural language processing tasks. However, their efficiency is hampered by the inherent limitations in autoregressive token generation. Whil
External link:
http://arxiv.org/abs/2402.13485
Memory-aware network scheduling is becoming increasingly important for deep neural network (DNN) inference on resource-constrained devices. However, due to the complex cell-level and network-level topologies, memory-aware scheduling becomes very chal
External link:
http://arxiv.org/abs/2308.13898