Showing 1 - 1 of 1 for search: '"Choi, Kwanseok"'
Mixture-of-Experts (MoE) large language models (LLMs) have memory requirements that often exceed GPU memory capacity, requiring costly parameter movement from secondary memories to the GPU for expert computation. In this work, we present Mixture o…
External link:
http://arxiv.org/abs/2405.18832
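
The snippet below is a minimal, hypothetical sketch (not code from the paper) of the bottleneck the abstract describes: expert parameters kept in host ("secondary") memory must be copied to the GPU before the expert computation can run, and that transfer is costly. It assumes PyTorch; the expert count and layer sizes are arbitrary placeholders.

```python
# Toy illustration of on-demand expert offloading in an MoE layer.
# Expert weights live in host (CPU) memory because they do not all fit
# on the GPU; the selected expert is copied over before it can compute.
import time
import torch

NUM_EXPERTS = 8        # placeholder values, not taken from the paper
HIDDEN = 4096
FFN = 14336

# Experts kept in "secondary" (host) memory.
cpu_experts = [
    torch.nn.Sequential(
        torch.nn.Linear(HIDDEN, FFN, bias=False),
        torch.nn.GELU(),
        torch.nn.Linear(FFN, HIDDEN, bias=False),
    )
    for _ in range(NUM_EXPERTS)
]

device = "cuda" if torch.cuda.is_available() else "cpu"
tokens = torch.randn(16, HIDDEN, device=device)

# Pretend the router selected expert 3 for this batch of tokens.
selected = 3

start = time.perf_counter()
expert = cpu_experts[selected].to(device)  # costly host-to-GPU parameter copy
if device == "cuda":
    torch.cuda.synchronize()
transfer_s = time.perf_counter() - start

with torch.no_grad():
    out = expert(tokens)                   # the actual expert computation

print(f"parameter transfer: {transfer_s * 1000:.1f} ms, output shape {tuple(out.shape)}")
```

The point of the sketch is only to make the abstract's claim concrete: for sparsely activated experts, the host-to-GPU weight copy can dominate the cost of the (comparatively small) expert computation itself.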