Showing 1 - 1 of 1 for search: '"Zmushko, Philip"'
As the number of parameters in large language models grows, pre-training and fine-tuning demand increasingly large amounts of GPU memory. A significant portion of this memory is typically consumed by the optimizer state.
External link:
http://arxiv.org/abs/2411.07837
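To make the abstract's claim concrete, here is a minimal back-of-envelope sketch (not from the paper) of how much extra GPU memory a standard Adam optimizer adds: Adam keeps two fp32 moment buffers per parameter, so its state alone is roughly 8 bytes per parameter, on top of the weights themselves. The helper name and the byte assumptions below are illustrative, not taken from the paper.

    def adam_state_memory_gib(num_params: int, bytes_per_value: int = 4) -> float:
        """Rough memory for Adam's two moment buffers (exp_avg, exp_avg_sq), in GiB.

        Assumes fp32 moment buffers (4 bytes each); mixed-precision or
        sharded-optimizer setups will differ.
        """
        moments = 2  # first- and second-moment estimates per parameter
        return num_params * moments * bytes_per_value / 1024**3

    if __name__ == "__main__":
        for n in (7_000_000_000, 13_000_000_000, 70_000_000_000):
            print(f"{n / 1e9:.0f}B params -> ~{adam_state_memory_gib(n):.0f} GiB of optimizer state")

For a 7B-parameter model this estimate comes out to roughly 52 GiB of optimizer state alone, which illustrates why the abstract singles out the optimizer state as a major consumer of GPU memory.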