Showing 1 - 1 of 1 for search: '"Klyukin, Yaroslav"'
Authors:
Chezhegov, Savelii, Klyukin, Yaroslav, Semenov, Andrei, Beznosikov, Aleksandr, Gasnikov, Alexander, Horváth, Samuel, Takáč, Martin, Gorbunov, Eduard
Methods with adaptive stepsizes, such as AdaGrad and Adam, are essential for training modern Deep Learning models, especially Large Language Models. Typically, the noise in the stochastic gradients is heavy-tailed for the latter. Gradient clippin
External link:
http://arxiv.org/abs/2406.04443