Showing 1 - 10 of 74 results for the search: '"Ahn, Kwangjun"'
Author:
Pfrommer, Daniel, Padmanabhan, Swati, Ahn, Kwangjun, Umenberger, Jack, Marcucci, Tobia, Mhammedi, Zakaria, Jadbabaie, Ali
Recent work in imitation learning has shown that having an expert controller that is both suitably smooth and stable enables stronger guarantees on the performance of the learned controller. However, constructing such smoothed expert controllers for…
External link:
http://arxiv.org/abs/2410.00859
Author:
Ahn, Kwangjun, Cutkosky, Ashok
In this work, we offer a theoretical analysis of two modern optimization techniques for training large and complex models: (i) adaptive optimization algorithms, such as Adam, and (ii) the model exponential moving average (EMA). Specifically, we demonstrate…
External link:
http://arxiv.org/abs/2405.18199
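For readers unfamiliar with the model EMA mentioned in this record: it keeps a running exponential average of the parameter iterates alongside the optimizer. Below is a minimal sketch in plain NumPy, assuming vanilla SGD on a toy noisy quadratic and an illustrative decay value; neither choice is taken from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)
theta = rng.normal(size=5)      # model parameters
theta_ema = theta.copy()        # exponential moving average of the iterates
lr, beta_ema = 0.1, 0.99        # illustrative values, not from the paper

for step in range(1000):
    grad = theta + 0.5 * rng.normal(size=5)    # noisy gradient of 0.5*||theta||^2
    theta -= lr * grad                         # optimizer step (plain SGD here)
    theta_ema = beta_ema * theta_ema + (1.0 - beta_ema) * theta  # EMA update

print("last-iterate loss:", 0.5 * np.sum(theta ** 2))
print("EMA-iterate loss: ", 0.5 * np.sum(theta_ema ** 2))
```

Because the EMA averages over many noisy iterates, in this toy run it typically ends at a lower loss than the last iterate; that noise-averaging effect is what the abstract analyzes for Adam-style optimizers.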
Understanding the training dynamics of deep neural networks is challenging due to their high-dimensional nature and intricate loss landscapes. Recent studies have revealed that, along the training trajectory, the gradient approximately aligns with a…
External link:
http://arxiv.org/abs/2405.16002
Despite the success of the Adam optimizer in practice, the theoretical understanding of its algorithmic components still remains limited. In particular, most existing analyses of Adam show a convergence rate that can be achieved simply by non-adaptive…
External link:
http://arxiv.org/abs/2402.01567
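The algorithmic components in question are Adam's two exponential moving averages and its bias correction. A minimal NumPy sketch of the standard update (Kingma & Ba, 2015) on a toy quadratic, with the usual default hyperparameters, just to make those pieces concrete:

```python
import numpy as np

def adam_step(theta, grad, m, v, t, lr=1e-3, beta1=0.9, beta2=0.999, eps=1e-8):
    m = beta1 * m + (1 - beta1) * grad          # first moment (momentum)
    v = beta2 * v + (1 - beta2) * grad ** 2     # second moment (per-coordinate scale)
    m_hat = m / (1 - beta1 ** t)                # bias correction
    v_hat = v / (1 - beta2 ** t)
    theta = theta - lr * m_hat / (np.sqrt(v_hat) + eps)
    return theta, m, v

theta = np.array([1.0, -2.0])
m, v = np.zeros_like(theta), np.zeros_like(theta)
for t in range(1, 201):                         # t starts at 1 for bias correction
    grad = theta                                # gradient of 0.5*||theta||^2
    theta, m, v = adam_step(theta, grad, m, v, t)
print(theta)
```

The per-coordinate scaling by sqrt(v_hat) is what distinguishes Adam from non-adaptive methods such as SGD with momentum.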
Transformer training is notoriously difficult, requiring a careful design of optimizers and the use of various heuristics. We make progress towards understanding the subtleties of training Transformers by carefully studying a simple yet canonical linearized…
External link:
http://arxiv.org/abs/2310.01082
Inspired by the remarkable success of large neural networks, there has been significant interest in understanding the generalization performance of over-parameterized models. Substantial efforts have been invested in characterizing how optimization…
External link:
http://arxiv.org/abs/2306.13853
Author:
Pfrommer, Daniel, Padmanabhan, Swati, Ahn, Kwangjun, Umenberger, Jack, Marcucci, Tobia, Mhammedi, Zakaria, Jadbabaie, Ali
Recent work in imitation learning has shown that having an expert controller that is both suitably smooth and stable enables stronger guarantees on the performance of the learned controller. However, constructing such smoothed expert controllers for…
External link:
http://arxiv.org/abs/2306.01914
Published in:
37th Conference on Neural Information Processing Systems (NeurIPS 2023)
Several recent works demonstrate that transformers can implement algorithms like gradient descent. By a careful construction of weights, these works show that multiple layers of transformers are expressive enough to simulate iterations of gradient descent…
External link:
http://arxiv.org/abs/2306.00297
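To see the flavor of such constructions, consider in-context linear regression: a single softmax-free ("linear") attention head whose keys and queries are the in-context inputs and whose values are the labels reproduces the prediction of one gradient-descent step started from zero. The sketch below (NumPy; the identity-projection weight choice and step size are illustrative, not the paper's exact parameterization) checks this equality numerically.

```python
import numpy as np

rng = np.random.default_rng(0)
d, n, eta = 3, 8, 0.5
X = rng.normal(size=(n, d))        # in-context inputs x_1, ..., x_n
y = X @ rng.normal(size=d)         # in-context labels
x_q = rng.normal(size=d)           # query input

# One gradient-descent step on L(w) = (1/2n) * sum_i (w^T x_i - y_i)^2, from w = 0.
w1 = (eta / n) * X.T @ y
gd_prediction = w1 @ x_q

# Linear attention with key_i = x_i, value_i = y_i, query = x_q (no softmax):
# output = (eta / n) * sum_i value_i * <key_i, query>.
attn_prediction = (eta / n) * np.sum(y * (X @ x_q))

print(np.isclose(gd_prediction, attn_prediction))   # True
```

Stacking such layers corresponds to taking further (possibly preconditioned) gradient steps, which is the sense in which multiple transformer layers can simulate iterations of gradient descent.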
Modern machine learning applications have witnessed the remarkable success of optimization algorithms that are designed to find flat minima. Motivated by this design choice, we undertake a formal study that (i) formulates the notion of flat minima…
External link:
http://arxiv.org/abs/2305.15659
Sharpness-Aware Minimization (SAM) is a recently proposed gradient-based optimizer (Foret et al., ICLR 2021) that greatly improves the prediction performance of deep neural networks. Consequently, there has been a surge of interest in explaining its…
External link:
http://arxiv.org/abs/2305.15287
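For context, the SAM update from Foret et al. first takes an ascent step of radius rho in the normalized gradient direction and then descends using the gradient evaluated at that perturbed point. A minimal NumPy sketch on a toy non-convex objective (the objective, rho, and the learning rate are illustrative choices, not from the paper):

```python
import numpy as np

def loss(w):                       # toy non-convex objective
    return np.sum(np.sin(3 * w) + 0.5 * w ** 2)

def grad(w):
    return 3 * np.cos(3 * w) + w

w = np.array([2.0, -1.0])
lr, rho = 0.05, 0.1

for _ in range(200):
    g = grad(w)
    eps = rho * g / (np.linalg.norm(g) + 1e-12)   # ascent to a nearby worst-case point
    w = w - lr * grad(w + eps)                    # descend with the perturbed gradient

print("final loss:", loss(w))
```

Using the gradient at the perturbed point penalizes directions along which the loss rises sharply nearby, which is the mechanism behind the flat-minima behavior that analyses of SAM aim to explain.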