Showing 1 - 10 of 354 for the search: "Sra, Suvrit"
We show theoretically and empirically that the linear Transformer, when applied to graph data, can implement algorithms that solve canonical problems such as electric flow and eigenvector decomposition. The input to the Transformer is simply the graph…
External link:
http://arxiv.org/abs/2410.16699
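
As a point of reference for the entry above, the following is a minimal sketch of a single linear-attention layer, i.e. attention without the softmax, so the output is $Q (K^\top V)$; it is not the paper's construction, and the graph encoding of the input is not reproduced here.

import numpy as np

def linear_attention(X, W_q, W_k, W_v):
    """One linear-attention layer: output = Q (K^T V), with no softmax.

    X is an (n, d) array of token embeddings (e.g., one token per node or
    edge of a graph); W_q, W_k, W_v are (d, d) projection matrices.
    """
    Q, K, V = X @ W_q, X @ W_k, X @ W_v
    # associativity lets us form K^T V first, so the cost is linear in n
    return Q @ (K.T @ V)

rng = np.random.default_rng(0)
n, d = 8, 4
X = rng.standard_normal((n, d))
W_q, W_k, W_v = (rng.standard_normal((d, d)) / np.sqrt(d) for _ in range(3))
print(linear_attention(X, W_q, W_k, W_v).shape)  # (8, 4)
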
Author:
Dutta, Sanchayan; Sra, Suvrit
We show that memory-augmented Transformers (Memformers) can implement linear first-order optimization methods such as conjugate gradient descent, momentum methods, and more generally, methods that linearly combine past gradients. Building on prior work…
External link:
http://arxiv.org/abs/2410.07263
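
To make "methods that linearly combine past gradients" concrete, here is a minimal sketch of heavy-ball momentum on a quadratic, one standard member of that family; the Memformer construction itself is not reproduced, and the problem instance below is illustrative.

import numpy as np

def heavy_ball(grad, x0, lr=0.1, beta=0.9, iters=500):
    """Heavy-ball momentum: each step applies a geometrically weighted
    linear combination of all past gradients."""
    x, m = x0.astype(float).copy(), np.zeros_like(x0, dtype=float)
    for _ in range(iters):
        m = beta * m + grad(x)   # running linear combination of past gradients
        x = x - lr * m
    return x

# quadratic f(x) = 0.5 x^T A x - b^T x, whose minimizer solves A x = b
A = np.array([[3.0, 0.5], [0.5, 1.0]])
b = np.array([1.0, -2.0])
x_star = heavy_ball(lambda x: A @ x - b, np.zeros(2))
print(np.allclose(A @ x_star, b, atol=1e-6))  # True
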
Algorithms for bilevel optimization often encounter Hessian computations, which are prohibitive in high dimensions. While recent works offer first-order methods for unconstrained bilevel problems, the constrained setting remains relatively underexplored…
External link:
http://arxiv.org/abs/2406.12771
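
For context on where the Hessian computations enter: for the bilevel problem $\min_x f(x, y^*(x))$ with $y^*(x) = \arg\min_y g(x, y)$ and a strongly convex lower level, the standard implicit-function hypergradient is

$\nabla F(x) = \nabla_x f(x, y^*) - \nabla^2_{xy} g(x, y^*) \, [\nabla^2_{yy} g(x, y^*)]^{-1} \, \nabla_y f(x, y^*),$

and it is the cross- and lower-level Hessians in this expression that first-order methods aim to avoid forming or inverting.
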
We develop new algorithms for Riemannian bilevel optimization. We focus in particular on batch and stochastic gradient-based methods, with the explicit goal of avoiding second-order information such as Riemannian hyper-gradients. We propose and analyze…
External link:
http://arxiv.org/abs/2405.15816
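
As a minimal illustration of Riemannian gradient-based steps of the kind mentioned above (not the paper's bilevel algorithm), here is Riemannian gradient descent on the unit sphere with a normalization retraction, applied to the toy task of computing a leading eigenvector.

import numpy as np

def sphere_rgd(egrad, x0, lr=0.1, iters=500):
    """Riemannian gradient descent on the unit sphere: project the Euclidean
    gradient onto the tangent space, take a step, then retract by normalizing."""
    x = x0 / np.linalg.norm(x0)
    for _ in range(iters):
        g = egrad(x)
        rg = g - (x @ g) * x        # tangent-space projection at x
        x = x - lr * rg
        x = x / np.linalg.norm(x)   # retraction back onto the sphere
    return x

# leading eigenvector of A by minimizing f(x) = -x^T A x over the sphere
A = np.diag([3.0, 1.0, 0.5])
x = sphere_rgd(lambda x: -2.0 * A @ x, np.ones(3))
print(np.round(np.abs(x), 3))       # approximately [1, 0, 0]
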
We study the task of efficiently sampling from a Gibbs distribution $d\pi^* = e^{-h} \, d\mathrm{vol}_g$ over a Riemannian manifold $M$ via (geometric) Langevin MCMC; this algorithm involves computing exponential maps in random Gaussian directions and is eff…
External link:
http://arxiv.org/abs/2402.10357
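
A minimal sketch of the geometric Langevin step described above, instantiated on the unit sphere, where the exponential map has a closed form; the target $h$ below is a toy choice and the chain is unadjusted.

import numpy as np

def sphere_exp(x, v):
    """Exponential map on the unit sphere at x, applied to a tangent vector v."""
    nv = np.linalg.norm(v)
    if nv < 1e-12:
        return x
    return np.cos(nv) * x + np.sin(nv) * (v / nv)

def geometric_langevin(grad_h, x0, step=0.01, iters=1000, seed=0):
    """Unadjusted geometric Langevin MCMC targeting pi ~ exp(-h) on the sphere:
    drift along the negative Riemannian gradient of h plus Gaussian noise in
    the tangent space, then map back to the manifold with the exponential map."""
    rng = np.random.default_rng(seed)
    x = x0 / np.linalg.norm(x0)
    samples = []
    for _ in range(iters):
        g = grad_h(x)
        g = g - (x @ g) * x                      # Riemannian gradient of h
        xi = rng.standard_normal(x.shape)
        xi = xi - (x @ xi) * x                   # Gaussian noise in the tangent space
        x = sphere_exp(x, -step * g + np.sqrt(2.0 * step) * xi)
        samples.append(x)
    return np.array(samples)

# h(x) = -3 x_1, so the Gibbs distribution concentrates around e_1
samples = geometric_langevin(lambda x: np.array([-3.0, 0.0, 0.0]), np.ones(3))
print(np.round(samples[500:, 0].mean(), 2))      # clearly positive: mass near e_1
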
Many neural network architectures are known to be Turing complete and can thus, in principle, implement arbitrary algorithms. However, Transformers are unique in that they can implement gradient-based learning algorithms under simple parameter configurations…
External link:
http://arxiv.org/abs/2312.06528
Transformer training is notoriously difficult, requiring a careful design of optimizers and use of various heuristics. We make progress towards understanding the subtleties of training Transformers by carefully studying a simple yet canonical linearized…
External link:
http://arxiv.org/abs/2310.01082
Invex programs are a special class of non-convex problems that attain global minima at every stationary point. While classical first-order gradient descent methods can solve them, they converge very slowly. In this paper, we propose new first-order algorithms…
External link:
http://arxiv.org/abs/2307.04456
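
To illustrate the defining property quoted above (every stationary point is a global minimum), here is a textbook-style example that is not from the paper: $f(x, y) = x^2 y^2$ is non-convex, yet its stationary points are exactly the points with $x = 0$ or $y = 0$, all of which attain the global minimum value $0$, so plain gradient descent succeeds.

import numpy as np

# f(x, y) = x^2 * y^2 is non-convex, but every stationary point (x = 0 or
# y = 0) is a global minimizer -- the property that defines invexity.
def f(p):
    x, y = p
    return x**2 * y**2

def grad_f(p):
    x, y = p
    return np.array([2.0 * x * y**2, 2.0 * x**2 * y])

p = np.array([1.5, -2.0])
for _ in range(2000):
    p = p - 0.01 * grad_f(p)     # plain (slow) gradient descent
print(round(f(p), 6))            # ~0.0: the global minimum is reached
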
Published in:
37th Conference on Neural Information Processing Systems (NeurIPS 2023)
Several recent works demonstrate that transformers can implement algorithms like gradient descent. By a careful construction of weights, these works show that multiple layers of transformers are expressive enough to simulate iterations of gradient descent…
External link:
http://arxiv.org/abs/2306.00297
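
A minimal sketch of the kind of weight construction referenced above, under the standard token layout for in-context linear regression: one linear self-attention layer whose output at the query token equals the prediction of a single gradient-descent step (started from $w = 0$) on the in-context least-squares loss. The specific matrices below are illustrative, not taken from the paper.

import numpy as np

def linear_self_attention(E, W_K, W_Q, W_V, P):
    """Linear self-attention (no softmax) over a token matrix E of shape
    (token_dim, num_tokens): returns P (W_V E) (W_K E)^T (W_Q E)."""
    K, Q, V = W_K @ E, W_Q @ E, W_V @ E
    return P @ V @ (K.T @ Q)

rng = np.random.default_rng(0)
d, N, eta = 3, 16, 1.0
X = rng.standard_normal((d, N))               # in-context inputs x_1..x_N
w_true = rng.standard_normal(d)
y = w_true @ X                                # in-context labels
x_q = rng.standard_normal(d)                  # query input

# tokens: columns (x_i, y_i) for the context plus a query token (x_q, 0)
E = np.column_stack([np.vstack([X, y[None, :]]), np.append(x_q, 0.0)])

# weight construction: keys/queries read the x-part, values read the y-part
W_KQ = np.zeros((d + 1, d + 1)); W_KQ[:d, :d] = np.eye(d)
W_V = np.zeros((d + 1, d + 1)); W_V[d, d] = 1.0
P = (eta / N) * np.eye(d + 1)

out = linear_self_attention(E, W_KQ, W_KQ, W_V, P)
attn_pred = out[d, -1]                        # y-slot of the query token

# one explicit GD step on 1/(2N) * sum_i (w^T x_i - y_i)^2, starting at w = 0
w_step = (eta / N) * X @ y
print(np.isclose(attn_pred, w_step @ x_q))    # True
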
Modern machine learning applications have witnessed the remarkable success of optimization algorithms that are designed to find flat minima. Motivated by this design choice, we undertake a formal study that (i) formulates the notion of flat minima, a…
External link:
http://arxiv.org/abs/2305.15659
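
The paper's precise formulation of flat minima is not reproduced here; as a rough companion, this sketch estimates one common flatness proxy, the trace of the loss Hessian, with Hutchinson probes and finite-difference Hessian-vector products. The toy quadratic loss is an assumption for illustration.

import numpy as np

def hessian_trace_estimate(grad, x, n_probes=100, eps=1e-4, seed=0):
    """Hutchinson estimator for tr(Hessian) at x, a common proxy for how
    sharp (non-flat) a minimum is; Hessian-vector products are approximated
    by central finite differences of the gradient."""
    rng = np.random.default_rng(seed)
    total = 0.0
    for _ in range(n_probes):
        v = rng.choice([-1.0, 1.0], size=x.shape)              # Rademacher probe
        hv = (grad(x + eps * v) - grad(x - eps * v)) / (2 * eps)
        total += v @ hv
    return total / n_probes

# toy quadratic loss with Hessian diag(10, 1): one sharp and one flat direction
grad = lambda x: np.array([10.0, 1.0]) * x
print(round(hessian_trace_estimate(grad, np.zeros(2)), 2))     # ~11.0
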