Showing 1 - 10 of 41 results for the search '"Gatmiry, Khashayar"'
The remarkable capability of Transformers to do reasoning and few-shot learning, without any fine-tuning, is widely conjectured to stem from their ability to implicitly simulate multi-step algorithms -- such as gradient descent -- with their weight…
External link:
http://arxiv.org/abs/2410.08292
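For intuition, here is a minimal NumPy sketch of the multi-step procedure this entry refers to: explicit gradient-descent steps on an in-context least-squares prompt, the kind of computation a looped Transformer is conjectured to emulate in its forward pass. Dimensions, step size, and iteration count are illustrative assumptions, not values from the paper.

```python
import numpy as np

# Toy in-context linear-regression prompt (X, y).
rng = np.random.default_rng(0)
n, d = 32, 8                      # in-context examples, feature dimension
X = rng.normal(size=(n, d))
w_true = rng.normal(size=d)
y = X @ w_true

# Multi-step gradient descent on the least-squares loss (1/2n)||Xw - y||^2;
# one loop iteration plays the role of one "simulated" GD step.
w = np.zeros(d)
eta = n / np.linalg.norm(X, 2) ** 2     # ~ 1 / smoothness of the averaged loss
for t in range(100):
    grad = X.T @ (X @ w - y) / n
    w = w - eta * grad

x_query = rng.normal(size=d)
print("prediction:", x_query @ w, "target:", x_query @ w_true)
```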
The use of guidance in diffusion models was originally motivated by the premise that the guidance-modified score is that of the data distribution tilted by a conditional likelihood raised to some power. In this work we clarify this misconception by…
External link:
http://arxiv.org/abs/2409.13074
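For reference, the guidance rule under discussion combines a conditional and an unconditional score with a scale $w$; the questioned premise is that the result is the score of the data distribution tilted by $p(c\mid x)^w$. A minimal sketch (conventions for the scale vary across papers, and the function name is hypothetical):

```python
import numpy as np

def guided_score(score_cond, score_uncond, w):
    """Blend conditional and unconditional scores with guidance scale w.

    w = 0 recovers the unconditional score, w = 1 the conditional one,
    and w > 1 extrapolates past the conditional score.
    """
    return score_uncond + w * (np.asarray(score_cond) - np.asarray(score_uncond))

s_c = np.array([1.0, -0.5])   # conditional score at some point x
s_u = np.array([0.2, 0.1])    # unconditional score at the same x
print(guided_score(s_c, s_u, w=2.0))   # 2*s_c - s_u = [1.8, -1.1]
```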
Author:
Gatmiry, Khashayar, Schneider, Jon
We study a variant of prediction with expert advice where the learner's action at round $t$ is only allowed to depend on losses on a specific subset of the rounds (where the structure of which rounds' losses are visible at time $t$ is provided by a…
External link:
http://arxiv.org/abs/2407.00571
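To make the setting concrete, here is a hedged sketch of a Hedge-style learner whose weights at round $t$ may only use losses from a prescribed visible subset of earlier rounds. The sliding-window visibility below is a stand-in example, not the structure studied in the paper, and the algorithm is a baseline rather than the paper's.

```python
import numpy as np

def hedge_with_visibility(losses, visible, eta=0.1):
    """Multiplicative weights over K experts under restricted feedback.

    losses:  (T, K) array of per-round expert losses.
    visible: visible[t] is the subset of rounds < t whose losses the
             learner may use when acting at round t.
    """
    T, K = losses.shape
    actions = []
    for t in range(T):
        # Cumulative loss restricted to the visible rounds only.
        cum = losses[list(visible[t])].sum(axis=0) if visible[t] else np.zeros(K)
        w = np.exp(-eta * cum)
        actions.append(w / w.sum())
    return np.array(actions)

rng = np.random.default_rng(1)
losses = rng.uniform(size=(50, 4))
visible = [set(range(max(0, t - 5), t)) for t in range(50)]  # sliding window
print(hedge_with_visibility(losses, visible)[-1])
```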
We give a new algorithm for learning mixtures of $k$ Gaussians (with identity covariance in $\mathbb{R}^n$) to TV error $\varepsilon$, with quasi-polynomial ($O(n^{\text{poly log}\left(\frac{n+k}{\varepsilon}\right)})$) time and sample complexity…
External link:
http://arxiv.org/abs/2404.18869
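The estimation problem can be framed with a standard EM baseline for identity-covariance mixtures; note that the paper's contribution is a different, quasi-polynomial algorithm with TV-error guarantees, so the sketch below only illustrates the setting, not the method.

```python
import numpy as np

def em_spherical(X, k, iters=50, seed=0):
    """EM for a mixture of k identity-covariance Gaussians in R^dim."""
    rng = np.random.default_rng(seed)
    n_samples, dim = X.shape
    mu = X[rng.choice(n_samples, k, replace=False)]   # init means from data
    pi = np.full(k, 1.0 / k)
    for _ in range(iters):
        # E-step: responsibilities under N(mu_j, I), computed stably in logs.
        sq = ((X[:, None, :] - mu[None]) ** 2).sum(-1)
        logp = np.log(pi) - 0.5 * sq
        logp -= logp.max(axis=1, keepdims=True)
        r = np.exp(logp)
        r /= r.sum(axis=1, keepdims=True)
        # M-step: update mixing weights and means.
        pi = r.mean(axis=0)
        mu = (r.T @ X) / r.sum(axis=0)[:, None]
    return pi, mu

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(loc=+3, size=(200, 5)),
               rng.normal(loc=-3, size=(200, 5))])
pi, mu = em_spherical(X, k=2)
print(pi, mu.round(2))
```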
Modern data-driven and distributed learning frameworks deal with diverse massive data generated by clients spread across heterogeneous environments. Indeed, data heterogeneity is a major bottleneck in scaling up many distributed learning paradigms…
External link:
http://arxiv.org/abs/2308.11518
Inspired by the remarkable success of large neural networks, there has been significant interest in understanding the generalization performance of over-parameterized models. Substantial efforts have been invested in characterizing how optimization…
External link:
http://arxiv.org/abs/2306.13853
Author:
Gatmiry, Khashayar, Li, Zhiyuan, Chuang, Ching-Yao, Reddi, Sashank, Ma, Tengyu, Jegelka, Stefanie
Recent works on over-parameterized neural networks have shown that the stochasticity in optimizers has the implicit regularization effect of minimizing the sharpness of the loss function (in particular, the trace of its Hessian) over the family of zero-…
External link:
http://arxiv.org/abs/2306.13239
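The sharpness quantity named here, the trace of the loss Hessian, is commonly estimated with Hutchinson's method, which needs only Hessian-vector products. A minimal sketch against a toy explicit Hessian (the estimator is standard; its use here is illustrative and not taken from the paper):

```python
import numpy as np

def hutchinson_trace(hvp, dim, n_probes=100, seed=0):
    """Estimate tr(H) = E[v^T H v] with Rademacher probes v.

    hvp: function computing the Hessian-vector product H @ v.
    """
    rng = np.random.default_rng(seed)
    est = 0.0
    for _ in range(n_probes):
        v = rng.choice([-1.0, 1.0], size=dim)   # Rademacher probe
        est += v @ hvp(v)
    return est / n_probes

# Toy check against an explicit Hessian with known trace.
H = np.diag([1.0, 2.0, 3.0])
print(hutchinson_trace(lambda v: H @ v, dim=3), "vs exact", np.trace(H))
```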
Author:
Gatmiry, Khashayar, Mhammedi, Zakaria
This paper presents new projection-free algorithms for Online Convex Optimization (OCO) over a convex domain $\mathcal{K} \subset \mathbb{R}^d$. Classical OCO algorithms (such as Online Gradient Descent) typically need to perform Euclidean projection…
External link:
http://arxiv.org/abs/2306.11121
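The projection-free idea can be sketched with an online Frank-Wolfe-style update: each round calls a cheap linear-optimization oracle over $\mathcal{K}$ instead of a Euclidean projection, and a convex combination keeps the iterate feasible. This simplified variant (per-round gradients, $\ell_1$-ball domain) is an assumption for illustration, not the paper's algorithm.

```python
import numpy as np

def lin_opt_l1_ball(g, radius=1.0):
    """argmin_{v in K} <g, v> for K the l1 ball: a signed vertex."""
    v = np.zeros_like(g)
    i = np.argmax(np.abs(g))
    v[i] = -radius * np.sign(g[i])
    return v

def online_frank_wolfe(grads, lin_opt, x0):
    x, xs = x0.copy(), []
    for t, g in enumerate(grads, start=1):
        xs.append(x.copy())
        v = lin_opt(g)                     # cheap linear step, no projection
        x = x + (2.0 / (t + 1)) * (v - x)  # convex combination stays in K
    return np.array(xs)

rng = np.random.default_rng(2)
traj = online_frank_wolfe([rng.normal(size=5) for _ in range(20)],
                          lin_opt_l1_ball, x0=np.zeros(5))
print(traj[-1])
```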
Author:
Gatmiry, Khashayar
In this thesis we study two separate problems: (1) What is the sample complexity of testing the class of Determinantal Point Processes? and (2) Introducing a new analysis for optimization and generalization of deep neural networks beyond their linear…
External link:
https://hdl.handle.net/1721.1/144927
Author:
Chen, Yuansi, Gatmiry, Khashayar
We analyze the mixing time of Metropolized Hamiltonian Monte Carlo (HMC) with the leapfrog integrator to sample from a distribution on $\mathbb{R}^d$ whose log-density is smooth, has Lipschitz Hessian in Frobenius norm and satisfies isoperimetry…
External link:
http://arxiv.org/abs/2304.04724
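A minimal sketch of the sampler being analyzed, assuming a standard Gaussian target for concreteness: the leapfrog integrator simulates Hamiltonian dynamics for $L$ steps of size $h$, and a Metropolis filter corrects the discretization error. Step size, trajectory length, and target are illustrative choices, not the paper's parameters.

```python
import numpy as np

def leapfrog(x, p, grad_logpi, h, L):
    p = p + 0.5 * h * grad_logpi(x)        # opening half step in momentum
    for _ in range(L - 1):
        x = x + h * p                      # full step in position
        p = p + h * grad_logpi(x)
    x = x + h * p
    p = p + 0.5 * h * grad_logpi(x)        # closing half step in momentum
    return x, p

def hmc_step(x, logpi, grad_logpi, h, L, rng):
    p0 = rng.normal(size=x.shape)          # resample momentum
    x1, p1 = leapfrog(x, p0, grad_logpi, h, L)
    # Metropolis filter: accept with prob. exp(H(x, p0) - H(x1, p1)).
    log_acc = (logpi(x1) - 0.5 * p1 @ p1) - (logpi(x) - 0.5 * p0 @ p0)
    return x1 if np.log(rng.uniform()) < log_acc else x

rng = np.random.default_rng(3)
logpi = lambda x: -0.5 * x @ x             # standard Gaussian target
grad_logpi = lambda x: -x
x = np.zeros(4)
for _ in range(1000):
    x = hmc_step(x, logpi, grad_logpi, h=0.2, L=10, rng=rng)
print(x)
```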