Showing 1 - 10 of 41 results for the search '"Gatmiry, Khashayar"'
The remarkable capability of Transformers to do reasoning and few-shot learning, without any fine-tuning, is widely conjectured to stem from their ability to implicitly simulate multi-step algorithms -- such as gradient descent -- with their weight…
External link:
http://arxiv.org/abs/2410.08292
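For intuition, here is a minimal NumPy sketch of the multi-step procedure this entry refers to: explicit gradient-descent steps on an in-context least-squares prompt, the kind of computation a looped Transformer is conjectured to emulate in its forward pass. Dimensions, step size, and iteration count are illustrative assumptions, not values from the paper.

```python
import numpy as np

# Toy in-context linear-regression prompt (X, y).
rng = np.random.default_rng(0)
n, d = 32, 8                      # in-context examples, feature dimension
X = rng.normal(size=(n, d))
w_true = rng.normal(size=d)
y = X @ w_true

# Multi-step gradient descent on the least-squares loss (1/2n)||Xw - y||^2;
# one loop iteration plays the role of one "simulated" GD step.
w = np.zeros(d)
eta = n / np.linalg.norm(X, 2) ** 2     # ~ 1 / smoothness of the averaged loss
for t in range(100):
    grad = X.T @ (X @ w - y) / n
    w = w - eta * grad

x_query = rng.normal(size=d)
print("prediction:", x_query @ w, "target:", x_query @ w_true)
```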
The use of guidance in diffusion models was originally motivated by the premise that the guidance-modified score is that of the data distribution tilted by a conditional likelihood raised to some power. In this work we clarify this misconception by…
External link:
http://arxiv.org/abs/2409.13074
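For reference, the guidance rule under discussion combines a conditional and an unconditional score with a scale $w$; the questioned premise is that the result is the score of the data distribution tilted by $p(c\mid x)^w$. A minimal sketch (conventions for the scale vary across papers, and the function name is hypothetical):

```python
import numpy as np

def guided_score(score_cond, score_uncond, w):
    """Blend conditional and unconditional scores with guidance scale w.

    w = 0 recovers the unconditional score, w = 1 the conditional one,
    and w > 1 extrapolates past the conditional score.
    """
    return score_uncond + w * (np.asarray(score_cond) - np.asarray(score_uncond))

s_c = np.array([1.0, -0.5])   # conditional score at some point x
s_u = np.array([0.2, 0.1])    # unconditional score at the same x
print(guided_score(s_c, s_u, w=2.0))   # 2*s_c - s_u = [1.8, -1.1]
```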
Author:
Gatmiry, Khashayar, Schneider, Jon
We study a variant of prediction with expert advice where the learner's action at round $t$ is only allowed to depend on losses on a specific subset of the rounds (where the structure of which rounds' losses are visible at time $t$ is provided by a…
External link:
http://arxiv.org/abs/2407.00571
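To make the setting concrete, here is a hedged sketch of a Hedge-style learner whose weights at round $t$ may only use losses from a prescribed visible subset of earlier rounds. The sliding-window visibility below is a stand-in example, not the structure studied in the paper, and the algorithm is a baseline rather than the paper's.

```python
import numpy as np

def hedge_with_visibility(losses, visible, eta=0.1):
    """Multiplicative weights over K experts under restricted feedback.

    losses:  (T, K) array of per-round expert losses.
    visible: visible[t] is the subset of rounds < t whose losses the
             learner may use when acting at round t.
    """
    T, K = losses.shape
    actions = []
    for t in range(T):
        # Cumulative loss restricted to the visible rounds only.
        cum = losses[list(visible[t])].sum(axis=0) if visible[t] else np.zeros(K)
        w = np.exp(-eta * cum)
        actions.append(w / w.sum())
    return np.array(actions)

rng = np.random.default_rng(1)
losses = rng.uniform(size=(50, 4))
visible = [set(range(max(0, t - 5), t)) for t in range(50)]  # sliding window
print(hedge_with_visibility(losses, visible)[-1])
```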
We give a new algorithm for learning mixtures of $k$ Gaussians (with identity covariance in $\mathbb{R}^n$) to TV error $\varepsilon$, with quasi-polynomial ($O(n^{\text{poly log}\left(\frac{n+k}{\varepsilon}\right)})$) time and sample complexity…
External link:
http://arxiv.org/abs/2404.18869
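The estimation problem can be framed with a standard EM baseline for identity-covariance mixtures; note that the paper's contribution is a different, quasi-polynomial algorithm with TV-error guarantees, so the sketch below only illustrates the setting, not the method.

```python
import numpy as np

def em_spherical(X, k, iters=50, seed=0):
    """EM for a mixture of k identity-covariance Gaussians in R^dim."""
    rng = np.random.default_rng(seed)
    n_samples, dim = X.shape
    mu = X[rng.choice(n_samples, k, replace=False)]   # init means from data
    pi = np.full(k, 1.0 / k)
    for _ in range(iters):
        # E-step: responsibilities under N(mu_j, I), computed stably in logs.
        sq = ((X[:, None, :] - mu[None]) ** 2).sum(-1)
        logp = np.log(pi) - 0.5 * sq
        logp -= logp.max(axis=1, keepdims=True)
        r = np.exp(logp)
        r /= r.sum(axis=1, keepdims=True)
        # M-step: update mixing weights and means.
        pi = r.mean(axis=0)
        mu = (r.T @ X) / r.sum(axis=0)[:, None]
    return pi, mu

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(loc=+3, size=(200, 5)),
               rng.normal(loc=-3, size=(200, 5))])
pi, mu = em_spherical(X, k=2)
print(pi, mu.round(2))
```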
Modern data-driven and distributed learning frameworks deal with diverse massive data generated by clients spread across heterogeneous environments. Indeed, data heterogeneity is a major bottleneck in scaling up many distributed learning paradigms…
External link:
http://arxiv.org/abs/2308.11518
Inspired by the remarkable success of large neural networks, there has been significant interest in understanding the generalization performance of over-parameterized models. Substantial efforts have been invested in characterizing how optimization…
External link:
http://arxiv.org/abs/2306.13853
Author:
Gatmiry, Khashayar, Li, Zhiyuan, Chuang, Ching-Yao, Reddi, Sashank, Ma, Tengyu, Jegelka, Stefanie
Recent works on over-parameterized neural networks have shown that the stochasticity in optimizers has the implicit regularization effect of minimizing the sharpness of the loss function (in particular, the trace of its Hessian) over the family of zero-…
External link:
http://arxiv.org/abs/2306.13239
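The sharpness quantity named here, the trace of the loss Hessian, is commonly estimated with Hutchinson's method, which needs only Hessian-vector products. A minimal sketch against a toy explicit Hessian (the estimator is standard; its use here is illustrative and not taken from the paper):

```python
import numpy as np

def hutchinson_trace(hvp, dim, n_probes=100, seed=0):
    """Estimate tr(H) = E[v^T H v] with Rademacher probes v.

    hvp: function computing the Hessian-vector product H @ v.
    """
    rng = np.random.default_rng(seed)
    est = 0.0
    for _ in range(n_probes):
        v = rng.choice([-1.0, 1.0], size=dim)   # Rademacher probe
        est += v @ hvp(v)
    return est / n_probes

# Toy check against an explicit Hessian with known trace.
H = np.diag([1.0, 2.0, 3.0])
print(hutchinson_trace(lambda v: H @ v, dim=3), "vs exact", np.trace(H))
```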
Author:
Gatmiry, Khashayar, Mhammedi, Zakaria
This paper presents new projection-free algorithms for Online Convex Optimization (OCO) over a convex domain $\mathcal{K} \subset \mathbb{R}^d$. Classical OCO algorithms (such as Online Gradient Descent) typically need to perform Euclidean projection…
External link:
http://arxiv.org/abs/2306.11121
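The projection-free idea can be sketched with an online Frank-Wolfe-style update: each round calls a cheap linear-optimization oracle over $\mathcal{K}$ instead of a Euclidean projection, and a convex combination keeps the iterate feasible. This simplified variant (per-round gradients, $\ell_1$-ball domain) is an assumption for illustration, not the paper's algorithm.

```python
import numpy as np

def lin_opt_l1_ball(g, radius=1.0):
    """argmin_{v in K} <g, v> for K the l1 ball: a signed vertex."""
    v = np.zeros_like(g)
    i = np.argmax(np.abs(g))
    v[i] = -radius * np.sign(g[i])
    return v

def online_frank_wolfe(grads, lin_opt, x0):
    x, xs = x0.copy(), []
    for t, g in enumerate(grads, start=1):
        xs.append(x.copy())
        v = lin_opt(g)                     # cheap linear step, no projection
        x = x + (2.0 / (t + 1)) * (v - x)  # convex combination stays in K
    return np.array(xs)

rng = np.random.default_rng(2)
traj = online_frank_wolfe([rng.normal(size=5) for _ in range(20)],
                          lin_opt_l1_ball, x0=np.zeros(5))
print(traj[-1])
```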
Author:
Gatmiry, Khashayar
In this thesis we study two separate problems: (1) What is the sample complexity of testing the class of Determinantal Point Processes? and (2) Introducing a new analysis for optimization and generalization of deep neural networks beyond their linear…
External link:
https://hdl.handle.net/1721.1/144927
Author:
Chen, Yuansi, Gatmiry, Khashayar
We analyze the mixing time of Metropolized Hamiltonian Monte Carlo (HMC) with the leapfrog integrator to sample from a distribution on $\mathbb{R}^d$ whose log-density is smooth, has Lipschitz Hessian in Frobenius norm and satisfies isoperimetry…
External link:
http://arxiv.org/abs/2304.04724
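A minimal sketch of the sampler being analyzed, assuming a standard Gaussian target for concreteness: the leapfrog integrator simulates Hamiltonian dynamics for $L$ steps of size $h$, and a Metropolis filter corrects the discretization error. Step size, trajectory length, and target are illustrative choices, not the paper's parameters.

```python
import numpy as np

def leapfrog(x, p, grad_logpi, h, L):
    p = p + 0.5 * h * grad_logpi(x)        # opening half step in momentum
    for _ in range(L - 1):
        x = x + h * p                      # full step in position
        p = p + h * grad_logpi(x)
    x = x + h * p
    p = p + 0.5 * h * grad_logpi(x)        # closing half step in momentum
    return x, p

def hmc_step(x, logpi, grad_logpi, h, L, rng):
    p0 = rng.normal(size=x.shape)          # resample momentum
    x1, p1 = leapfrog(x, p0, grad_logpi, h, L)
    # Metropolis filter: accept with prob. exp(H(x, p0) - H(x1, p1)).
    log_acc = (logpi(x1) - 0.5 * p1 @ p1) - (logpi(x) - 0.5 * p0 @ p0)
    return x1 if np.log(rng.uniform()) < log_acc else x

rng = np.random.default_rng(3)
logpi = lambda x: -0.5 * x @ x             # standard Gaussian target
grad_logpi = lambda x: -x
x = np.zeros(4)
for _ in range(1000):
    x = hmc_step(x, logpi, grad_logpi, h=0.2, L=10, rng=rng)
print(x)
```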