Showing 1 - 10 of 41 for the search: '"Misiakiewicz, Theodor"'
The goal of this paper is to investigate the complexity of gradient algorithms when learning sparse functions (juntas). We introduce a type of Statistical Queries ($\mathsf{SQ}$), which we call Differentiable Learning Queries ($\mathsf{DLQ}$), to model …
External link:
http://arxiv.org/abs/2407.05622
In this work we investigate the generalization performance of random feature ridge regression (RFRR). Our main contribution is a general deterministic equivalent for the test error of RFRR. Specifically, under a certain concentration property, we show …
External link:
http://arxiv.org/abs/2405.15699
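The random feature ridge regression model described in the abstract above can be sketched in a few lines of numpy. Everything concrete below (the ReLU nonlinearity, the linear target function, the dimensions, the ridge level) is a hypothetical choice for illustration, not the paper's setting:

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical setup: n samples in dimension d, p random features,
# target f_*(x) = x_1 observed with small label noise.
n, d, p, lam = 300, 10, 500, 1e-3
X = rng.standard_normal((n, d))
y = X[:, 0] + 0.1 * rng.standard_normal(n)

# Random feature map: phi(x) = ReLU(W x / sqrt(d)) with fixed random W.
W = rng.standard_normal((p, d))
def features(X):
    return np.maximum(X @ W.T / np.sqrt(d), 0.0)

# Ridge regression in feature space: a = (Phi^T Phi + lam*n*I)^{-1} Phi^T y.
Phi = features(X)
a = np.linalg.solve(Phi.T @ Phi + lam * n * np.eye(p), Phi.T @ y)

# Empirical test error against the clean target.
X_test = rng.standard_normal((2000, d))
pred = features(X_test) @ a
test_err = np.mean((pred - X_test[:, 0]) ** 2)
```

Sweeping the number of features `p` (or the ratio `p/n`) while tracking `test_err` is the kind of experiment the paper's deterministic equivalents aim to predict.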
Author:
Misiakiewicz, Theodor, Saeed, Basil
We consider learning an unknown target function $f_*$ using kernel ridge regression (KRR) given i.i.d. data $(u_i,y_i)$, $i\leq n$, where $u_i \in U$ is a covariate vector and $y_i = f_* (u_i) +\varepsilon_i \in \mathbb{R}$. A recent string of work …
External link:
http://arxiv.org/abs/2403.08938
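The KRR setting in the abstract above admits a short closed-form sketch: fit $\alpha = (K + \lambda n I)^{-1} y$ on the training kernel matrix, then predict with cross-kernel evaluations. The target function, Gaussian kernel, and bandwidth below are hypothetical choices, not the paper's assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical instance of the abstract's setup:
# covariates u_i ~ N(0, I_d), labels y_i = f_*(u_i) + eps_i with f_* = sin of
# the first coordinate.
n, d, noise, lam = 200, 5, 0.1, 1e-3
U = rng.standard_normal((n, d))
y = np.sin(U[:, 0]) + noise * rng.standard_normal(n)

def rbf_kernel(A, B, gamma=0.1):
    """Gaussian (RBF) kernel matrix between the rows of A and B."""
    sq = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * sq)

# KRR closed form: alpha = (K + lam*n*I)^{-1} y.
K = rbf_kernel(U, U)
alpha = np.linalg.solve(K + lam * n * np.eye(n), y)

# Predict on fresh covariates and measure error against the clean target.
U_test = rng.standard_normal((1000, d))
pred = rbf_kernel(U_test, U) @ alpha
test_err = np.mean((pred - np.sin(U_test[:, 0])) ** 2)
```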
Recent advances in machine learning have been achieved by using overparametrized models trained until near interpolation of the training data. It was shown, e.g., through the double descent phenomenon, that the number of parameters is a poor proxy for …
External link:
http://arxiv.org/abs/2403.08160
In these six lectures, we examine what can be learnt about the behavior of multi-layer neural networks from the analysis of linear models. We first recall the correspondence between neural networks and linear models via the so-called lazy regime. …
External link:
http://arxiv.org/abs/2308.13431
We investigate the time complexity of SGD learning on fully-connected neural networks with isotropic data. We put forward a complexity measure -- the leap -- which measures how "hierarchical" target functions are. For $d$-dimensional uniform Boolean …
External link:
http://arxiv.org/abs/2302.11055
As modern machine learning models continue to advance the computational frontier, it has become increasingly important to develop precise estimates for expected performance improvements under different model and data scaling regimes. …
External link:
http://arxiv.org/abs/2205.14846
Author:
Misiakiewicz, Theodor
We study the spectrum of inner-product kernel matrices, i.e., $n \times n$ matrices with entries $h (\langle \textbf{x}_i ,\textbf{x}_j \rangle/d)$ where the $( \textbf{x}_i)_{i \leq n}$ are i.i.d.~random covariates in $\mathbb{R}^d$. In the linear …
External link:
http://arxiv.org/abs/2204.10425
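The matrix studied in the abstract above is easy to construct and inspect numerically: draw i.i.d. Gaussian covariates, form $K_{ij} = h(\langle \textbf{x}_i, \textbf{x}_j \rangle / d)$, and compute its eigenvalues. The particular nonlinearity $h$ and the proportional sizes $n = d$ below are hypothetical choices for illustration:

```python
import numpy as np

rng = np.random.default_rng(2)

# Proportional regime n ~ d with i.i.d. standard Gaussian covariates.
n, d = 400, 400
X = rng.standard_normal((n, d))

def h(t):
    # Hypothetical nonlinearity with both a linear and a quadratic component.
    return t + 0.5 * t ** 2

# Inner-product kernel matrix K_ij = h(<x_i, x_j>/d), applied entrywise.
K = h(X @ X.T / d)

# Eigenvalues (ascending); the low-degree part of h produces the largest ones.
eigs = np.linalg.eigvalsh(K)
```

Plotting a histogram of `eigs` shows the bulk-plus-outliers structure that spectral analyses of such matrices aim to characterize.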
It is currently known how to characterize functions that neural networks can learn with SGD for two extremal parameterizations: neural networks in the linear regime, and neural networks with no structural constraints. However, for the main parametrization …
External link:
http://arxiv.org/abs/2202.08658
Author:
Misiakiewicz, Theodor, Mei, Song
Recent empirical work has shown that hierarchical convolutional kernels inspired by convolutional neural networks (CNNs) significantly improve the performance of kernel methods in image classification tasks. A widely accepted explanation for their success …
External link:
http://arxiv.org/abs/2111.08308