Výsledky vyhledávání - "Dandi, Yatin"

Report

A Random Matrix Theory Perspective on the Spectrum of Learned Features and Asymptotic Generalization Capabilities

Autor: Dandi, Yatin, Pesce, Luca, Cui, Hugo, Krzakala, Florent, Lu, Yue M., Loureiro, Bruno

A key property of neural networks is their capacity of adapting to data during training. Yet, our current mathematical understanding of feature learning and its relationship to generalization remain limited. In this work, we provide a random matrix a

Externí odkaz: http://arxiv.org/abs/2410.18938

Zobrazit plný text záznamu

Report

Online Learning and Information Exponents: On The Importance of Batch size, and Time/Complexity Tradeoffs

Autor: Arnaboldi, Luca, Dandi, Yatin, Krzakala, Florent, Loureiro, Bruno, Pesce, Luca, Stephan, Ludovic

Publikováno v: Proceedings of the 41st International Conference on Machine Learning, PMLR 235:1730-1762, 2024

We study the impact of the batch size $n_b$ on the iteration time $T$ of training two-layer neural networks with one-pass stochastic gradient descent (SGD) on multi-index target functions of isotropic covariates. We characterize the optimal batch siz

Externí odkaz: http://arxiv.org/abs/2406.02157

Zobrazit plný text záznamu

Report

Fundamental computational limits of weak learnability in high-dimensional multi-index models

Autor: Troiani, Emanuele, Dandi, Yatin, Defilippis, Leonardo, Zdeborová, Lenka, Loureiro, Bruno, Krzakala, Florent

Multi-index models - functions which only depend on the covariates through a non-linear transformation of their projection on a subspace - are a useful benchmark for investigating feature learning with neural nets. This paper examines the theoretical

Externí odkaz: http://arxiv.org/abs/2405.15480

Zobrazit plný text záznamu

Report

Repetita Iuvant: Data Repetition Allows SGD to Learn High-Dimensional Multi-Index Functions

Autor: Arnaboldi, Luca, Dandi, Yatin, Krzakala, Florent, Pesce, Luca, Stephan, Ludovic

Neural networks can identify low-dimensional relevant structures within high-dimensional noisy data, yet our mathematical understanding of how they do so remains scarce. Here, we investigate the training dynamics of two-layer shallow neural networks

Externí odkaz: http://arxiv.org/abs/2405.15459

Zobrazit plný text záznamu

Report

Asymptotics of feature learning in two-layer networks after one gradient-step

Autor: Cui, Hugo, Pesce, Luca, Dandi, Yatin, Krzakala, Florent, Lu, Yue M., Zdeborová, Lenka, Loureiro, Bruno

Publikováno v: Proceedings of the 41st International Conference on Machine Learning, PMLR 235:9662-9695, 2024

In this manuscript, we investigate the problem of how two-layer neural networks learn features from data, and improve over the kernel regime, after being trained with a single gradient descent step. Leveraging the insight from (Ba et al., 2022), we m

Externí odkaz: http://arxiv.org/abs/2402.04980

Zobrazit plný text záznamu

Report

The Benefits of Reusing Batches for Gradient Descent in Two-Layer Networks: Breaking the Curse of Information and Leap Exponents

Autor: Dandi, Yatin, Troiani, Emanuele, Arnaboldi, Luca, Pesce, Luca, Zdeborová, Lenka, Krzakala, Florent

Publikováno v: Proceedings of the 41st International Conference on Machine Learning, PMLR 235:9991-10016, 2024

We investigate the training dynamics of two-layer neural networks when learning multi-index target functions. We focus on multi-pass gradient descent (GD) that reuses the batches multiple times and show that it significantly changes the conclusion ab

Externí odkaz: http://arxiv.org/abs/2402.03220

Zobrazit plný text záznamu

Report

A Gentle Introduction to Gradient-Based Optimization and Variational Inequalities for Machine Learning

Autor: Wadia, Neha S., Dandi, Yatin, Jordan, Michael I.

The rapid progress in machine learning in recent years has been based on a highly productive connection to gradient-based optimization. Further progress hinges in part on a shift in focus from pattern recognition to decision-making and multi-agent pr

Externí odkaz: http://arxiv.org/abs/2309.04877

Zobrazit plný text záznamu

Report

Sampling with flows, diffusion and autoregressive neural networks: A spin-glass perspective

Autor: Ghio, Davide, Dandi, Yatin, Krzakala, Florent, Zdeborová, Lenka

Publikováno v: Proceedings of the National Academy of Sciences 121.27 (2024): e2311810121

Recent years witnessed the development of powerful generative models based on flows, diffusion or autoregressive neural networks, achieving remarkable success in generating data from examples with applications in a broad range of areas. A theoretical

Externí odkaz: http://arxiv.org/abs/2308.14085

Zobrazit plný text záznamu

Report

How Two-Layer Neural Networks Learn, One (Giant) Step at a Time

Autor: Dandi, Yatin, Krzakala, Florent, Loureiro, Bruno, Pesce, Luca, Stephan, Ludovic

We investigate theoretically how the features of a two-layer neural network adapt to the structure of the target function through a few large batch gradient descent steps, leading to improvement in the approximation capacity with respect to the initi

Externí odkaz: http://arxiv.org/abs/2305.18270

Zobrazit plný text záznamu

Report

Maximally-stable Local Optima in Random Graphs and Spin Glasses: Phase Transitions and Universality

Autor: Dandi, Yatin, Gamarnik, David, Zdeborová, Lenka

We provide a unified analysis of stable local optima of Ising spins with Hamiltonians having pair-wise interactions and partitions in random weighted graphs where a large number of vertices possess sufficient single spin-flip stability. For graphs, w

Externí odkaz: http://arxiv.org/abs/2305.03591

Zobrazit plný text záznamu

Vyhledávací nástroje:

Upřesnit hledání