Showing 1 - 10 of 212 for search: '"Pehlevan, Cengiz"'
We consider neural networks (NNs) where the final layer is down-scaled by a fixed hyperparameter $\gamma$. Recent work has identified $\gamma$ as controlling the strength of feature learning. As $\gamma$ increases, network evolution changes from "lazy" …
External link: http://arxiv.org/abs/2410.04642
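A minimal sketch of the kind of output scaling the abstract describes, assuming a two-layer network whose readout is divided by $\gamma$ (the paper's exact parameterization, e.g. any width-dependent factors, may differ):

import numpy as np

def two_layer_net(x, W1, w2, gamma):
    h = np.tanh(W1 @ x)      # hidden-layer features
    # Readout down-scaled by the fixed hyperparameter gamma: with larger gamma
    # the raw output is smaller, so the hidden weights must move more during
    # training, the "rich" (feature-learning) end of the lazy-to-rich axis.
    return (w2 @ h) / gamma

rng = np.random.default_rng(0)
W1 = rng.standard_normal((128, 10)) / np.sqrt(10)
w2 = rng.standard_normal(128) / np.sqrt(128)
print(two_layer_net(rng.standard_normal(10), W1, w2, gamma=4.0))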
Convolutional Neural Networks (CNNs) excel in many visual tasks, but they tend to be sensitive to slight input perturbations that are imperceptible to the human eye, often resulting in task failures. Recent studies indicate that training CNNs with …
External link: http://arxiv.org/abs/2410.03952
We develop a solvable model of neural scaling laws beyond the kernel limit. Theoretical analysis of this model shows how performance scales with model size, training time, and the total amount of available data. We identify three scaling regimes corresponding to …
External link: http://arxiv.org/abs/2409.17858
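For orientation, analyses in this literature typically yield an additive power-law picture in which each regime corresponds to one resource bottlenecking the loss. The form below is the generic ansatz, not the paper's derived result, and the exponents are placeholders:

$L(N, T, D) \approx L_\infty + c_N N^{-\alpha_N} + c_T T^{-\alpha_T} + c_D D^{-\alpha_D}$

Here $N$ is model size, $T$ training time, and $D$ the amount of data; whichever term dominates identifies the model-, time-, or data-limited regime.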
Recent years have seen substantial advances in our understanding of high-dimensional ridge regression, but existing theories assume that training examples are independent. By leveraging recent techniques from random matrix theory and free probability, …
External link: http://arxiv.org/abs/2408.04607
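For reference, the estimator at the center of such theories is the standard ridge regressor; what this line of work relaxes is the independence assumption on the rows of $X$, not the estimator itself. A minimal sketch:

import numpy as np

def ridge_estimator(X, y, lam):
    # beta_hat = (X^T X + lam * I_d)^{-1} X^T y
    d = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ y)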
We investigate the behavior of the Nadaraya-Watson kernel smoothing estimator in high dimensions using its relationship to the random energy model and to dense associative memories.
Comment: 9 pages, 3 figures
External link: http://arxiv.org/abs/2408.03769
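The estimator named in the abstract has a standard closed form. A minimal NumPy sketch with a Gaussian kernel of bandwidth h (the kernel choice and the high-dimensional scaling of h studied in the paper are not specified here):

import numpy as np

def nadaraya_watson(x_query, X_train, y_train, h):
    # f_hat(x) = sum_i K((x - x_i)/h) y_i / sum_i K((x - x_i)/h)
    d2 = np.sum((X_train - x_query) ** 2, axis=1)   # squared distances ||x - x_i||^2
    w = np.exp(-d2 / (2 * h ** 2))                  # Gaussian kernel weights
    return np.dot(w, y_train) / np.sum(w)           # weighted average of training targets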
Hyperbolic spaces have increasingly been recognized for their outstanding performance in handling data with inherent hierarchical structures compared to their Euclidean counterparts. However, learning in hyperbolic spaces poses significant challenges …
External link: http://arxiv.org/abs/2405.17198
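For context, the basic object such methods optimize over is a hyperbolic distance; the Poincaré-ball metric below is the standard example (the paper's specific model and the learning difficulties it addresses are not reproduced here):

import numpy as np

def poincare_distance(u, v, eps=1e-7):
    # d(u, v) = arccosh(1 + 2 ||u - v||^2 / ((1 - ||u||^2)(1 - ||v||^2)))
    # for points u, v strictly inside the unit ball.
    sq = np.sum((u - v) ** 2)
    denom = (1.0 - np.sum(u ** 2)) * (1.0 - np.sum(v ** 2))
    return np.arccosh(1.0 + 2.0 * sq / max(denom, eps))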
The vulnerability of neural network classifiers to adversarial attacks is a major obstacle to their deployment in safety-critical applications. Regularization of network parameters during training can be used to improve adversarial robustness and generalization …
External link: http://arxiv.org/abs/2405.17181
In this work, we analyze various scaling limits of the training dynamics of transformer models in the feature learning regime. We identify the set of parameterizations that admit well-defined infinite width and depth limits, allowing the attention layers …
External link: http://arxiv.org/abs/2405.15712
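One concrete knob in such parameterizations is the attention-logit scale. The sketch below contrasts the standard $1/\sqrt{d}$ choice with the $1/d$ choice known from the infinite-width literature to give better-behaved limits; treating this as the paper's full parameterization is an assumption:

import numpy as np

def attention(Q, K, V, scale):
    # Softmax attention with a configurable logit scale: the standard choice is
    # scale = 1/sqrt(d_head); width-limit-friendly parameterizations use 1/d_head.
    logits = (Q @ K.T) * scale
    logits -= logits.max(axis=-1, keepdims=True)   # numerical stability
    A = np.exp(logits)
    A /= A.sum(axis=-1, keepdims=True)             # rows sum to one
    return A @ V

n, d = 8, 64
rng = np.random.default_rng(0)
Q, K, V = (rng.standard_normal((n, d)) for _ in range(3))
out_std = attention(Q, K, V, scale=1.0 / np.sqrt(d))
out_lim = attention(Q, K, V, scale=1.0 / d)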
Author: Tong, William L., Pehlevan, Cengiz
In-context learning (ICL), the remarkable ability to solve a task from only input exemplars, is often assumed to be a unique hallmark of Transformer models. By examining commonly employed synthetic ICL tasks, we demonstrate that multi-layer perceptrons …
External link: http://arxiv.org/abs/2405.15618
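A sketch of the kind of synthetic ICL task this line of work commonly uses; in-context linear regression, where each sequence carries its own latent task vector, is the standard example (whether it matches the paper's exact task suite is an assumption):

import numpy as np

def icl_regression_batch(batch, n_ctx, d, rng):
    # Each sequence gets its own weight vector w; the learner must infer w
    # from the (x, y) context pairs and predict y for a held-out query x.
    w = rng.standard_normal((batch, d))
    X = rng.standard_normal((batch, n_ctx + 1, d))
    y = np.einsum('bnd,bd->bn', X, w)                         # y_i = <w, x_i>
    context = np.concatenate([X[:, :n_ctx], y[:, :n_ctx, None]], axis=-1)
    return context, X[:, n_ctx], y[:, n_ctx]                  # pairs, query, target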
Transformers have a remarkable ability to learn and execute tasks based on examples provided within the input itself, without explicit prior training. It has been argued that this capability, known as in-context learning (ICL), is a cornerstone of Transformers' …
External link: http://arxiv.org/abs/2405.11751