Showing 1 - 10 of 196 for search: '"Saxe, Andrew"'
A number of machine learning models have been proposed with the goal of achieving systematic generalization: the ability to reason about new situations by combining aspects of previous experiences. These models leverage compositional architectures…
External link:
http://arxiv.org/abs/2409.14981
Author:
Dominé, Clémentine C. J., Anguita, Nicolas, Proca, Alexandra M., Braun, Lukas, Kunin, Daniel, Mediano, Pedro A. M., Saxe, Andrew M.
Biological and artificial neural networks develop internal representations that enable them to perform complex tasks. In artificial networks, the effectiveness of these models relies on their ability to build task-specific representations, a process…
External link:
http://arxiv.org/abs/2409.14623
Deep neural networks learn increasingly complex functions over the course of training. Here, we show both empirically and theoretically that learning of the target function is preceded by an early phase in which networks learn the optimal constant solution…
External link:
http://arxiv.org/abs/2406.17467
We investigate the expressivity and learning dynamics of bias-free ReLU networks. We firstly show that two-layer bias-free ReLU networks have limited expressivity: the only odd function two-layer bias-free ReLU networks can express is a linear one.
External link:
http://arxiv.org/abs/2406.12615
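The claim in this snippet can be checked numerically. For a bias-free two-layer ReLU network f(x) = W2 · ReLU(W1 x), the odd part (f(x) − f(−x))/2 equals ½ · W2 W1 x, since ReLU(z) − ReLU(−z) = z for any z; so any odd function such a network expresses is linear. A minimal sketch (weight shapes chosen arbitrarily for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)
W1 = rng.standard_normal((8, 3))  # hidden x input
W2 = rng.standard_normal((2, 8))  # output x hidden

def f(x):
    # bias-free two-layer ReLU network
    return W2 @ np.maximum(W1 @ x, 0.0)

x = rng.standard_normal(3)
odd_part = 0.5 * (f(x) - f(-x))        # odd component of f at x
linear = 0.5 * (W2 @ W1 @ x)           # the linear map it must equal
assert np.allclose(odd_part, linear)
```

The identity ReLU(z) − ReLU(−z) = z makes the ReLU drop out of the odd component, which is why bias-free networks cannot express any nonlinear odd function.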
Get rich quick: exact solutions reveal how unbalanced initializations promote rapid feature learning
Author:
Kunin, Daniel, Raventós, Allan, Dominé, Clémentine, Chen, Feng, Klindt, David, Saxe, Andrew, Ganguli, Surya
While the impressive performance of modern neural networks is often attributed to their capacity to efficiently extract task-relevant features from data, the mechanisms underlying this rich feature learning regime remain elusive, with much of our…
External link:
http://arxiv.org/abs/2406.06158
A wide range of empirical and theoretical works have shown that overparameterisation can amplify the performance of neural networks. According to the lottery ticket hypothesis, overparameterised networks have an increased chance of containing a sub-network…
External link:
http://arxiv.org/abs/2406.01589
In-context learning is a powerful emergent ability in transformer models. Prior work in mechanistic interpretability has identified a circuit element that may be critical for in-context learning -- the induction head (IH), which performs a match-and-copy…
External link:
http://arxiv.org/abs/2404.07129
Diverse studies in systems neuroscience begin with extended periods of curriculum training known as 'shaping' procedures. These involve progressively studying component parts of more complex tasks, and can make the difference between learning a task…
External link:
http://arxiv.org/abs/2402.18361
Author:
van Rossem, Loek, Saxe, Andrew M.
Deep neural networks come in many sizes and architectures. The choice of architecture, in conjunction with the dataset and learning algorithm, is commonly understood to affect the learned neural representations. Yet, recent results have shown that…
External link:
http://arxiv.org/abs/2402.09142
Using multiple input streams simultaneously to train multimodal neural networks is intuitively advantageous but practically challenging. A key challenge is unimodal bias, where a network overly relies on one modality and ignores others during joint training…
External link:
http://arxiv.org/abs/2312.00935