Showing 1 - 10 of 196 for search: '"Saxe, Andrew"'
A number of machine learning models have been proposed with the goal of achieving systematic generalization: the ability to reason about new situations by combining aspects of previous experiences. These models leverage compositional architectures…
External link:
http://arxiv.org/abs/2409.14981
Author:
Dominé, Clémentine C. J., Anguita, Nicolas, Proca, Alexandra M., Braun, Lukas, Kunin, Daniel, Mediano, Pedro A. M., Saxe, Andrew M.
Biological and artificial neural networks develop internal representations that enable them to perform complex tasks. In artificial networks, the effectiveness of these models relies on their ability to build task-specific representations, a process…
External link:
http://arxiv.org/abs/2409.14623
Deep neural networks learn increasingly complex functions over the course of training. Here, we show both empirically and theoretically that learning of the target function is preceded by an early phase in which networks learn the optimal constant solution…
External link:
http://arxiv.org/abs/2406.17467
We investigate the expressivity and learning dynamics of bias-free ReLU networks. We firstly show that two-layer bias-free ReLU networks have limited expressivity: the only odd function two-layer bias-free ReLU networks can express is a linear one.
External link:
http://arxiv.org/abs/2406.12615
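The claim in this snippet can be checked numerically. For a bias-free two-layer ReLU network f(x) = W2 · ReLU(W1 x), the odd part (f(x) − f(−x))/2 equals ½ · W2 W1 x, since ReLU(z) − ReLU(−z) = z for any z; so any odd function such a network expresses is linear. A minimal sketch (weight shapes chosen arbitrarily for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)
W1 = rng.standard_normal((8, 3))  # hidden x input
W2 = rng.standard_normal((2, 8))  # output x hidden

def f(x):
    # bias-free two-layer ReLU network
    return W2 @ np.maximum(W1 @ x, 0.0)

x = rng.standard_normal(3)
odd_part = 0.5 * (f(x) - f(-x))        # odd component of f at x
linear = 0.5 * (W2 @ W1 @ x)           # the linear map it must equal
assert np.allclose(odd_part, linear)
```

The identity ReLU(z) − ReLU(−z) = z makes the ReLU drop out of the odd component, which is why bias-free networks cannot express any nonlinear odd function.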
Get rich quick: exact solutions reveal how unbalanced initializations promote rapid feature learning
Author:
Kunin, Daniel, Raventós, Allan, Dominé, Clémentine, Chen, Feng, Klindt, David, Saxe, Andrew, Ganguli, Surya
While the impressive performance of modern neural networks is often attributed to their capacity to efficiently extract task-relevant features from data, the mechanisms underlying this rich feature learning regime remain elusive, with much of our…
External link:
http://arxiv.org/abs/2406.06158
A wide range of empirical and theoretical works have shown that overparameterisation can amplify the performance of neural networks. According to the lottery ticket hypothesis, overparameterised networks have an increased chance of containing a sub-network…
External link:
http://arxiv.org/abs/2406.01589
In-context learning is a powerful emergent ability in transformer models. Prior work in mechanistic interpretability has identified a circuit element that may be critical for in-context learning -- the induction head (IH), which performs a match-and-copy…
External link:
http://arxiv.org/abs/2404.07129
Diverse studies in systems neuroscience begin with extended periods of curriculum training known as 'shaping' procedures. These involve progressively studying component parts of more complex tasks, and can make the difference between learning a task…
External link:
http://arxiv.org/abs/2402.18361
Author:
van Rossem, Loek, Saxe, Andrew M.
Deep neural networks come in many sizes and architectures. The choice of architecture, in conjunction with the dataset and learning algorithm, is commonly understood to affect the learned neural representations. Yet, recent results have shown that…
External link:
http://arxiv.org/abs/2402.09142
Using multiple input streams simultaneously to train multimodal neural networks is intuitively advantageous but practically challenging. A key challenge is unimodal bias, where a network overly relies on one modality and ignores others during joint training…
External link:
http://arxiv.org/abs/2312.00935