Showing 1 - 10 of 571 for the search: '"Poggio, Tomaso"'
While previous optimization results have suggested that deep neural networks tend to favour low-rank weight matrices, the implications of this inductive bias on generalization bounds remain underexplored. In this paper, we apply Maurer's chain rule f…
External link:
http://arxiv.org/abs/2411.13733
Author:
Subramaniam, Vighnesh, Mayo, David, Conwell, Colin, Poggio, Tomaso, Katz, Boris, Cheung, Brian, Barbu, Andrei
We demonstrate that architectures traditionally considered ill-suited for a task can be trained using inductive biases from another architecture. Networks are considered untrainable when they overfit, underfit, or converge to poor res…
External link:
http://arxiv.org/abs/2410.20035
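A speculative sketch of the general mechanism in PyTorch (the toy task, the frozen teacher, the projection head, and the 0.1 loss weight are my constructions for illustration, not necessarily the method of the paper): a frozen "donor" network supplies the inductive bias through a representation-alignment penalty while the student trains on the task.

```python
import torch

torch.manual_seed(0)

# Frozen "donor" network whose hidden representation carries the inductive bias.
teacher = torch.nn.Sequential(torch.nn.Linear(10, 32), torch.nn.Tanh())
for p in teacher.parameters():
    p.requires_grad_(False)

student_body = torch.nn.Linear(10, 32)   # trunk of the "hard-to-train" student
student_head = torch.nn.Linear(32, 1)
proj = torch.nn.Linear(32, 32)           # maps student features into teacher space
params = (list(student_body.parameters()) + list(student_head.parameters())
          + list(proj.parameters()))
opt = torch.optim.SGD(params, lr=0.01)

X = torch.randn(256, 10)
y = X.sum(dim=1, keepdim=True)           # toy regression target

for _ in range(500):
    h = torch.relu(student_body(X))
    task = ((student_head(h) - y) ** 2).mean()    # ordinary task loss
    align = ((proj(h) - teacher(X)) ** 2).mean()  # representation alignment
    loss = task + 0.1 * align
    opt.zero_grad(); loss.backward(); opt.step()

print(f"task loss {task.item():.4f}, alignment loss {align.item():.4f}")
```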
Understanding neural representations will help open the black box of neural networks and advance our scientific understanding of modern AI systems. However, how complex, structured, and transferable representations emerge in modern neural networks ha…
External link:
http://arxiv.org/abs/2410.03006
Originally proposed for handling time series data, Auto-regressive Decision Trees (ARDTs) have not yet been explored for language modeling. This paper delves into both the theoretical and practical applications of ARDTs in this new context. We theore…
External link:
http://arxiv.org/abs/2409.19150
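As a minimal illustration of the setting (my toy construction; the corpus, context length k=2, and the scikit-learn tree are assumptions, not the paper's setup): a decision tree trained to map the previous k token ids to the next token id, then applied auto-regressively by feeding it its own predictions.

```python
from sklearn.tree import DecisionTreeClassifier

text = "the cat sat on the mat the cat ate the rat".split()
vocab = sorted(set(text))
tok = {w: i for i, w in enumerate(vocab)}
ids = [tok[w] for w in text]

k = 2  # context length
X = [ids[i:i + k] for i in range(len(ids) - k)]  # previous k tokens
y = [ids[i + k] for i in range(len(ids) - k)]    # next token

tree = DecisionTreeClassifier(max_depth=8).fit(X, y)

# Auto-regressive generation: the tree consumes its own output.
ctx = ids[:k]
for _ in range(5):
    nxt = int(tree.predict([ctx[-k:]])[0])
    ctx.append(nxt)
print(" ".join(vocab[i] for i in ctx))
```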
We investigate the ability of deep neural networks to identify the support of the target function. Our findings reveal that mini-batch SGD effectively learns the support in the first layer of the network by shrinking to zero the weights associated wi…
External link:
http://arxiv.org/abs/2406.11110
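A hedged sketch of the phenomenon described (the synthetic target, network width, and hyperparameters are my assumptions, not the paper's experiments): train with mini-batch SGD and weight decay on a target that depends only on coordinates 0 and 1 of a 20-dimensional input, then read off per-coordinate first-layer weight norms; columns for irrelevant coordinates should shrink toward zero.

```python
import torch

torch.manual_seed(0)
X = torch.randn(2048, 20)
y = (X[:, 0] * X[:, 1]).unsqueeze(1)  # support = {0, 1}

net = torch.nn.Sequential(
    torch.nn.Linear(20, 64), torch.nn.ReLU(), torch.nn.Linear(64, 1)
)
opt = torch.optim.SGD(net.parameters(), lr=0.05, weight_decay=1e-3)

for step in range(3000):
    idx = torch.randint(0, 2048, (64,))       # mini-batch SGD
    loss = ((net(X[idx]) - y[idx]) ** 2).mean()
    opt.zero_grad(); loss.backward(); opt.step()

# Column j of the first-layer weight matrix gathers every weight attached to
# input coordinate j; its norm shows whether coordinate j was kept or pruned.
print(net[0].weight.detach().norm(dim=0))
```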
Author:
Singhal, Utkarsh, Cheung, Brian, Chandra, Kartik, Ragan-Kelley, Jonathan, Tenenbaum, Joshua B., Poggio, Tomaso A., Yu, Stella X.
How much can you say about the gradient of a neural network without computing a loss or knowing the label? This may sound like a strange question: surely the answer is "very little." However, in this paper, we show that gradients are more structured…
External link:
http://arxiv.org/abs/2312.04709
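One concrete piece of such structure, as a self-contained check (my example, not the paper's method): for a linear layer, the gradient of any scalar function of the output with respect to the weight matrix is the outer product of the backpropagated signal and the layer input, hence rank one per sample, no label required.

```python
import torch

torch.manual_seed(0)
lin = torch.nn.Linear(32, 16, bias=False)
x = torch.randn(32)

out = lin(x).sum()   # any scalar function of the output works here
out.backward()
G = lin.weight.grad  # shape (16, 32): outer product of ones(16) and x

print(torch.linalg.matrix_rank(G))  # prints 1
```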
Artificial neural networks are being proposed as models of parts of the brain. The networks are compared to recordings of biological neurons, and good performance in reproducing neural responses is considered to support the model's validity. A key qu…
External link:
http://arxiv.org/abs/2302.06677
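For context, a hedged rendering of the comparison pipeline the abstract alludes to (synthetic data throughout; ridge regression and held-out correlation are standard practice in the field, not claims about this paper): regress recorded responses on model activations and score the fit on held-out stimuli.

```python
import numpy as np
from sklearn.linear_model import Ridge

rng = np.random.default_rng(0)
acts = rng.normal(size=(300, 128))  # model activations, one row per stimulus
neural = (acts[:, :4] @ rng.normal(size=(4, 10))
          + 0.5 * rng.normal(size=(300, 10)))  # synthetic "recordings"

fit, test = slice(0, 200), slice(200, 300)
pred = Ridge(alpha=1.0).fit(acts[fit], neural[fit]).predict(acts[test])
score = np.mean([np.corrcoef(pred[:, i], neural[test][:, i])[0, 1]
                 for i in range(10)])
print(score)  # the "neural predictivity" of the model
```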
In this paper, we investigate the Rademacher complexity of deep sparse neural networks, where each neuron receives a small number of inputs. We prove generalization bounds for multilayered sparse ReLU neural networks, including convolutional neural n…
External link:
http://arxiv.org/abs/2301.12033
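For reference, the quantity being bounded (the textbook definition, not a result of the paper): the empirical Rademacher complexity of a function class F on a sample x_1, ..., x_n, with independent uniform random signs.

$$
\hat{\mathcal{R}}_S(\mathcal{F}) \;=\; \mathbb{E}_{\sigma}\left[\,\sup_{f \in \mathcal{F}} \frac{1}{n}\sum_{i=1}^{n} \sigma_i\, f(x_i)\right],
\qquad \sigma_1,\dots,\sigma_n \ \text{i.i.d. uniform on } \{\pm 1\}.
$$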
Iterative regularization is a classic idea in regularization theory that has recently become popular in machine learning. On the one hand, it allows the design of efficient algorithms that control numerical and statistical accuracy at the same time. On t…
External link:
http://arxiv.org/abs/2212.12675
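The classic mechanism in one sketch (NumPy; the problem sizes and noise level are illustrative assumptions, and this is the textbook Landweber iteration rather than the algorithm of the paper): plain gradient descent on least squares, where the iteration count, not an explicit penalty, acts as the regularization parameter.

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.normal(size=(50, 200))           # ill-posed: more unknowns than data
x_true = np.zeros(200); x_true[:5] = 1.0
b = A @ x_true + 0.1 * rng.normal(size=50)

x = np.zeros(200)
step = 1.0 / np.linalg.norm(A, 2) ** 2   # step size from the spectral norm
for t in range(1, 201):
    x -= step * A.T @ (A @ x - b)        # plain gradient step, no penalty term
    if t in (10, 50, 200):
        # Semi-convergence: the error typically dips, then rises with noise.
        print(t, np.linalg.norm(x - x_true))
```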
We investigate the inherent bias of Stochastic Gradient Descent (SGD) toward learning low-rank weight matrices during the training of deep neural networks. Our results demonstrate that training with mini-batch SGD and weight decay induces a bias towa…
External link:
http://arxiv.org/abs/2206.05794
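A small, assumption-laden illustration (the rank-1 teacher, widths, and hyperparameters are mine, not the paper's): train with mini-batch SGD plus weight decay and inspect the singular value spectrum of a hidden weight matrix; a few dominant singular values indicate the low-rank bias.

```python
import torch

torch.manual_seed(0)
X = torch.randn(1024, 30)
y = X @ torch.randn(30, 1)               # rank-1 teacher

net = torch.nn.Sequential(
    torch.nn.Linear(30, 100), torch.nn.ReLU(),
    torch.nn.Linear(100, 100), torch.nn.ReLU(),
    torch.nn.Linear(100, 1),
)
opt = torch.optim.SGD(net.parameters(), lr=0.05, weight_decay=5e-3)

for step in range(5000):
    idx = torch.randint(0, 1024, (32,))  # mini-batch SGD with weight decay
    loss = ((net(X[idx]) - y[idx]) ** 2).mean()
    opt.zero_grad(); loss.backward(); opt.step()

# Spectrum of the hidden 100x100 weight matrix, relative to the top value.
s = torch.linalg.svdvals(net[2].weight.detach())
print(s[:5] / s[0])
```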