Showing 1 - 10 of 571 for the search: '"Poggio, Tomaso"'
While previous optimization results have suggested that deep neural networks tend to favour low-rank weight matrices, the implications of this inductive bias on generalization bounds remain underexplored. In this paper, we apply Maurer's chain rule f…
External link:
http://arxiv.org/abs/2411.13733
Author:
Subramaniam, Vighnesh, Mayo, David, Conwell, Colin, Poggio, Tomaso, Katz, Boris, Cheung, Brian, Barbu, Andrei
We demonstrate that architectures traditionally considered ill-suited for a task can be trained using inductive biases from another architecture. Networks are considered untrainable when they overfit, underfit, or converge to poor res…
External link:
http://arxiv.org/abs/2410.20035
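A speculative sketch of the general mechanism in PyTorch (the toy task, the frozen teacher, the projection head, and the 0.1 loss weight are my constructions for illustration, not necessarily the method of the paper): a frozen "donor" network supplies the inductive bias through a representation-alignment penalty while the student trains on the task.

```python
import torch

torch.manual_seed(0)

# Frozen "donor" network whose hidden representation carries the inductive bias.
teacher = torch.nn.Sequential(torch.nn.Linear(10, 32), torch.nn.Tanh())
for p in teacher.parameters():
    p.requires_grad_(False)

student_body = torch.nn.Linear(10, 32)   # trunk of the "hard-to-train" student
student_head = torch.nn.Linear(32, 1)
proj = torch.nn.Linear(32, 32)           # maps student features into teacher space
params = (list(student_body.parameters()) + list(student_head.parameters())
          + list(proj.parameters()))
opt = torch.optim.SGD(params, lr=0.01)

X = torch.randn(256, 10)
y = X.sum(dim=1, keepdim=True)           # toy regression target

for _ in range(500):
    h = torch.relu(student_body(X))
    task = ((student_head(h) - y) ** 2).mean()    # ordinary task loss
    align = ((proj(h) - teacher(X)) ** 2).mean()  # representation alignment
    loss = task + 0.1 * align
    opt.zero_grad(); loss.backward(); opt.step()

print(f"task loss {task.item():.4f}, alignment loss {align.item():.4f}")
```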
Understanding neural representations will help open the black box of neural networks and advance our scientific understanding of modern AI systems. However, how complex, structured, and transferable representations emerge in modern neural networks ha…
External link:
http://arxiv.org/abs/2410.03006
Originally proposed for handling time series data, Auto-regressive Decision Trees (ARDTs) have not yet been explored for language modeling. This paper delves into both the theoretical and practical applications of ARDTs in this new context. We theore…
External link:
http://arxiv.org/abs/2409.19150
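As a minimal illustration of the setting (my toy construction; the corpus, context length k=2, and the scikit-learn tree are assumptions, not the paper's setup): a decision tree trained to map the previous k token ids to the next token id, then applied auto-regressively by feeding it its own predictions.

```python
from sklearn.tree import DecisionTreeClassifier

text = "the cat sat on the mat the cat ate the rat".split()
vocab = sorted(set(text))
tok = {w: i for i, w in enumerate(vocab)}
ids = [tok[w] for w in text]

k = 2  # context length
X = [ids[i:i + k] for i in range(len(ids) - k)]  # previous k tokens
y = [ids[i + k] for i in range(len(ids) - k)]    # next token

tree = DecisionTreeClassifier(max_depth=8).fit(X, y)

# Auto-regressive generation: the tree consumes its own output.
ctx = ids[:k]
for _ in range(5):
    nxt = int(tree.predict([ctx[-k:]])[0])
    ctx.append(nxt)
print(" ".join(vocab[i] for i in ctx))
```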
We investigate the ability of deep neural networks to identify the support of the target function. Our findings reveal that mini-batch SGD effectively learns the support in the first layer of the network by shrinking to zero the weights associated wi…
External link:
http://arxiv.org/abs/2406.11110
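A hedged sketch of the phenomenon described (the synthetic target, network width, and hyperparameters are my assumptions, not the paper's experiments): train with mini-batch SGD and weight decay on a target that depends only on coordinates 0 and 1 of a 20-dimensional input, then read off per-coordinate first-layer weight norms; columns for irrelevant coordinates should shrink toward zero.

```python
import torch

torch.manual_seed(0)
X = torch.randn(2048, 20)
y = (X[:, 0] * X[:, 1]).unsqueeze(1)  # support = {0, 1}

net = torch.nn.Sequential(
    torch.nn.Linear(20, 64), torch.nn.ReLU(), torch.nn.Linear(64, 1)
)
opt = torch.optim.SGD(net.parameters(), lr=0.05, weight_decay=1e-3)

for step in range(3000):
    idx = torch.randint(0, 2048, (64,))       # mini-batch SGD
    loss = ((net(X[idx]) - y[idx]) ** 2).mean()
    opt.zero_grad(); loss.backward(); opt.step()

# Column j of the first-layer weight matrix gathers every weight attached to
# input coordinate j; its norm shows whether coordinate j was kept or pruned.
print(net[0].weight.detach().norm(dim=0))
```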
Author:
Singhal, Utkarsh, Cheung, Brian, Chandra, Kartik, Ragan-Kelley, Jonathan, Tenenbaum, Joshua B., Poggio, Tomaso A., Yu, Stella X.
How much can you say about the gradient of a neural network without computing a loss or knowing the label? This may sound like a strange question: surely the answer is "very little." However, in this paper, we show that gradients are more structured…
External link:
http://arxiv.org/abs/2312.04709
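One concrete piece of such structure, as a self-contained check (my example, not the paper's method): for a linear layer, the gradient of any scalar function of the output with respect to the weight matrix is the outer product of the backpropagated signal and the layer input, hence rank one per sample, no label required.

```python
import torch

torch.manual_seed(0)
lin = torch.nn.Linear(32, 16, bias=False)
x = torch.randn(32)

out = lin(x).sum()   # any scalar function of the output works here
out.backward()
G = lin.weight.grad  # shape (16, 32): outer product of ones(16) and x

print(torch.linalg.matrix_rank(G))  # prints 1
```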
Artificial neural networks are being proposed as models of parts of the brain. The networks are compared to recordings of biological neurons, and good performance in reproducing neural responses is considered to support the model's validity. A key qu…
External link:
http://arxiv.org/abs/2302.06677
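For context, a hedged rendering of the comparison pipeline the abstract alludes to (synthetic data throughout; ridge regression and held-out correlation are standard practice in the field, not claims about this paper): regress recorded responses on model activations and score the fit on held-out stimuli.

```python
import numpy as np
from sklearn.linear_model import Ridge

rng = np.random.default_rng(0)
acts = rng.normal(size=(300, 128))  # model activations, one row per stimulus
neural = (acts[:, :4] @ rng.normal(size=(4, 10))
          + 0.5 * rng.normal(size=(300, 10)))  # synthetic "recordings"

fit, test = slice(0, 200), slice(200, 300)
pred = Ridge(alpha=1.0).fit(acts[fit], neural[fit]).predict(acts[test])
score = np.mean([np.corrcoef(pred[:, i], neural[test][:, i])[0, 1]
                 for i in range(10)])
print(score)  # the "neural predictivity" of the model
```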
In this paper, we investigate the Rademacher complexity of deep sparse neural networks, where each neuron receives a small number of inputs. We prove generalization bounds for multilayered sparse ReLU neural networks, including convolutional neural n…
External link:
http://arxiv.org/abs/2301.12033
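For reference, the quantity being bounded (the textbook definition, not a result of the paper): the empirical Rademacher complexity of a function class F on a sample x_1, ..., x_n, with independent uniform random signs.

$$
\hat{\mathcal{R}}_S(\mathcal{F}) \;=\; \mathbb{E}_{\sigma}\left[\,\sup_{f \in \mathcal{F}} \frac{1}{n}\sum_{i=1}^{n} \sigma_i\, f(x_i)\right],
\qquad \sigma_1,\dots,\sigma_n \ \text{i.i.d. uniform on } \{\pm 1\}.
$$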
Iterative regularization is a classic idea in regularization theory that has recently become popular in machine learning. On the one hand, it allows the design of efficient algorithms that control numerical and statistical accuracy at the same time. On t…
External link:
http://arxiv.org/abs/2212.12675
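The classic mechanism in one sketch (NumPy; the problem sizes and noise level are illustrative assumptions, and this is the textbook Landweber iteration rather than the algorithm of the paper): plain gradient descent on least squares, where the iteration count, not an explicit penalty, acts as the regularization parameter.

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.normal(size=(50, 200))           # ill-posed: more unknowns than data
x_true = np.zeros(200); x_true[:5] = 1.0
b = A @ x_true + 0.1 * rng.normal(size=50)

x = np.zeros(200)
step = 1.0 / np.linalg.norm(A, 2) ** 2   # step size from the spectral norm
for t in range(1, 201):
    x -= step * A.T @ (A @ x - b)        # plain gradient step, no penalty term
    if t in (10, 50, 200):
        # Semi-convergence: the error typically dips, then rises with noise.
        print(t, np.linalg.norm(x - x_true))
```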
We investigate the inherent bias of Stochastic Gradient Descent (SGD) toward learning low-rank weight matrices during the training of deep neural networks. Our results demonstrate that training with mini-batch SGD and weight decay induces a bias towa…
External link:
http://arxiv.org/abs/2206.05794
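A small, assumption-laden illustration (the rank-1 teacher, widths, and hyperparameters are mine, not the paper's): train with mini-batch SGD plus weight decay and inspect the singular value spectrum of a hidden weight matrix; a few dominant singular values indicate the low-rank bias.

```python
import torch

torch.manual_seed(0)
X = torch.randn(1024, 30)
y = X @ torch.randn(30, 1)               # rank-1 teacher

net = torch.nn.Sequential(
    torch.nn.Linear(30, 100), torch.nn.ReLU(),
    torch.nn.Linear(100, 100), torch.nn.ReLU(),
    torch.nn.Linear(100, 1),
)
opt = torch.optim.SGD(net.parameters(), lr=0.05, weight_decay=5e-3)

for step in range(5000):
    idx = torch.randint(0, 1024, (32,))  # mini-batch SGD with weight decay
    loss = ((net(X[idx]) - y[idx]) ** 2).mean()
    opt.zero_grad(); loss.backward(); opt.step()

# Spectrum of the hidden 100x100 weight matrix, relative to the top value.
s = torch.linalg.svdvals(net[2].weight.detach())
print(s[:5] / s[0])
```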