A Principle of Least Action for the Training of Neural Networks
Author: Ibrahim Ayed, Patrick Gallinari, Emmanuel de Bézenac, Skander Karkar
Contributors: Karkar, Skander; Criteo AI Lab, Criteo [Paris]; Machine Learning and Information Access (MLIA), LIP6, Sorbonne Université (SU), Centre National de la Recherche Scientifique (CNRS)
Year of publication: 2021
Subjects: FOS: Computer and information sciences; Computer Science - Machine Learning (cs.LG); Statistics - Machine Learning (stat.ML); [INFO] Computer Science [cs]; [MATH] Mathematics [math]; deep learning; artificial neural networks; dynamical systems theory; generalization; statistical learning theory; principle of least action; optimal transport
Source: Machine Learning and Knowledge Discovery in Databases, ECML PKDD 2020 (Part 2), Sep 2020, Ghent, Belgium. ISBN: 9783030676605
Description: Neural networks achieve high generalization performance on many tasks despite being highly over-parameterized. Since classical statistical learning theory struggles to explain this behavior, much recent effort has focused on uncovering the mechanisms behind it, in the hope of developing a more adequate theoretical framework and of gaining better control over trained models. In this work, we adopt an alternative perspective, viewing the neural network as a dynamical system that displaces input particles over time. We conduct a series of experiments and, by analyzing the network's behavior through its displacements, we show that the network's transport map is biased toward low kinetic energy displacements, and we link this bias to generalization performance. From this observation, we reformulate the learning problem as follows: find a neural network that solves the task while transporting the data as efficiently as possible. This formulation, grounded in Optimal Transport theory, allows us to prove regularity results for the solution network. From a practical viewpoint, it also leads to a new learning algorithm that automatically adapts to the complexity of the given task and yields networks with high generalization ability even in low-data regimes. Comment: ECML PKDD 2020
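To make the "least kinetic energy" idea in the description concrete: in a residual network, each block output can be read as the displacement applied to an input particle at one time step, and the discretized kinetic energy of the induced transport map is the sum of squared displacement norms across blocks. Below is a minimal PyTorch sketch of training with such a penalty, not the authors' algorithm: the class names, the fixed trade-off weight `lam`, and the plain additive regularizer are illustrative assumptions (the paper's method adapts to task complexity automatically).

```python
import torch
import torch.nn as nn

class ResidualBlock(nn.Module):
    """One residual step x_{k+1} = x_k + v_k(x_k); the block output
    v_k(x_k) is the displacement of the input particle at step k."""
    def __init__(self, dim):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(dim, dim), nn.ReLU(), nn.Linear(dim, dim))

    def forward(self, x):
        v = self.net(x)       # displacement at this step
        return x + v, v

class KineticResNet(nn.Module):
    """Hypothetical network that tracks the kinetic energy of its transport map."""
    def __init__(self, dim, depth, n_classes):
        super().__init__()
        self.blocks = nn.ModuleList([ResidualBlock(dim) for _ in range(depth)])
        self.head = nn.Linear(dim, n_classes)

    def forward(self, x):
        kinetic = 0.0
        for block in self.blocks:
            x, v = block(x)
            # sum over blocks of mean squared displacement norm:
            # a discretized kinetic energy of the particle trajectories
            kinetic = kinetic + v.pow(2).sum(dim=1).mean()
        return self.head(x), kinetic

# One training step: fit the task while penalizing transport cost.
model = KineticResNet(dim=32, depth=8, n_classes=10)
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
ce = nn.CrossEntropyLoss()
lam = 1e-2  # hypothetical trade-off weight, not from the paper

x = torch.randn(64, 32)
y = torch.randint(0, 10, (64,))
logits, kinetic = model(x)
loss = ce(logits, y) + lam * kinetic  # least-action-style objective
opt.zero_grad()
loss.backward()
opt.step()
```

Among networks that fit the data equally well, this objective prefers the one whose layers move each sample the least, which is one way to read the low kinetic energy bias the paper reports.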
Database: OpenAIRE
External link: