Revisiting the problem of learning long-term dependencies in recurrent neural networks.

Authors: Johnston L; Department of Statistics, University of Wisconsin, Madison, WI, USA. Electronic address: ljohnston2@wisc.edu., Patel V; Department of Statistics, University of Wisconsin, Madison, WI, USA., Cui Y; Department of Statistics, University of Wisconsin, Madison, WI, USA., Balaprakash P; Oak Ridge National Laboratory, Oak Ridge, TN, USA.
Language: English
Source: Neural Networks: the official journal of the International Neural Network Society [Neural Netw] 2024 Nov 26; Vol. 183, pp. 106887. Date of Electronic Publication: 2024 Nov 26.
DOI: 10.1016/j.neunet.2024.106887
Abstract: Recurrent neural networks (RNNs) are an important class of models for learning sequential behavior. However, training RNNs to learn long-term dependencies is a tremendously difficult task, and this difficulty is widely attributed to the vanishing and exploding gradient (VEG) problem. Since it was first characterized 30 years ago, the belief that if VEG occurs during optimization then RNNs learn long-term dependencies poorly has become a central tenet of the RNN literature, routinely cited as motivation for a wide variety of research advances. In this work, we revisit and interrogate this belief using a large factorial experiment in which more than 40,000 RNNs were trained, and we provide evidence contradicting it. Motivated by these findings, we re-examine the original analysis of latching behavior in RNNs via hyperbolic attractors, and ultimately demonstrate that these dynamics do not fully capture the learned characteristics of RNNs. Our findings suggest that RNNs are fully capable of learning dynamics that do not correspond to hyperbolic attractors, and that the choice of hyperparameters, notably the learning rate, has a substantial impact on the likelihood that an RNN will learn long-term dependencies.
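To make the VEG phenomenon referenced in the abstract concrete, the following minimal sketch (not the authors' experimental code; the function name, hidden size, sequence length, and weight-scale values are illustrative assumptions) measures how the norm of the Jacobian of the final hidden state with respect to the initial hidden state behaves in a vanilla tanh RNN over many time steps.

# Minimal sketch of the vanishing/exploding gradient (VEG) effect in a
# vanilla tanh RNN, assuming only numpy. It accumulates the product of
# per-step Jacobians d h_t / d h_{t-1} and reports its spectral norm,
# which governs how gradients propagate across long time spans.
import numpy as np

rng = np.random.default_rng(0)

def longterm_jacobian_norm(weight_scale, hidden=32, steps=100):
    """Run a random input sequence and return ||d h_T / d h_0||_2."""
    W_hh = rng.normal(0.0, weight_scale / np.sqrt(hidden), (hidden, hidden))
    W_xh = rng.normal(0.0, 1.0 / np.sqrt(hidden), (hidden, hidden))
    h = np.zeros(hidden)
    J = np.eye(hidden)  # accumulated product of step Jacobians
    for _ in range(steps):
        x = rng.normal(size=hidden)
        h = np.tanh(W_hh @ h + W_xh @ x)
        # Jacobian of this step with respect to the previous hidden state
        J = (np.diag(1.0 - h**2) @ W_hh) @ J
    return np.linalg.norm(J, 2)  # spectral norm of the long-range Jacobian

for scale in (0.5, 1.0, 2.0):
    print(f"weight scale {scale}: ||dh_T/dh_0|| ~ {longterm_jacobian_norm(scale):.3e}")

With a recurrent weight scale well below 1 this norm typically decays toward zero (vanishing gradients), while larger scales tend to produce growth (exploding gradients); the abstract's point is that observing such behavior during optimization does not by itself determine whether an RNN ultimately learns long-term dependencies.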
Competing Interests: The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.
(Copyright © 2024. Published by Elsevier Ltd.)
Database: MEDLINE