Achieving generalization of deep learning models in a quick way by adapting T-HTR learning rate scheduler

Author: J. Senthil Kumar, Y. Suresh, D. Vidyabharathi, Mohanraj
Year of publication: 2021
Subject:
Source: Personal and Ubiquitous Computing 27:1335-1353
ISSN: 1617-4909 (print), 1617-4917 (electronic)
DOI: 10.1007/s00779-021-01587-4
Description: Deep neural network training involves multiple hyperparameters which affect the prediction or classification accuracy of the model. Among these, the learning rate plays a key role in training the network effectively. Several researchers have attempted to design a learning rate scheduler that finds an optimal learning rate. In this paper, the performance of the existing state-of-the-art learning rate schedulers HTD (hyperbolic tangent decay) and CLR (cyclical learning rate) is investigated by using them with LSTM (long short-term memory) and BiLSTM (bidirectional long short-term memory) architectures. These existing schedulers do not achieve the best prediction accuracy when tested on three benchmark datasets: 20Newsgroup, Reuters Newswire, and IMDB. To address this, the T-HTR (toggle between hyperbolic tangent decay and triangular mode with restarts) learning rate scheduler is proposed and examined in this research. The proposed scheduler toggles the learning rate schedule between epochs: as training progresses through each epoch, a new learning rate is calculated from the difference between the gradient values of the previous two iterations. Apart from the learning rate, the step width also has an impact on the accuracy of the model; when the step width is set to its minimum value, accuracy improves and the network converges in fewer iterations. Experimental results show that the overall performance of the proposed system improves on the existing schedulers.
Database: OpenAIRE
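
The abstract does not spell out the exact update rule, but both of the schedules that T-HTR toggles between are published formulas: HTD decays the learning rate along a tanh curve, while triangular CLR moves it linearly between two bounds with a half-cycle of step_size iterations (the "step width" above). The following Python sketch shows one plausible way such a toggle could be assembled; the toggle criterion, the threshold, and every default hyperparameter (lr0, step_size, toggle_threshold, the tanh bounds) are illustrative assumptions, not the authors' implementation.

import math

def htd_lr(epoch, total_epochs, lr0=0.01, lower=-6.0, upper=3.0):
    # Hyperbolic tangent decay (HTD): the learning rate falls from
    # roughly lr0 toward 0 along a tanh curve as training progresses.
    progress = epoch / total_epochs
    return lr0 / 2.0 * (1.0 - math.tanh(lower + (upper - lower) * progress))

def triangular_lr(iteration, step_size, base_lr=1e-4, max_lr=0.01):
    # Triangular cyclical learning rate (CLR): the learning rate moves
    # linearly between base_lr and max_lr; step_size is the half-cycle
    # length in iterations, i.e. the step width.
    cycle = math.floor(1 + iteration / (2 * step_size))
    x = abs(iteration / step_size - 2 * cycle + 1)
    return base_lr + (max_lr - base_lr) * max(0.0, 1.0 - x)

def t_htr_lr(epoch, iteration, grad_prev, grad_prev2,
             total_epochs=50, step_size=200, toggle_threshold=1e-3):
    # Hypothetical toggle rule: if the gradient magnitudes of the
    # previous two iterations have stopped changing (a plateau),
    # restart with a triangular cycle to resume exploration;
    # otherwise keep the smooth HTD decay. The rule and threshold
    # are assumptions, not the paper's exact formulation.
    if abs(grad_prev - grad_prev2) < toggle_threshold:
        return triangular_lr(iteration, step_size)
    return htd_lr(epoch, total_epochs)

In practice grad_prev and grad_prev2 would be running gradient norms supplied by the training loop, and keeping step_size small is consistent with the paper's observation that a minimal step width yields better accuracy and convergence in fewer iterations.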