Variational Learning is Effective for Large Deep Networks

Author:	Shen, Yuesong; Daheim, Nico; Cong, Bai; Nickl, Peter; Marconi, Gian Maria; Bazan, Clement; Yokota, Rio; Gurevych, Iryna; Cremers, Daniel; Khan, Mohammad Emtiyaz; Möllenhoff, Thomas
Publication year:	2024
Subject:	Computer Science - Machine Learning; Computer Science - Artificial Intelligence; Computer Science - Computation and Language; Mathematics - Optimization and Control; Statistics - Machine Learning
Document type:	Working Paper
Description:	We give extensive empirical evidence against the common belief that variational learning is ineffective for large neural networks. We show that an optimizer called Improved Variational Online Newton (IVON) consistently matches or outperforms Adam for training large networks such as GPT-2 and ResNets from scratch. IVON's computational costs are nearly identical to Adam's, but its predictive uncertainty is better. We show several new use cases of IVON where we improve finetuning and model merging in Large Language Models, accurately predict generalization error, and faithfully estimate sensitivity to data. We find overwhelming evidence that variational learning is effective. Comment: Published at the International Conference on Machine Learning (ICML), 2024. The first two authors contributed equally. Code is available at https://github.com/team-approx-bayes/ivon
Database:	arXiv
External link:	http://arxiv.org/abs/2402.17641