PAC-Bayesian Lifelong Learning For Multi-Armed Bandits

Authors:	Hamish Flynn, David Reeb, Melih Kandemir, Jan Peters
Language:	English
Publication year:	2022
Subject:	FOS: Computer and information sciences; Computer Science - Machine Learning; Lifelong learning; PAC-Bayesian; Computer Networks and Communications; Statistics - Machine Learning; Machine Learning (stat.ML); Multi-armed bandits; Machine Learning (cs.LG); Computer Science Applications; Information Systems
Source:	Flynn, H, Reeb, D, Kandemir, M & Peters, J 2022, 'PAC-Bayesian lifelong learning for multi-armed bandits', Data Mining and Knowledge Discovery, vol. 36, pp. 841-876. https://doi.org/10.1007/s10618-022-00825-4
DOI:	10.1007/s10618-022-00825-4
Description:	We present a PAC-Bayesian analysis of lifelong learning. In the lifelong learning problem, a sequence of learning tasks is observed one at a time, and the goal is to transfer information acquired from previous tasks to new learning tasks. We consider the case when each learning task is a multi-armed bandit problem. We derive lower bounds on the expected average reward that would be obtained if a given multi-armed bandit algorithm were run in a new task with a particular prior and for a set number of steps. We propose lifelong learning algorithms that use our new bounds as learning objectives. Our proposed algorithms are evaluated in several lifelong multi-armed bandit problems and are found to perform better than a baseline method that does not use generalisation bounds. (29 pages, 5 figures)
Database:	OpenAIRE
External link:	https://explore.openaire.eu/search/publication?articleId=doi_dedup___::12881e4939a36efd00c809cc7859d2ed http://arxiv.org/abs/2203.03303