Artificial Replay: A Meta-Algorithm for Harnessing Historical Data in Bandits

Autor: Banerjee, Siddhartha, Sinclair, Sean R., Tambe, Milind, Xu, Lily, Yu, Christina Lee
Rok vydání: 2022
Předmět:
Druh dokumentu: Working Paper
Popis: Most real-world deployments of bandit algorithms exist somewhere in between the offline and online set-up, where some historical data is available upfront and additional data is collected dynamically online. How best to incorporate historical data to "warm start" bandit algorithms is an open question: naively initializing reward estimates using all historical samples can suffer from spurious data and imbalanced data coverage, leading to computation and storage issues-particularly for continuous action spaces. To address these challenges, we propose Artificial-Replay, a meta-algorithm for incorporating historical data into any arbitrary base bandit algorithm. We show that Artificial-Replay uses only a fraction of the historical data compared to a full warm-start approach, while still achieving identical regret for base algorithms that satisfy independence of irrelevant data (IIData), a novel and broadly applicable property that we introduce. We complement these theoretical results with experiments on (i) K-armed bandits and (ii) continuous combinatorial bandits, on which we model green security domains using real poaching data. Our results show the practical benefits of Artificial-Replayin reducing computation and space complexity, including for base algorithms that do not satisfy IIData.
Comment: 50 pages (21 pages main paper), 9 figures
Databáze: arXiv