Artificial Replay: A Meta-Algorithm for Harnessing Historical Data in Bandits
Autor: | Banerjee, Siddhartha, Sinclair, Sean R., Tambe, Milind, Xu, Lily, Yu, Christina Lee |
---|---|
Rok vydání: | 2022 |
Předmět: | |
Druh dokumentu: | Working Paper |
Popis: | Most real-world deployments of bandit algorithms exist somewhere in between the offline and online set-up, where some historical data is available upfront and additional data is collected dynamically online. How best to incorporate historical data to "warm start" bandit algorithms is an open question: naively initializing reward estimates using all historical samples can suffer from spurious data and imbalanced data coverage, leading to computation and storage issues-particularly for continuous action spaces. To address these challenges, we propose Artificial-Replay, a meta-algorithm for incorporating historical data into any arbitrary base bandit algorithm. We show that Artificial-Replay uses only a fraction of the historical data compared to a full warm-start approach, while still achieving identical regret for base algorithms that satisfy independence of irrelevant data (IIData), a novel and broadly applicable property that we introduce. We complement these theoretical results with experiments on (i) K-armed bandits and (ii) continuous combinatorial bandits, on which we model green security domains using real poaching data. Our results show the practical benefits of Artificial-Replayin reducing computation and space complexity, including for base algorithms that do not satisfy IIData. Comment: 50 pages (21 pages main paper), 9 figures |
Databáze: | arXiv |
Externí odkaz: |