Stochastic Online Shortest Path Routing: The Value of Feedback

Authors:	Richard Combes, Alexandre Proutiere, Mohammad Sadegh Talebi, Mikael Johansson, Zhenhua Zou
Contributors:	Royal Institute of Technology [Stockholm] (KTH), Ericsson Research, Laboratoire des signaux et systèmes (L2S), Université Paris-Sud - Paris 11 (UP11)-CentraleSupélec-Centre National de la Recherche Scientifique (CNRS), European Project: 308267,EC:FP7:ERC,ERC-2012-StG_20111012,FSA(2012)
Publication year:	2018
Subjects:	FOS: Computer and information sciences; Independent and identically distributed random variables; Mathematical optimization; Optimization problem; Computational complexity theory; Computer science; 02 engineering and technology; 010501 environmental sciences; 01 natural sciences; Machine Learning (cs.LG); Computer Science - Networking and Internet Architecture; [INFO.INFO-NI]Computer Science [cs]/Networking and Internet Architecture [cs.NI]; online combinatorial optimization; [INFO.INFO-LG]Computer Science [cs]/Machine Learning [cs.LG]; stochastic multi-armed bandits; FOS: Mathematics; 0202 electrical engineering, electronic engineering, information engineering; Electrical and Electronic Engineering; Mathematics - Optimization and Control; 0105 earth and related environmental sciences; Networking and Internet Architecture (cs.NI); Shortest path routing; Network packet; 020206 networking & telecommunications; Regret; Computer Science Applications; Computer Science - Learning; Optimization and Control (math.OC); Control and Systems Engineering; Path (graph theory); Shortest path problem; Algorithm design; [MATH.MATH-OC]Mathematics [math]/Optimization and Control [math.OC]; Routing (electronic design automation)
Source:	IEEE Transactions on Automatic Control, Institute of Electrical and Electronics Engineers, 2017, 63 (4), pp.915-930. ⟨10.1109/TAC.2017.2747409⟩
ISSN:	1558-2523, 0018-9286
DOI:	10.1109/tac.2017.2747409
Abstract:	This paper studies online shortest path routing over multi-hop networks. Link costs or delays are time-varying and modeled by independent and identically distributed random processes, whose parameters are initially unknown. The parameters, and hence the optimal path, can only be estimated by routing packets through the network and observing the realized delays. Our aim is to find a routing policy that minimizes the regret, that is, the cumulative difference in expected delay between the paths chosen by the policy and the unknown optimal path. We formulate the problem as a combinatorial bandit optimization problem and consider several scenarios that differ in where routing decisions are made and in the information available when making the decisions. For each scenario, we derive a tight asymptotic lower bound on the regret that has to be satisfied by any online routing policy. These bounds help us to understand the performance improvements we can expect when (i) making routing decisions at each hop rather than at the source only, and (ii) observing per-link delays rather than end-to-end path delays. In particular, we show that (i) is of no use while (ii) can have a spectacular impact. Three algorithms, with a trade-off between computational complexity and performance, are proposed. The regret upper bounds of these algorithms improve over those of the existing algorithms, and they significantly outperform state-of-the-art algorithms in numerical experiments. 18 pages
Database:	OpenAIRE
External link:	https://explore.openaire.eu/search/publication?articleId=doi_dedup___::ed0095da2b31f5c7301272aca6f734ab https://doi.org/10.1109/tac.2017.2747409
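
Note on the regret measure: the abstract defines regret as the cumulative difference in expected delay between the paths a policy chooses and the optimal path. The Python sketch below only illustrates that definition on a toy two-path network. It is not one of the paper's algorithms, and the link names, mean delays, and the helpers `path_delay` and `cumulative_regret` are hypothetical, introduced here for illustration.

```python
from typing import Dict, List, Sequence


def path_delay(path: Sequence[str], mean_delay: Dict[str, float]) -> float:
    """Expected end-to-end delay of a path: the sum of its links' mean delays."""
    return sum(mean_delay[link] for link in path)


def cumulative_regret(
    chosen: Sequence[int],
    paths: Sequence[Sequence[str]],
    mean_delay: Dict[str, float],
) -> List[float]:
    """Running regret after each packet.

    chosen[t] is the index (into `paths`) of the path used for packet t.
    Regret accumulates the gap between that path's expected delay and
    the expected delay of the best path.
    """
    best = min(path_delay(p, mean_delay) for p in paths)
    total = 0.0
    history = []
    for idx in chosen:
        total += path_delay(paths[idx], mean_delay) - best
        history.append(total)
    return history


# Hypothetical toy network: two source-to-destination paths over four links.
MEAN_DELAY = {"a": 1.0, "b": 2.0, "c": 1.5, "d": 1.0}  # assumed values
PATHS = [["a", "b"], ["c", "d"]]  # expected delays 3.0 and 2.5

# A policy that alternates between the two paths pays 0.5 each time
# it picks the suboptimal path (index 0).
regret = cumulative_regret([0, 1, 0, 1], PATHS, MEAN_DELAY)
print(regret)
```

In this sketch the mean delays are known, which is exactly what a real policy lacks: it must estimate them from observed delays, and the feedback available (per-link versus end-to-end) determines how quickly that can be done.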