On the Refinement of Conflict History Search Through Multi-Armed Bandit
Author: | Djamal Habet, Mohamed Sami Cherif, Cyril Terrioux |
Contributors: | COntraintes, ALgorithmes et Applications (COALA), Laboratoire d'Informatique et Systèmes (LIS), Aix Marseille Université (AMU)-Université de Toulon (UTLN)-Centre National de la Recherche Scientifique (CNRS), ANR-16-CE40-0028, DE-MO-GRAPH, Décomposition de Modèles Graphiques (2016) |
Language: | English |
Year of publication: | 2020 |
Subject: |
Mathematical optimization; Computer science; Backtracking; Heuristic; Conflict History Search; Software engineering; Multi-armed bandit; [INFO.INFO-AI] Computer Science [cs]/Artificial Intelligence [cs.AI]; Search algorithm; Electrical engineering, electronic engineering, information engineering; Constraint programming; Reinforcement learning; Artificial intelligence & image processing; Constraint satisfaction problem |
Source: | IEEE 32nd International Conference on Tools with Artificial Intelligence (ICTAI), Nov 2020, Baltimore, United States, pp. 264-271, ⟨10.1109/ICTAI50040.2020.00050⟩ |
DOI: | 10.1109/ICTAI50040.2020.00050 |
Description: | Reinforcement learning has shown its relevance in designing search heuristics for backtracking algorithms dedicated to solving decision problems under constraints. Recently, an efficient heuristic, called Conflict History Search (CHS), based on the history of search failures was introduced for the Constraint Satisfaction Problem (CSP). The Exponential Recency Weighted Average (ERWA) is used to estimate the hardness of constraints, and CHS favors the variables that often appear in recent failures. The step parameter is important in CHS since it controls the estimation of the hardness of constraints, and its refinement may lead to notable improvements. The current research aims to achieve this objective. To this end, a Multi-Armed Bandits (MAB) framework selects an appropriate value of this parameter during the restarts performed by the search algorithm. Each arm represents a CHS with a given value for the step parameter, and it is rewarded according to its ability to improve the search. A training phase is introduced early in the search to help the MAB choose a relevant arm. The experimental evaluation shows that this approach leads to significant improvements over CHS and other state-of-the-art heuristics. |
Database: | OpenAIRE |
External link: |
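The two ingredients named in the description can be sketched in a few lines: an ERWA update that tracks constraint hardness (the core of CHS), and a bandit that picks the step parameter at each restart. This is a minimal illustration, not the authors' implementation: the paper does not specify its exact reward function or arm set here, so the UCB1 policy, the candidate step values, and all class and parameter names below are assumptions chosen for clarity.

```python
import math

class ERWAScore:
    """ERWA hardness estimate for a single constraint (CHS-style).

    Illustrative sketch: `step` is the alpha weight given to the most
    recent observation, as in the standard ERWA update.
    """
    def __init__(self, step=0.4):
        self.step = step   # alpha: weight of the newest reward
        self.q = 0.0       # current hardness estimate

    def on_conflict(self, reward):
        # q <- (1 - alpha) * q + alpha * r
        self.q = (1.0 - self.step) * self.q + self.step * reward


class UCB1:
    """UCB1 bandit over candidate step values, consulted at each restart.

    Each arm stands for CHS run with one candidate step value; after a
    restart, the arm is rewarded by how much it helped the search
    (the reward definition here is a placeholder).
    """
    def __init__(self, arms):
        self.arms = arms                  # e.g. candidate steps [0.1, ..., 0.9]
        self.counts = [0] * len(arms)     # pulls per arm
        self.means = [0.0] * len(arms)    # empirical mean reward per arm
        self.t = 0                        # total number of selections

    def select(self):
        """Return the index of the arm to play for the next restart."""
        self.t += 1
        for i, n in enumerate(self.counts):
            if n == 0:                    # play every arm once first
                return i
        # maximize mean + exploration bonus sqrt(2 ln t / n_i)
        return max(range(len(self.arms)),
                   key=lambda i: self.means[i]
                       + math.sqrt(2.0 * math.log(self.t) / self.counts[i]))

    def update(self, i, reward):
        """Record the reward observed for arm i after the restart."""
        self.counts[i] += 1
        self.means[i] += (reward - self.means[i]) / self.counts[i]
```

Typical usage alternates `select` and `update` once per restart: the solver runs until the restart cutoff with the chosen step value, then feeds back a reward, so arms whose step value tends to improve the search get pulled more often while the logarithmic bonus keeps occasional exploration of the others.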