Properly Acting under Partial Observability with Action Feasibility Constraints
Author: | Florent Teichteil-Königsbuch, Caroline Ponzoni Carvalho Chanel |
---|---|
Contributors: | Institut Supérieur de l'Aéronautique et de l'Espace - ISAE-SUPAERO (FRANCE), Office National d'Etudes et Recherches Aérospatiales - ONERA (FRANCE) |
Language: | English |
Publication year: | 2013 |
Subject: |
0209 industrial biotechnology
Mathematical optimization; Property (programming); Partially observable Markov decision processes; Partially observable Markov decision process; Action feasibility constraints; 02 engineering and technology; ComputingMethodologies_ARTIFICIALINTELLIGENCE; Linear subspace; Safe robotics; Sequential decision-making; Set (abstract data type); Action preconditions; 020901 industrial engineering & automation; Action (philosophy); 0202 electrical engineering, electronic engineering, information engineering; State space; 020201 artificial intelligence & image processing; State (computer science); Observability; Automatique / Robotique; Mathematics |
Source: | Advanced Information Systems Engineering ISBN: 9783642387081 ECML/PKDD (1) |
Description: | We introduce the Action-Constrained Partially Observable Markov Decision Process (AC-POMDP), which arose from studying critical robotic applications with damaging actions. AC-POMDPs restrict the optimized policy to apply only feasible actions: each action is feasible in a subset of the state space, and the agent can observe the set of applicable actions in the current hidden state, in addition to standard observations. We present optimality equations for AC-POMDPs, which imply operating on alpha-vectors defined over many different belief subspaces. We propose an algorithm named PreCondition Value Iteration (PCVI), which fully exploits this specific property of alpha-vectors in AC-POMDPs. We also designed a relaxed version of PCVI whose complexity is exponentially smaller than that of PCVI. Experimental results on POMDP robotic benchmarks with action feasibility constraints exhibit the benefits of explicitly exploiting the semantic richness of action-feasibility observations in AC-POMDPs over equivalent but unstructured POMDPs. |
Database: | OpenAIRE |
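
The description states that, in an AC-POMDP, the agent observes the set of actions applicable in the current hidden state. A minimal sketch of how such an observation could narrow a belief is given below. It is an illustration only, not the paper's PCVI algorithm or its optimality equations: the function names, the toy states, and the assumption that the feasible set is observed without noise are all hypothetical.

```python
from typing import Dict, FrozenSet

State = str
Action = str
Belief = Dict[State, float]
Feasibility = Dict[State, FrozenSet[Action]]


def condition_on_feasible_set(
    belief: Belief, feasible: Feasibility, observed: FrozenSet[Action]
) -> Belief:
    """Keep only states whose feasible-action set equals the observed set,
    then renormalize (assumes the feasible set is observed exactly)."""
    kept = {s: p for s, p in belief.items() if feasible[s] == observed}
    total = sum(kept.values())
    if total == 0.0:
        raise ValueError("observed feasible set is inconsistent with the belief")
    return {s: p / total for s, p in kept.items()}


def safe_actions(belief: Belief, feasible: Feasibility) -> FrozenSet[Action]:
    """Actions that are feasible in every state with non-zero probability."""
    support = [s for s, p in belief.items() if p > 0.0]
    return frozenset.intersection(*(feasible[s] for s in support))


# Hypothetical toy model: moving forward is damaging near a cliff or a wall.
FEASIBLE: Feasibility = {
    "open": frozenset({"forward", "turn"}),
    "cliff": frozenset({"turn"}),
    "wall": frozenset({"turn"}),
}
PRIOR: Belief = {"open": 0.5, "cliff": 0.3, "wall": 0.2}

# The agent observes that only "turn" is applicable.
POSTERIOR = condition_on_feasible_set(PRIOR, FEASIBLE, frozenset({"turn"}))
```

In this toy model the updated belief is supported only on the states where "turn" is the sole applicable action, so each feasibility observation confines the belief to a subspace of states sharing the same feasible set. This is one way to read the description's remark about alpha-vectors defined over different belief subspaces.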