A posteriori control densities: Imitation learning from partial observations

Authors: Tom Lefebvre, Guillaume Crevecoeur
Publication year: 2023
Subject:
Source: PATTERN RECOGNITION LETTERS
ISSN: 0167-8655, 1872-7344
DOI: 10.1016/j.patrec.2023.04.001
Description: This paper treats a special case of the Imitation from Observations (IfO) problem. IfO is a generalisation of Imitation Learning from state-only demonstrations. Our treatment of IfO considers the case of feature-only demonstrations. This means that the full state is inaccessible for inference, and imitation must occur on the basis of a limited set of features. We refer to this setting as Imitation from Partial Observations (IfPO). This scenario has the advantage of accommodating a wider variety of demonstrations, as well as addressing the problem of a heteromorphic student and teacher. We seek policy learning methods that extract an executable state-feedback policy directly from those features, an approach known in the literature as Behavioural Cloning. In this theoretical work, we formalise the rational inference model of the student decision maker, devoted to imitation, as a controlled Hidden Markov Model. The IfPO problem is then reformulated as a Maximum Likelihood Estimation problem and treated using Expectation-Maximization. We name the resulting fixed-point iterations A Posteriori Control Densities. We compare the presented approach to existing methods in the field and identify potential directions for further development, such as an extension to unknown transition and emission models.
Database: OpenAIRE