ICML 2000
Title: Algorithms for Inverse Reinforcement Learning
Authors: Andrew Y. Ng and Stuart Russell
Abstract:
This paper addresses the problem of inverse reinforcement learning (IRL) in Markov decision processes, that is, the problem of extracting a reward function given observed, optimal behaviour. IRL may be useful for apprenticeship learning to acquire skilled behaviour, and for ascertaining the reward function being optimized by a natural system. We first characterize the set of all reward functions for which a given policy is optimal. We then derive three algorithms for IRL. The first two deal with the case where the entire policy is known; we handle tabulated reward functions on a finite state space and linear functional approximation of the reward function over a potentially infinite state space. The third algorithm deals with the more realistic case in which the policy is known only through a finite set of observed trajectories. In all cases, a key issue is degeneracy: the existence of a large set of reward functions for which the observed policy is optimal. To remove degeneracy, we suggest some natural heuristics that attempt to pick a reward function that maximally differentiates the observed policy from other, suboptimal policies. This results in an efficiently solvable linear programming formulation of the IRL problem. We demonstrate our algorithms on simple discrete/finite and continuous/infinite state problems.
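For concreteness, here is a minimal sketch (not the authors' code) of the finite-state, known-policy linear program the abstract describes: maximize the margin by which the observed policy beats all single-step deviations, with an L1 penalty on the reward to fight degeneracy. It assumes SciPy's `linprog`; the function name, the toy problem sizes, and the regularization weight `l1` are illustrative choices, not from the paper.

```python
# Sketch of the finite-state IRL linear program (illustrative, not the authors' code).
import numpy as np
from scipy.optimize import linprog

def irl_lp(P, policy, gamma=0.9, l1=1.0, r_max=1.0):
    """Recover a reward vector for which `policy` is optimal.

    P      : (n_actions, n_states, n_states) transition matrices P[a][s, s']
    policy : length-n_states int array, policy[s] = observed action in state s
    """
    n_actions, n_states, _ = P.shape
    # Transition matrix under the observed policy, and (I - gamma * P_pi)^-1
    P_pi = np.array([P[policy[s], s] for s in range(n_states)])
    inv = np.linalg.inv(np.eye(n_states) - gamma * P_pi)

    # Decision variables x = [R (n), t (n), u (n)]:
    #   maximize sum_i t_i - l1 * sum_i u_i
    # subject to, for every state s and non-optimal action a,
    #   t_s <= (P_pi[s] - P[a, s]) inv R   and   (P_pi[s] - P[a, s]) inv R >= 0,
    # plus |R_s| <= u_s and |R_s| <= r_max.
    n = n_states
    c = np.concatenate([np.zeros(n), -np.ones(n), l1 * np.ones(n)])

    A_rows, b = [], []
    for s in range(n_states):
        for a in range(n_actions):
            if a == policy[s]:
                continue
            row_R = (P_pi[s] - P[a, s]) @ inv
            # t_s - row_R @ R <= 0
            row = np.zeros(3 * n); row[:n] = -row_R; row[n + s] = 1.0
            A_rows.append(row); b.append(0.0)
            # -row_R @ R <= 0  (optimality of the observed policy)
            row = np.zeros(3 * n); row[:n] = -row_R
            A_rows.append(row); b.append(0.0)
    for s in range(n_states):
        # R_s - u_s <= 0 and -R_s - u_s <= 0 encode |R_s| <= u_s
        row = np.zeros(3 * n); row[s] = 1.0; row[2 * n + s] = -1.0
        A_rows.append(row); b.append(0.0)
        row = np.zeros(3 * n); row[s] = -1.0; row[2 * n + s] = -1.0
        A_rows.append(row); b.append(0.0)

    bounds = [(-r_max, r_max)] * n + [(None, None)] * n + [(0, r_max)] * n
    res = linprog(c, A_ub=np.array(A_rows), b_ub=np.array(b), bounds=bounds)
    return res.x[:n]
```

The auxiliary variables `t` implement the min over suboptimal actions and `u` implements the L1 norm, so the whole heuristic stays a single linear program, as the abstract claims.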