Title: A Reduction of Imitation Learning and Structured Prediction to No-Regret Online Learning
Stephane Ross, Geoffrey Gordon, and J. Andrew (Drew) Bagnell
Proceedings of the 14th International Conference on Artificial Intelligence and Statistics (AISTATS), April 2011.
Abstract:
Sequential prediction problems such as imitation learning, where future observations depend on previous predictions (actions), violate the common i.i.d. assumptions made in statistical learning. ... In this paper, we propose a new iterative algorithm, which trains a stationary deterministic policy, that can be seen as a no regret algorithm in an online learning setting. We show that any such no regret algorithm, combined with additional reduction assumptions, must find a policy with good performance under the distribution of observations it induces in such sequential settings.
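The iterative algorithm in the paper is DAgger (Dataset Aggregation). Each round, it runs the current policy, asks the expert to label the states that policy actually visits, and retrains on all data collected so far. The Python below is a minimal sketch of that loop under loudly stated assumptions. The 1-D chain environment, the `expert` function, the slip probability, and the majority-vote "learner" are all hypothetical stand-ins, not anything from the paper. It also leaves out details of the full algorithm, such as mixing expert and learner actions during rollouts and selecting the returned policy on validation data. Here the expert simply drives the first round and the learned policy drives every later round.

```python
import random
from collections import Counter, defaultdict

# Hypothetical toy problem: a 1-D chain of states 0..N_STATES-1.
# The expert always steps toward TARGET; each step spent off TARGET costs 1.
N_STATES, TARGET, HORIZON = 11, 5, 12
ACTIONS = (-1, 0, 1)


def expert(state):
    """Hypothetical expert policy: move one step toward the target."""
    return (TARGET > state) - (TARGET < state)


def step(state, action, rng, slip=0.2):
    """Toy dynamics: with probability `slip` a random action replaces the chosen one."""
    if rng.random() < slip:
        action = rng.choice(ACTIONS)
    return min(max(state + action, 0), N_STATES - 1)


def rollout(policy, rng):
    """Run one episode from a random start; return the visited states and total cost."""
    state, visited, cost = rng.randrange(N_STATES), [], 0
    for _ in range(HORIZON):
        visited.append(state)
        cost += state != TARGET
        state = step(state, policy(state), rng)
    return visited, cost


def train(dataset):
    """Stand-in supervised learner: majority vote of expert labels per state."""
    votes = defaultdict(Counter)
    for state, label in dataset:
        votes[state][label] += 1
    table = {s: c.most_common(1)[0][0] for s, c in votes.items()}
    return lambda s: table.get(s, 0)  # states never seen in training: stay put


def dagger(n_iters=5, episodes_per_iter=20, seed=0):
    """Dataset-aggregation loop: roll out the current policy, label with the expert, retrain."""
    rng = random.Random(seed)
    dataset, policy = [], expert  # first round: execute the expert itself
    for _ in range(n_iters):
        for _ in range(episodes_per_iter):
            visited, _ = rollout(policy, rng)
            # The expert labels the states the *current* policy actually reached,
            # so the training distribution follows the learner's own state distribution.
            dataset.extend((s, expert(s)) for s in visited)
        policy = train(dataset)  # retrain on everything aggregated so far
    return policy, dataset


if __name__ == "__main__":
    policy, data = dagger()
    eval_rng = random.Random(1)
    avg_cost = sum(rollout(policy, eval_rng)[1] for _ in range(100)) / 100
    print(f"{len(data)} labelled states, average cost {avg_cost:.2f}")
```

Retraining on the whole aggregated dataset, rather than only on the newest round, behaves like a Follow-the-Leader online learner. That is roughly where the connection to no-regret online learning in the abstract comes from. In a real application, `train` would be an actual classifier or regressor.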