Monday, June 27, 2011

Lab meeting June 29th (Jim): A Reduction of Imitation Learning and Structured Prediction to No-Regret Online Learning

Title: A Reduction of Imitation Learning and Structured Prediction to No-Regret Online Learning

Stephane Ross, Geoffrey Gordon, and J. Andrew (Drew) Bagnell

Proceedings of the 14th International Conference on Artificial Intelligence and Statistics (AISTATS), April 2011.


Abstract:
Sequential prediction problems such as imitation learning, where future observations depend on previous predictions (actions), violate the common i.i.d. assumptions made in statistical learning. ... In this paper, we propose a new iterative algorithm, which trains a stationary deterministic policy, that can be seen as a no regret algorithm in an online learning setting. We show that any such no regret algorithm, combined with additional reduction assumptions, must find a policy with good performance under the distribution of observations it induces in such sequential settings.
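The iterative scheme described in the abstract can be illustrated with a rough sketch of a dataset-aggregation loop: roll out the current policy, ask the expert to label the states that policy actually visits, add those labels to one growing dataset, and retrain. Everything below is an assumption made for illustration and is not taken from the paper: the toy 1-D chain environment, the `expert_action` rule, the majority-vote lookup-table learner, and the choice to roll out the expert in the first iteration and the learned policy alone afterwards.

```python
from collections import Counter, defaultdict

TARGET = 5     # hypothetical goal position on a 0..10 chain
HORIZON = 12   # hypothetical rollout length


def expert_action(state):
    """Hypothetical expert: step toward TARGET, stay once there."""
    if state < TARGET:
        return 1
    if state > TARGET:
        return -1
    return 0


def step(state, action):
    """Toy dynamics: move along the chain, clipped to [0, 10]."""
    return max(0, min(10, state + action))


def rollout(policy, start):
    """Return the states visited when following `policy` from `start`."""
    states = []
    state = start
    for _ in range(HORIZON):
        states.append(state)
        state = step(state, policy(state))
    return states


def fit(dataset):
    """Train a deterministic policy: majority expert label per state."""
    votes = defaultdict(Counter)
    for state, action in dataset:
        votes[state][action] += 1
    table = {s: c.most_common(1)[0][0] for s, c in votes.items()}
    # States never seen in training default to "stay" (action 0).
    return lambda s: table.get(s, 0)


def aggregate_and_train(starts, iterations):
    """Iteratively collect expert labels on the learner's own states."""
    dataset = []
    policy = expert_action  # first iteration rolls out the expert
    for _ in range(iterations):
        for start in starts:
            for state in rollout(policy, start):
                # Label the visited state with the expert's action.
                dataset.append((state, expert_action(state)))
        policy = fit(dataset)  # retrain on all data gathered so far
    return policy


learned = aggregate_and_train(starts=[0, 10], iterations=3)
```

The point of the loop is that training states come from the learner's own rollouts rather than only from expert demonstrations, which is how the mismatch between training and test distributions mentioned in the abstract is addressed.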

