Monday, April 14, 2008

[Robotics Institute Thesis Proposal] Structured Prediction Techniques for Imitation Learning

Abstract:
Programming robots is hard. We can often easily demonstrate the behavior we desire, but mapping that intuition into the space of parameters governing the robot's decisions is difficult, time-consuming, and ultimately expensive. Machine learning promises “programming by demonstration” paradigms to develop high-performance robotic systems. Unfortunately, many “classical” machine learning techniques, such as decision trees, neural networks, and support vector machines, do not fit the needs of modern robotic systems, which are often built around sophisticated planning algorithms that efficiently reason about the future. Consequently, these learning systems often fall short of producing high-quality robot performance.

Rather than ignoring planning algorithms in favor of pure learning systems, the algorithms I discuss in this proposal embrace optimal-cost planning algorithms as a central component of robot behavior. I propose here a set of simple gradient-based algorithms for training cost-based planners from examples of decision sequences provided by an expert. These algorithms are simple, intuitive, and easy to implement, and they enjoy both state-of-the-art empirical performance and strong theoretical guarantees. Collectively, we call our framework Maximum Margin Planning (MMP).
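To give a feel for what "training a cost-based planner from an expert's decision sequence" can look like, here is a minimal sketch of a subgradient update in the spirit of MMP. It is not the implementation from the proposal. The toy 3x3 grid, the two features (a constant bias and an obstacle indicator), the per-cell margin, the step size, and every function name (`plan`, `cell_costs`, `feature_counts`, `train`) are assumptions made for illustration. The planner is plain Dijkstra, and the projection step that keeps the bias weight at or above 1 is a simplification chosen so that cell costs stay positive.

```python
import heapq

def plan(cost, start, goal):
    """Dijkstra on a 4-connected grid; cost[cell] is the positive price of entering cell."""
    dist, prev, heap = {start: 0.0}, {}, [(0.0, start)]
    while heap:
        d, cell = heapq.heappop(heap)
        if cell == goal:
            break
        if d > dist[cell]:
            continue  # stale heap entry
        r, c = cell
        for nxt in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
            if nxt in cost and d + cost[nxt] < dist.get(nxt, float("inf")):
                dist[nxt] = d + cost[nxt]
                prev[nxt] = cell
                heapq.heappush(heap, (dist[nxt], nxt))
    path = [goal]
    while path[-1] != start:
        path.append(prev[path[-1]])
    return path[::-1]

def cell_costs(features, w):
    """Linear cost model: cost(cell) = w . features(cell)."""
    return {cell: sum(wk * fk for wk, fk in zip(w, f)) for cell, f in features.items()}

def feature_counts(features, path):
    """Sum of feature vectors over the cells of a path."""
    dim = len(next(iter(features.values())))
    return [sum(features[cell][k] for cell in path) for k in range(dim)]

def train(features, expert_path, steps=100, lr=0.1, reg=0.01, margin=0.5):
    """Subgradient descent on a max-margin planning objective (illustrative sketch)."""
    start, goal = expert_path[0], expert_path[-1]
    on_expert = set(expert_path)
    dim = len(next(iter(features.values())))
    w = [1.0] + [0.0] * (dim - 1)  # assumption: feature 0 is a constant bias
    f_expert = feature_counts(features, expert_path)
    for _ in range(steps):
        cost = cell_costs(features, w)
        # Loss augmentation: cells off the expert path look cheaper by `margin`,
        # so the planner must beat the expert by a margin, not merely tie.
        augmented = {cell: c - (0.0 if cell in on_expert else margin)
                     for cell, c in cost.items()}
        f_planned = feature_counts(features, plan(augmented, start, goal))
        grad = [fe - fp + reg * wk for fe, fp, wk in zip(f_expert, f_planned, w)]
        w = [wk - lr * g for wk, g in zip(w, grad)]
        # Projection (assumption): bias >= 1 and other weights >= 0, so every
        # augmented cost stays positive as long as margin < 1.
        w = [max(w[0], 1.0)] + [max(wk, 0.0) for wk in w[1:]]
    return w

# Toy problem: obstacles at (1, 1) and (2, 1); the expert detours over the top row.
features = {(r, c): (1.0, 0.0) for r in range(3) for c in range(3)}
features[(1, 1)] = (1.0, 1.0)
features[(2, 1)] = (1.0, 1.0)
expert_path = [(1, 0), (0, 0), (0, 1), (0, 2), (1, 2)]

before = plan(cell_costs(features, [1.0, 0.0]), (1, 0), (1, 2))
weights = train(features, expert_path)
after = plan(cell_costs(features, weights), (1, 0), (1, 2))
```

With uniform initial costs, the planner cuts straight through the obstacle cell. Each update raises the obstacle weight while the loss-augmented plan still disagrees with the demonstration, and after training the planner reproduces the expert's detour on this toy problem. The toy map was chosen so that the demonstration is the only obstacle-free route; on maps with several equally good routes, a two-feature cost model like this one cannot single out the expert's.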

Our algorithms fall under the category of imitation learning. In this proposal, I first briefly survey the history of imitation learning and map the progression of algorithms that led to the development of MMP. I then discuss the MMP collection of algorithms at many levels of detail, starting from an intuitive and implementational perspective, and then proceeding to a more formal mathematical derivation. Throughout the discussion I demonstrate the techniques on a wide array of problems found in robotics, from navigational planning and heuristic learning to footstep prediction and grasp planning. Toward the end of the document I outline a set of open problems in imitation learning not solved by MMP and touch on recent progress we have made toward solving them.

