Thursday, November 22, 2007

CMU RI Thesis Proposal (Nov 27, 2007): Peer-Advising: An Approach for Policy Improvement when Learning by Demonstration

Brenna Argall

Abstract:
Robots are becoming ever more prevalent in the world. Whether an exploration rover in space or a recreational robot in the home, successful autonomous operation requires a motion control algorithm, or policy, that maps observations of the world to actions available on the robot. Policy development is generally a complex process restricted to experts within the field. However, as robots become more commonplace, the need for policy development that is straightforward and feasible for non-experts will increase. Furthermore, as robots co-exist with people, humans and robots will necessarily share experiences. With this thesis, we explore an approach to policy development that exploits information from shared human-robot experience. We introduce the concept of policy development through peer-advice: to improve its policy, the robot learner takes advice from a human peer. We characterize a peer as someone able to execute the robot's motion task herself, and to evaluate robot performance according to the measures used to evaluate her own executions.

We develop peer-advising within a Learning by Demonstration (LbD) framework. In a typical LbD system, a teacher provides demonstration data, and the learner estimates the underlying function that maps observations to actions within this dataset. Our approach extends this framework with an explicit policy improvement phase. We identify two basic conduits for policy improvement within this setup: modifying the demonstration dataset, and changing the approximating function directly. We refer to the former approach as data-advising, and to the latter as function-advising. We have developed a preliminary algorithm that extends the LbD framework along both of these conduits.
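To make the two conduits concrete, the short Python sketch below shows one way they might look around a simple LbD policy. It is purely illustrative and not taken from the proposal: the nearest-neighbor regression, the class and method names, and the advice operators are all assumptions chosen to contrast data-advising (adding advised observation-action pairs to the demonstration set) with function-advising (adjusting the learned mapping's output directly).

```python
# Illustrative sketch only: the proposal does not specify an implementation.
# The k-nearest-neighbor regression and the advice operators below are
# assumptions chosen to make the two advising conduits concrete.

import numpy as np


class LbDPolicy:
    """Policy learned from demonstration: maps an observation to an action
    by averaging the actions of the k nearest demonstrated observations."""

    def __init__(self, observations, actions, k=3):
        self.observations = np.asarray(observations, dtype=float)
        self.actions = np.asarray(actions, dtype=float)
        self.k = k
        self.correction = np.zeros(self.actions.shape[1])  # function-level offset

    def predict(self, observation):
        dists = np.linalg.norm(self.observations - observation, axis=1)
        nearest = np.argsort(dists)[: self.k]
        return self.actions[nearest].mean(axis=0) + self.correction

    # --- data-advising: modify the demonstration dataset itself ---
    def advise_data(self, new_observations, new_actions):
        """Incorporate advised (observation, action) pairs as if they were
        additional demonstrations; the mapping is re-estimated implicitly."""
        self.observations = np.vstack([self.observations, new_observations])
        self.actions = np.vstack([self.actions, new_actions])

    # --- function-advising: change the approximating function directly ---
    def advise_function(self, action_offset):
        """Shift the policy's output without touching the dataset,
        e.g. 'turn a little more sharply everywhere'."""
        self.correction = self.correction + np.asarray(action_offset, dtype=float)


if __name__ == "__main__":
    # Toy demonstrations: observation = (distance to goal, bearing),
    # action = (speed, turn rate). Values are made up for illustration.
    obs = [[1.0, 0.1], [0.5, -0.2], [2.0, 0.0]]
    act = [[0.5, 0.05], [0.2, -0.1], [0.8, 0.0]]
    policy = LbDPolicy(obs, act, k=2)

    query = np.array([0.8, 0.0])
    print("before advice:", policy.predict(query))

    policy.advise_data([[0.8, 0.0]], [[0.3, 0.0]])  # peer supplies a corrected execution point
    policy.advise_function([0.0, 0.02])             # peer nudges the turn rate globally
    print("after advice: ", policy.predict(query))
```

In this toy version, data-advising changes what the policy will re-derive its predictions from, while function-advising bypasses the dataset and adjusts the mapping itself; the thesis proposal explores both conduits, though its actual algorithms are not the ones sketched here.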

This algorithm has been validated empirically both in simulation and on a Segway RMP robot. Peer-advice has proven effective for modifying control policies and improving policy performance. Within classical LbD, learner performance is limited by the demonstrator's abilities; through advice, however, learner performance has been shown to extend beyond, and even exceed, the capabilities of the demonstration set. In our proposed work, we will further develop and explore peer-advice as an effective tool for LbD policy improvement. Our primary focus will be the development of novel techniques for both function-advising and data-advising. This proposed work will be validated on a Segway RMP robot.

Link
