Tuesday, October 14, 2008

CMU RI Thesis Proposal: Pretty Observable Markov Decision Processes: Exploiting Approximate Structure for Efficient Planning under Uncertainty

Title: Pretty Observable Markov Decision Processes: Exploiting Approximate Structure for Efficient Planning under Uncertainty

Nicholas Armstrong-Crews
Robotics Institute
Carnegie Mellon University


NSH 1507

10:00 AM
20 Oct 2008

Abstract:
Planning under uncertainty is a challenging task. POMDP models have become a popular method for describing such domains. Unfortunately, solving a POMDP to find the optimal policy is, in general, computationally intractable. Recent advances in solving POMDPs include finding near-optimal policies and exploiting structured representations of the problems. We believe that by using these two tools together synergistically, we can tame the complexity of many POMDPs. In this thesis, we propose to further advance these approaches by analyzing new types of structure and new approximation techniques, as well as methods that combine the two.

Some of the research we have done to lay the groundwork for this thesis falls into these categories, with promising results. We introduced the Oracular POMDP framework, which takes advantage of an MDP-like structure: the agent may directly observe the state as a (costly) action, but otherwise receives no information from the environment, so between invocations of this "oracle" action the agent is again afflicted by uncertainty. We have given an anytime algorithm for solving Oracular POMDPs that we have proven is efficient (polynomial-time) in everything but the number of actions. At any iteration of the anytime algorithm, we have a (provably) near-optimal policy, achieved efficiently by exploiting the structured observation function.
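To make the structure concrete, here is a minimal sketch of what an oracular observation and reward model might look like. The names (NULL_OBS, ORACLE, ORACLE_COST) and the fixed-cost formulation are illustrative assumptions, not details taken from the proposal:

```python
# Sketch of an Oracular POMDP's observation and reward models.
# NULL_OBS, ORACLE, and ORACLE_COST are illustrative, not from the proposal.

NULL_OBS = "null"    # the single, uninformative observation
ORACLE = "query"     # the special oracle action
ORACLE_COST = 1.0    # assumed fixed price of perfect state information

def observe(state, action):
    """O(s, a): the oracle reveals the true state exactly;
    every other action yields no information at all."""
    return state if action == ORACLE else NULL_OBS

def reward(state, action, base_reward):
    """Oracle queries pay a fixed cost on top of the underlying
    MDP reward; other actions are rewarded as usual."""
    r = base_reward(state, action)
    return r - ORACLE_COST if action == ORACLE else r
```

Between queries, the belief evolves under the transition dynamics alone, which is what makes the problem MDP-like at query times yet still partially observable in between.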

Another vein of our past work addresses solving general POMDPs by approximating them as finite-state MDPs. It is a well-known result that a POMDP is equivalent to a continuous MDP whose state space is the belief simplex (the set of probability distributions over the hidden states). We sample a finite number of these beliefs to create a finite-state MDP that approximates the original POMDP. We then solve this MDP for an optimal policy, use that policy to improve our sample of belief states so that it better approximates the POMDP, and continue in this fashion.
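A rough sketch of the solve step of this loop, assuming tabular dynamics (T[a] an |S|x|S| transition matrix, Z[a] an |S|x|O| observation matrix, R an |S|x|A| reward matrix; all names are ours, and nearest-neighbor projection stands in for whatever interpolation the actual algorithm uses):

```python
import numpy as np

def belief_update(b, a, o, T, Z):
    """Bayes filter: b'(s') is proportional to Z[a][s', o] * sum_s b(s) T[a][s, s']."""
    bp = Z[a][:, o] * (b @ T[a])
    return bp / bp.sum()

def nearest(b, beliefs):
    """Index of the sampled belief closest to b (L1 distance)."""
    return min(range(len(beliefs)),
               key=lambda i: np.abs(beliefs[i] - b).sum())

def solve_belief_mdp(beliefs, T, Z, R, gamma=0.95, sweeps=100):
    """Value iteration on the finite MDP whose states are the sampled
    beliefs; successor beliefs are snapped to their nearest sample."""
    n_a, n_o = len(T), Z[0].shape[1]
    V = np.zeros(len(beliefs))
    Q = np.zeros((len(beliefs), n_a))
    for _ in range(sweeps):
        for i, b in enumerate(beliefs):
            for a in range(n_a):
                q = b @ R[:, a]                 # expected immediate reward
                pred = b @ T[a]                 # predicted state distribution
                for o in range(n_o):
                    p_o = pred @ Z[a][:, o]     # Pr(o | b, a)
                    if p_o > 1e-12:
                        bp = belief_update(b, a, o, T, Z)
                        q += gamma * p_o * V[nearest(bp, beliefs)]
                Q[i, a] = q
        V = Q.max(axis=1)
    return V, Q.argmax(axis=1)                  # values and greedy policy
```

Successor beliefs that project poorly onto the current sample set are natural candidates to add during the subsequent refinement phase.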

These prior works exhibit an important common methodology: anytime algorithms that give near-optimal policies at every iteration, and in the limit converge to the optimal policy. This property is paramount for tackling problems with approximate structure. We can focus early iterations on the structured portion of the problem, which we can solve quickly; and later iterations can handle the complex, unstructured portion of the problem. In this way, we can quickly reach a near-optimal solution, while guaranteeing convergence to an optimal solution in the limit. Our method of evaluating an algorithm's performance on a given problem, then, is the entire curve of policy quality versus algorithm runtime.
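As a small illustration of this evaluation methodology (the function names here are our assumptions, not the author's code), one can log the quality-versus-runtime curve by evaluating the incumbent policy after each anytime iteration:

```python
import time

def anytime_profile(solver_iterations, evaluate_policy):
    """Record the (runtime, policy quality) curve for an anytime solver.
    `solver_iterations` is assumed to yield the incumbent policy after
    each iteration; `evaluate_policy` estimates its expected return."""
    curve = []
    start = time.perf_counter()
    for policy in solver_iterations:
        elapsed = time.perf_counter() - start   # solver time so far
        curve.append((elapsed, evaluate_policy(policy)))
    return curve
```

Comparing two algorithms then means comparing two such curves, not two endpoint values.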

Although the AI literature is rich with attempts to exploit different types of structure, in this thesis we focus on a small subset. Our prior work includes Oracular POMDPs, which exploit an extremely structured observation function, and the finite-state MDP approximation to POMDPs, which takes advantage of a structured belief-state space learned as the algorithm progresses.

For the remainder of the thesis work, we propose to generalize the concept of Oracular POMDPs to include oracles that provide nearly perfect information, with the environment otherwise providing almost none; we will also extend the oracle concept to factored-state problems, where an oracle can reveal one state variable reliably but not the others. We will investigate automated techniques for learning structure from a given unstructured representation. Finally, we wish to examine in greater detail what can be proven about the optimality-runtime tradeoff of these approximately structured POMDPs.
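For instance, a factored-state oracle might be sketched as follows, with the state represented as a dict of variables and one hypothetical query action per variable (all names are illustrative assumptions):

```python
NULL_OBS = "null"

def factored_observe(state, action):
    """Sketch: state is a dict of variables; the action 'query_x'
    reliably reveals variable 'x' and nothing else."""
    if action.startswith("query_"):
        var = action[len("query_"):]
        return {var: state[var]}   # only the queried variable is revealed
    return NULL_OBS
```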

To evaluate our methods, we will apply them to several types of problems. First, we will introduce new synthetic domains that exhibit the structure we wish to exploit. Second, we will use our structure-learning methods on existing domains in the literature. Finally, we will attempt to apply the methods to a real-world robot problem, in order to address doubts (in the minds of the community and of the author) about the usefulness of POMDP methods on real robots.

