Thomas Stepleton, Robotics Institute, Carnegie Mellon University
August 28, 2006
Abstract
This thesis proposal presents a new data-driven computational framework for unsupervised learning of object models from video. The framework integrates object representation learning, image parsing, and inference into a coherent whole based on the principles of persistence, coherent covariation, and predictability of visual patterns associated with objects or object parts in dynamic visual scenes. Visual patterns are extracted from video and linked across frames by exploiting the tendency of objects to persist and change gradually in visual scenes. First, a large set of visual pattern proposals is generated by a clustering process based on Gestalt rules. A particle filtering-based inference mechanism then uses these proposals to construct and refine hypotheses about which objects are present in the video. Hypotheses are judged by their ability to predict future video events, and the best hypotheses are then used to create new or refined object models. For improved robustness in feature identification, object identification, and inference, the mechanism learns and employs representations that explicitly encode the temporal dynamics of visual patterns. The key insight of the approach is the use of predictions of “future” visual events to facilitate inference and to validate learned representations. The framework is inspired by principles and insights from cognitive neuroscience, and the mechanisms investigated are therefore relevant to understanding the representational development of object models in the brain.
A copy of the thesis proposal document can be found at http://gs2040.sp.cs.cmu.edu/UPOD/.
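To make the overall loop concrete, here is an illustrative toy sketch, not code from the thesis: frames are reduced to sets of pattern labels, each particle is an object hypothesis, hypotheses are refined with the patterns proposed in each frame, and resampling weights favor hypotheses that successfully predict the following frame. All function names, the scoring rule, and the toy data are assumptions made for this example; in the actual framework the proposals come from Gestalt-based clustering and predictions rely on learned temporal-dynamics models rather than the trivial set operations used here.

import random

def propose_patterns(frame):
    # Toy stand-in for Gestalt-based clustering: in this sketch a "frame"
    # is already just a set of visual-pattern labels.
    return set(frame)

def refine(hypothesis, proposals):
    # Toy refinement: keep only the patterns that persist across frames.
    return hypothesis & proposals

def prediction_score(hypothesis, next_frame):
    # Toy predictability: fraction of the hypothesis that reappears in the next frame.
    return len(hypothesis & set(next_frame)) / max(len(hypothesis), 1)

def filter_hypotheses(frames, n_particles=50, seed=0):
    rng = random.Random(seed)
    first = sorted(propose_patterns(frames[0]))
    # Seed each particle (object hypothesis) with a random subset of the
    # first frame's pattern proposals.
    particles = [set(rng.sample(first, rng.randint(1, len(first))))
                 for _ in range(n_particles)]
    for frame, next_frame in zip(frames[1:], frames[2:]):
        proposals = propose_patterns(frame)
        particles = [refine(h, proposals) for h in particles]
        # Weight each hypothesis by how well it predicts the "future" frame.
        weights = [prediction_score(h, next_frame) for h in particles]
        if sum(weights) == 0:
            continue  # nothing predictive yet; keep all particles
        # Resample: hypotheses that predict future events survive and multiply.
        particles = rng.choices(particles, weights=weights, k=n_particles)
    return particles

# Toy usage: two patterns that persist ("wheel", "body") amid transient clutter.
frames = [{"wheel", "body", "glare"}, {"wheel", "body", "shadow"},
          {"wheel", "body", "blur"}, {"wheel", "body"}]
print(filter_hypotheses(frames)[0])  # a persistent subset, e.g. {'wheel', 'body'}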