Robot Perception and Learning: February 2010

Thursday, February 18, 2010

CMU PhD Thesis proposal: Data-driven Scene Parsing With the Visual Memex

Data-driven Scene Parsing With the Visual Memex

Tomasz Malisiewicz

Carnegie Mellon University

February 18, 2010, 4:00 p.m., NSH 3305

Abstract: This proposal is concerned with the problem of image understanding. Given a single static image, the goal is to explain the entire image by recognizing all of the objects depicted in it. We formulate image understanding as image parsing: breaking up the image into semantically meaningful regions and recognizing the object embedded in each region. We strive for a dense understanding of the image, leaving no portion of it unexplained. While most approaches to scene understanding formulate the problem as recognizing abstract object categories (and, for each object, ask “what is this?”), we use a data-driven model of recognition more akin to memory (and ask “what is this like?”). We present an exemplar-based framework for reasoning about objects and their relationships in images, dubbed the Visual Memex. The Visual Memex is a non-parametric, graph-based model of objects that encodes two types of object relationships: visual similarity between object exemplars, and 2D spatial context between objects in a single image. We use a region-based representation of exemplar objects, which has been shown to be superior to the popular rectangular-window approach for a wide array of things and stuff found in natural scenes. During training, we learn per-exemplar similarity functions and formulate recognition as association between automatically extracted regions from the input image and exemplar regions in the Visual Memex. We use bottom-up image segmentation, mid-level reasoning about segment relationships, and spatial relationships between exemplars in the Visual Memex as complementary sources of object hypotheses. I propose an iterative image parsing framework that builds an interpretation of an input image by repeatedly conditioning on the current (partial) interpretation and generating novel segment hypotheses from low-level, mid-level, and high-level cues.
I propose to evaluate the system on both recognition and segmentation of real-world scenes from LabelMe.
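To make the two relationship types concrete, here is a toy sketch of a graph over object exemplars with separate "visual similarity" and "spatial context" edges. It is a hypothetical illustration under stated assumptions, not the Visual Memex implementation: the class name `VisualMemexSketch`, the edge-type strings, the exemplar IDs, and the use of undirected, unweighted edges are all assumptions, and the learned per-exemplar similarity functions are not modeled at all.

```python
from collections import defaultdict


class VisualMemexSketch:
    """Hypothetical toy graph over object exemplars with two edge types.

    Assumption: edges are undirected and unweighted; the real model's
    edge weights and learned similarity functions are not represented.
    """

    SIMILAR = "similar"   # visual similarity between two exemplars
    CONTEXT = "context"   # 2D spatial context: both objects appear in one image

    def __init__(self):
        self.labels = {}                  # exemplar id -> category label
        self.edges = defaultdict(set)     # (exemplar id, edge type) -> neighbor ids

    def add_exemplar(self, ex_id, label):
        self.labels[ex_id] = label

    def connect(self, a, b, edge_type):
        # Undirected edge: record it in both directions.
        self.edges[(a, edge_type)].add(b)
        self.edges[(b, edge_type)].add(a)

    def neighbors(self, ex_id, edge_type):
        # Sorted for deterministic output.
        return sorted(self.edges[(ex_id, edge_type)])

    def context_labels(self, ex_id):
        # Categories of objects that co-occurred with ex_id in its source image.
        return sorted({self.labels[n] for n in self.neighbors(ex_id, self.CONTEXT)})


if __name__ == "__main__":
    memex = VisualMemexSketch()
    memex.add_exemplar("car_1", "car")
    memex.add_exemplar("car_2", "car")
    memex.add_exemplar("road_1", "road")
    memex.connect("car_1", "car_2", VisualMemexSketch.SIMILAR)
    memex.connect("car_1", "road_1", VisualMemexSketch.CONTEXT)
    print(memex.neighbors("car_1", VisualMemexSketch.SIMILAR))  # ['car_2']
    print(memex.context_labels("car_1"))                        # ['road']
```

Keeping the two edge types separate lets a query walk "what does this look like?" links and "what tends to appear next to it?" links independently, which mirrors the two relationship types the abstract names.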

Thesis Committee
Alexei A. Efros, Chair
Martial Hebert
Takeo Kanade
Pietro Perona, California Institute of Technology

Wednesday, February 03, 2010

CMU talk: Simon J.D. Prince, Monday, Feb 8, NSH 1507, 3pm-4pm

Title: Modeling Facial Images with Patches
Speaker: Simon J.D. Prince

Abstract:

Faces are one of the most studied object classes in computer vision. Performance is very good for tasks such as identity recognition and gender classification when pose, lighting, and expression are controlled. In uncontrolled conditions, however, these tasks remain challenging. Part of the reason for this limitation is the choice of representation: faces have variously been modeled as subspaces and as constellations of features, but these representations have only a limited ability to describe uncontrolled facial images. In this talk, I will present several experiments in which we have investigated representing faces with a regular grid of patches. This type of model can better capture the complex, multimodal appearance of uncontrolled faces. I will present models for both gender recognition (or, more generally, classification of facial characteristics) and pose estimation (regression). I will also show how to extend these patch-based models to generate near-photorealistic images of novel faces.
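The first step of a regular-grid patch representation is simply cutting the face image into a fixed grid. The sketch below shows only that decomposition, assuming a grayscale image stored as a list of rows and non-overlapping patches; the function name `grid_patches`, the patch sizes, and the choice of non-overlapping patches are assumptions, and the statistical models built on top of the patches (for gender classification, pose regression, or synthesis) are not shown.

```python
def grid_patches(image, patch_h, patch_w):
    """Split a 2D image (list of rows) into a regular grid of non-overlapping patches.

    Assumption: non-overlapping patches; trailing rows/columns that do not
    fill a whole patch are dropped. Returns a dict mapping
    (grid_row, grid_col) -> patch (list of rows).
    """
    height = len(image)
    width = len(image[0]) if height else 0
    patches = {}
    for gr in range(height // patch_h):
        for gc in range(width // patch_w):
            top, left = gr * patch_h, gc * patch_w
            patches[(gr, gc)] = [row[left:left + patch_w]
                                 for row in image[top:top + patch_h]]
    return patches


if __name__ == "__main__":
    # A 4x6 "image" whose pixel values are just their raster index.
    image = [[r * 6 + c for c in range(6)] for r in range(4)]
    patches = grid_patches(image, 2, 3)
    print(len(patches))       # 4
    print(patches[(0, 1)])    # [[3, 4, 5], [9, 10, 11]]
```

Indexing patches by grid position is what lets a model attach position-specific statistics to each patch, so that, for example, the patch at an eye location can be modeled differently from one at the mouth.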

Bio:

Simon Prince was an undergraduate at UCL, where he studied Psychology. His doctoral work was at the University of Oxford, in the Department of Experimental Psychology, where he investigated human stereo vision using psychophysics. He subsequently worked for two years in the Laboratory of Physiology in Oxford as a post-doc with Andrew Parker, studying stereo vision using single-unit electrophysiology. In 2001 he became a post-doctoral research fellow in the Department of Electrical and Computer Engineering at the National University of Singapore, working on augmented reality. He then moved to Toronto, Canada, where he worked as a post-doc in computer vision for James Elder in the Centre for Vision Research at York University. Since 2005 he has been a faculty member in the Department of Computer Science at University College London. His current interests include image segmentation, face recognition, optical tomography, and object recognition.