Sunday, September 04, 2005

CMU Talk: Estimating Geometric Scene Context from a Single Image

Speaker: Alexei A. Efros

Humans have an amazing ability to instantly grasp the overall 3D structure of a scene -- ground orientation, relative positions of major landmarks, etc. -- even from a single image. This ability is completely missing in most popular recognition algorithms, which pretend that the world is flat and/or view it through a patch-sized peephole. Yet it seems very likely that having a grasp of this "geometric context" of a scene should be of great assistance for many tasks, including recognition, navigation, and novel view synthesis.

In this talk, I will describe our first steps toward the goal of estimating a 3D scene context from a single image. We propose to estimate the coarse geometric properties of a scene by learning appearance-based models of geometric classes. Geometric classes describe the 3D orientation of an image region with respect to the camera. We provide a multiple-hypothesis segmentation framework for robustly estimating scene structure from a single image and obtaining confidences for each geometric label. These confidences can then (hopefully) be used to improve the performance of many other applications. We provide a quantitative evaluation of our algorithm on a dataset of challenging outdoor images.
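
As a rough illustration of how several segmentation hypotheses might be combined into per-pixel label confidences, here is a minimal Python sketch. It is not the method from the talk: the label set (ground, vertical, sky), the data layout, the function name pixel_confidences, and the plain averaging rule are all assumptions made for the example.

```python
# Hypothetical sketch: combine label estimates from several segmentations.
# The label set and the simple averaging rule are assumptions, not taken
# from the talk abstract.
LABELS = ("ground", "vertical", "sky")


def pixel_confidences(hypotheses, num_pixels):
    """Average per-segment label distributions over all hypotheses.

    hypotheses: list of (segment_of_pixel, segment_label_probs) pairs, where
      segment_of_pixel[i] is the segment id of pixel i in that segmentation,
      and segment_label_probs[seg_id] maps each label to a probability.
    Returns one dict (label -> confidence) per pixel.
    """
    if not hypotheses:
        raise ValueError("need at least one segmentation hypothesis")

    totals = [dict.fromkeys(LABELS, 0.0) for _ in range(num_pixels)]
    for segment_of_pixel, segment_label_probs in hypotheses:
        for i in range(num_pixels):
            probs = segment_label_probs[segment_of_pixel[i]]
            for label in LABELS:
                totals[i][label] += probs[label]

    n = len(hypotheses)
    return [{label: t[label] / n for label in LABELS} for t in totals]


# Toy example: a 4-pixel "image" segmented two different ways.
hyp_a = ([0, 0, 1, 1], {
    0: {"ground": 0.8, "vertical": 0.1, "sky": 0.1},
    1: {"ground": 0.1, "vertical": 0.2, "sky": 0.7},
})
hyp_b = ([0, 1, 1, 1], {
    0: {"ground": 0.9, "vertical": 0.05, "sky": 0.05},
    1: {"ground": 0.2, "vertical": 0.3, "sky": 0.5},
})

conf = pixel_confidences([hyp_a, hyp_b], num_pixels=4)
best = [max(c, key=c.get) for c in conf]
```

Pixel 1 falls in a "ground" segment under one hypothesis and a "sky" segment under the other, so its averaged confidences are less peaked than those of pixels on which both segmentations agree. That is the kind of uncertainty a single segmentation would hide.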

We also demonstrate its usefulness in two applications: 1) improving object detection (preliminary results), and 2) automatic qualitative single-view reconstruction ("Automatic Photo Pop-up", SIGGRAPH'05).

Joint work with Derek Hoiem and Martial Hebert at CMU.
