Robot Perception and Learning: Lab meeting Apr 24th 2013 (Hank Lin): Scene Parsing with Multiscale Feature Learning, Purity Trees, and Optimal Covers

Tuesday, April 23, 2013

Presented by: Hank Lin

From: Proc. of the International Conference on Machine Learning (ICML'12), Edinburgh, Scotland, 2012.

Authors: C. Farabet, C. Couprie, L. Najman, Y. LeCun

Link: Paper Video

Abstract:

Scene parsing, or semantic segmentation, consists in la-

beling each pixel in an image with the category of the object

it belongs to. It is a challenging task that involves the simul-

taneous detection, segmentation and recognition of all the

objects in the image.

The scene parsing method proposed here starts by com-

puting a tree of segments from a graph of pixel dissimilari-

ties. Simultaneously, a set of dense feature vectors is com-

puted which encodes regions of multiple sizes centered on

each pixel. The feature extractor is a multiscale convolu-

tional network trained from raw pixels. The feature vec-

tors associated with the segments covered by each node in

the tree are aggregated and fed to a classiﬁer which pro-

duces an estimate of the distribution of object categories

contained in the segment. A subset of tree nodes that cover

the image are then selected so as to maximize the aver-

age “purity” of the class distributions, hence maximizing

the overall likelihood that each segment will contain a sin-

gle object. The convolutional network feature extractor is

trained end-to-end from raw pixels, alleviating the need for

engineered features. After training, the system is parameter

free.

The system yields record accuracies on the Stanford

Background Dataset (8 classes), the Sift Flow Dataset (33

classes) and the Barcelona Dataset (170 classes) while

being an order of magnitude faster than competing ap-

proaches, producing a 320 × 240 image labeling in less

than 1 second.

Tuesday, April 23, 2013