Wednesday, July 24, 2013
Presented by: Benny
From: IEEE Transactions on Automation Science and Engineering 2013
Authors: Alex Teichman and Jake Lussier and Sebastian Thrun
Abstract: We consider the problem of segmenting and tracking deformable objects in color video with depth (RGBD) data available from commodity sensors such as the Asus Xtion Pro Live or Microsoft Kinect. We frame this problem with very few assumptions: no prior object model, no stationary sensor, and no prior 3D map. This makes a solution potentially useful for a large number of applications, including semi-supervised learning, 3D model capture, and object recognition.
Our approach makes use of a rich feature set, including local image appearance, depth discontinuities, optical flow, and surface normals to inform the segmentation decision in a conditional random field model. In contrast to previous work in this field, the proposed method learns how to best make use of these features from ground-truth segmented sequences. We provide qualitative and quantitative analyses which demonstrate substantial improvement over the state of the art.
This paper is an extended version of our previous work. Building on this, we show that it is possible to achieve an order-of-magnitude speedup and thus real-time performance (~20 FPS) on a laptop computer by applying simple algorithmic optimizations to the original work. This speedup comes at only a minor cost in overall accuracy and thus makes this approach applicable to a broader range of tasks. We demonstrate one such task: real-time, online, interactive segmentation to efficiently collect training data for an off-the-shelf object detector.
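As a rough illustration of the idea of fusing multiple per-pixel cues with learned weights, here is a toy sketch. A logistic combination of two hypothetical cue maps stands in for the paper's learned CRF potentials; the feature names, weights, and values are all illustrative, not taken from the paper.

```python
import numpy as np

def combine_cues(features, weights, bias=0.0):
    """features: dict name -> (H, W) cue map in [0, 1];
    weights: dict name -> float (learned from ground-truth sequences).
    Returns a per-pixel foreground probability map."""
    score = np.full(next(iter(features.values())).shape, bias)
    for name, cue in features.items():
        score += weights[name] * cue
    return 1.0 / (1.0 + np.exp(-score))  # sigmoid squashes to [0, 1]

# Hypothetical 4x4 cue maps: appearance similarity and depth continuity
features = {
    "appearance": np.array([[0.9, 0.8, 0.1, 0.0],
                            [0.9, 0.7, 0.2, 0.1],
                            [0.8, 0.6, 0.1, 0.0],
                            [0.1, 0.1, 0.0, 0.0]]),
    "depth":      np.array([[1.0, 0.9, 0.0, 0.0],
                            [1.0, 0.8, 0.1, 0.0],
                            [0.9, 0.7, 0.0, 0.0],
                            [0.2, 0.1, 0.0, 0.0]]),
}
weights = {"appearance": 4.0, "depth": 3.0}  # would be learned, not hand-set
prob = combine_cues(features, weights, bias=-3.0)
mask = prob > 0.5  # foreground where the weighted evidence is positive
```

The real system additionally enforces spatial smoothness through pairwise CRF terms, which this unary-only sketch omits.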
Tuesday, July 09, 2013
Lab Meeting July 10th, 2013 (Andi): Probabilistic Models for 3D Urban Scene Understanding from Movable Platforms
Title: Probabilistic Models for 3D Urban Scene Understanding from Movable Platforms
PhD Thesis, Andreas Geiger (KIT)
Visual 3D scene understanding is an important component in autonomous driving and robot navigation. Intelligent vehicles, for example, often base their decisions on observations obtained from video cameras as they are cheap and easy to employ. Inner-city intersections represent an interesting but also very challenging scenario in this context: the road layout may be very complex and observations are often noisy or even missing due to heavy occlusions. While highway navigation (e.g., Dickmanns et al.) and autonomous driving on simple and annotated intersections (e.g., DARPA Urban Challenge) have already been demonstrated successfully, understanding and navigating general inner-city crossings with little prior knowledge remains an unsolved problem.
This thesis is a contribution to understanding multi-object traffic scenes from video sequences. All data is provided by a camera system which is mounted on top of the autonomous driving platform AnnieWAY. The proposed probabilistic generative model reasons jointly about the 3D scene layout as well as the 3D location and orientation of objects in the scene. In particular, the scene topology, geometry, and traffic activities are inferred from short video sequences. The model takes advantage of monocular information in the form of vehicle tracklets, vanishing lines, and semantic labels. Additionally, the benefit of stereo features such as 3D scene flow and occupancy grids is investigated.
Motivated by the impressive driving capabilities of humans, the approach requires no further information such as GPS, lidar, radar, or map knowledge. Experiments conducted on 113 representative intersection sequences show that the developed approach successfully infers the correct layout in a variety of difficult scenarios. To evaluate the importance of each feature cue, experiments with different feature combinations are conducted. Additionally, the proposed method is shown to improve object detection and object orientation estimation performance.
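One of the stereo cues mentioned above, the occupancy grid, can be sketched in a few lines: 3D points are binned into ground-plane cells, and a cell is marked occupied when it receives at least one point. The cell size, extent, and axis convention here are illustrative assumptions, not values from the thesis.

```python
import numpy as np

def occupancy_grid(points, cell=0.5, extent=10.0):
    """points: (N, 3) array of (x, y, z) in meters; x points right,
    z points forward from the sensor. Cells cover x in [-extent, extent]
    and z in [0, 2 * extent]. Returns a boolean grid: True where at
    least one 3D point fell into the ground-plane cell."""
    n = int(2 * extent / cell)
    grid = np.zeros((n, n), dtype=bool)
    ix = ((points[:, 0] + extent) / cell).astype(int)  # column index
    iz = (points[:, 2] / cell).astype(int)             # row index
    valid = (ix >= 0) & (ix < n) & (iz >= 0) & (iz < n)
    grid[iz[valid], ix[valid]] = True
    return grid

pts = np.array([[0.0, 0.0, 5.0],    # point 5 m straight ahead
                [-2.0, 0.0, 3.0]])  # point 2 m left, 3 m ahead
grid = occupancy_grid(pts)
```

In practice such grids are populated from dense stereo reconstructions and accumulated over time; this sketch shows only the binning step.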
The talk is based primarily on the following two papers: (CVPR + NIPS '11)