Robot Perception and Learning

Tuesday, August 27, 2013

Lab Meeting, August 29, 2013 (Chiang Yi): Pose Estimation using Local Structure-Specific Shape and Appearance Context

Title: Pose Estimation using Local Structure-Specific Shape and Appearance Context

Authors: Anders Glent Buch, Dirk Kraft, Joni-Kristian Kamarainen, Henrik Gordon Petersen and Norbert Krüger

Abstract: We address the problem of estimating the alignment pose between two models using structure-specific local descriptors. Our descriptors are generated using a combination of 2D image data and 3D contextual shape data, resulting in a set of semi-local descriptors containing rich appearance and shape information for both edge and texture structures. This is achieved by defining feature space relations which describe the neighborhood of a descriptor. By quantitative evaluations, we show that our descriptors provide high discriminative power compared to state of the art approaches. In addition, we show how to utilize this for the estimation of the alignment pose between two point sets. We present experiments both in controlled and real-life scenarios to validate our approach.

From: ICRA 2013

Link: http://covil.sdu.dk/publications/paper1099.pdf
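Estimating an alignment pose from local descriptors usually starts by finding descriptor correspondences between the two models. Below is a minimal sketch of a generic nearest-neighbour matcher with Lowe's ratio test. This is a standard technique swapped in for illustration and is not the paper's own matching scheme. Descriptors are assumed to be plain fixed-length float vectors.

```python
import math
from typing import List, Sequence, Tuple


def euclidean(a: Sequence[float], b: Sequence[float]) -> float:
    """Euclidean distance between two descriptor vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def ratio_test_matches(
    src: List[Sequence[float]],
    dst: List[Sequence[float]],
    ratio: float = 0.8,
) -> List[Tuple[int, int]]:
    """Match each src descriptor to its nearest dst descriptor, keeping the
    match only if it is clearly better than the second-nearest candidate."""
    matches: List[Tuple[int, int]] = []
    if len(dst) < 2:
        return matches  # the ratio test needs at least two candidates
    for i, d in enumerate(src):
        ranked = sorted((euclidean(d, e), j) for j, e in enumerate(dst))
        (best, j_best), (second, _) = ranked[0], ranked[1]
        if best < ratio * second:
            matches.append((i, j_best))
    return matches


if __name__ == "__main__":
    src = [[0.0, 0.0], [5.0, 5.0]]
    dst = [[0.1, 0.0], [5.0, 5.2], [10.0, 10.0]]
    print(ratio_test_matches(src, dst))  # → [(0, 0), (1, 1)]
```

Ambiguous descriptors, whose two best candidates are about equally close, are discarded. The surviving matches would then feed a robust pose estimator such as RANSAC.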

Monday, August 19, 2013

Lab Meeting, August 22, 2013 (Yen-Ting): Joint Optimization for Object Class Segmentation and Dense Stereo Reconstruction

Title: Joint Optimization for Object Class Segmentation and Dense Stereo Reconstruction

Authors: Lubor Ladický, Paul Sturgess, Chris Russell, Sunando Sengupta, Yalin Bastanlar, William Clocksin and Philip H.S. Torr

Abstract: The problems of dense stereo reconstruction and object class segmentation can both be formulated as Random Field labeling problems, in which every pixel in the image is assigned a label corresponding to either its disparity, or an object class such as road or building. While these two problems are mutually informative, no attempt has been made to jointly optimize their labelings. In this work we provide a flexible framework configured via cross-validation that unifies the two problems and demonstrate that, by resolving ambiguities, which would be present in real world data if the two problems were considered separately, joint optimization of the two problems substantially improves performance. To evaluate our method, we augment the Leuven data set (http://cms.brookes.ac.uk/research/visiongroup/files/Leuven.zip), which is a stereo video shot from a car driving around the streets of Leuven, with 70 hand labeled object class and disparity maps. We hope that the release of these annotations will stimulate further work in the challenging domain of street view analysis. Complete source code is publicly available (http://cms.brookes.ac.uk/staff/Philip-Torr/ale.htm).

From: International Journal of Computer Vision (IJCV), 2012

Link: Click here
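The central idea is to label every pixel with a (class, disparity) pair so that each task can disambiguate the other. The toy below illustrates this on a 1D chain of pixels, where exact MAP inference is plain dynamic programming (Viterbi) over the product label space. It is a sketch under simplifying assumptions, not the paper's 2D random field or its graph-cut solver. The coupling function `joint_cost` and all cost values are hypothetical.

```python
import itertools
from typing import Callable, Dict, List, Sequence, Tuple

Label = Tuple[int, int]  # (object class, disparity level)


def joint_chain_map(
    class_unary: Sequence[Sequence[float]],
    disp_unary: Sequence[Sequence[float]],
    joint_cost: Callable[[int, int], float],
    smooth_class: float = 1.0,
    smooth_disp: float = 1.0,
) -> List[Label]:
    """Exact MAP labelling of a 1D chain of pixels whose label is a
    (class, disparity) pair, found by dynamic programming (Viterbi)."""
    labels = list(itertools.product(range(len(class_unary[0])), range(len(disp_unary[0]))))

    def unary(p: int, lab: Label) -> float:
        c, d = lab
        # per-task evidence plus the hypothetical class/disparity coupling term
        return class_unary[p][c] + disp_unary[p][d] + joint_cost(c, d)

    def pairwise(a: Label, b: Label) -> float:
        # Potts penalties for label changes between neighbouring pixels
        return smooth_class * (a[0] != b[0]) + smooth_disp * (a[1] != b[1])

    cost: Dict[Label, float] = {lab: unary(0, lab) for lab in labels}
    back: List[Dict[Label, Label]] = []
    for p in range(1, len(class_unary)):
        new_cost: Dict[Label, float] = {}
        ptr: Dict[Label, Label] = {}
        for lab in labels:
            prev = min(labels, key=lambda q: cost[q] + pairwise(q, lab))
            new_cost[lab] = cost[prev] + pairwise(prev, lab) + unary(p, lab)
            ptr[lab] = prev
        cost = new_cost
        back.append(ptr)
    # backtrack from the cheapest final label
    path = [min(labels, key=lambda lab: cost[lab])]
    for ptr in reversed(back):
        path.append(ptr[path[-1]])
    return path[::-1]


if __name__ == "__main__":
    # Hypothetical toy: class 0 = road, 1 = building; disparity levels 0 and 1.
    def coupling(c: int, d: int) -> float:
        return 0.0 if (c, d) in {(0, 1), (1, 0)} else 2.0

    print(joint_chain_map([[0, 3], [1, 1], [0, 3]], [[3, 0]] * 3, coupling))
```

In the example, class evidence alone cannot decide the middle pixel, because both class costs are 1. The disparity evidence together with the coupling term selects the road label.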

Wednesday, August 07, 2013

Lab Meeting, August 8, 2013 (Channing): Multi-Robot System for Artistic Pattern Formation

Title: Multi-Robot System for Artistic Pattern Formation (ICRA 2011)
Authors: Javier Alonso-Mora, Andreas Breitenmoser, Martin Rufli, Roland Siegwart and Paul Beardsley

Abstract: This paper describes work on multi-robot pattern formation. Arbitrary target patterns are represented with an optimal robot deployment, using a method that is independent of the number of robots. Furthermore, the trajectories are visually appealing in the sense of being smooth, oscillation free, and showing fast convergence. A distributed controller guarantees collision free trajectories while taking into account the kinematics of differentially driven robots. Experimental results are provided for a representative set of patterns, for a swarm of up to ten physical robots, and for fifty virtual robots in simulation.

Paper Link: click here.
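One way to turn an arbitrary target pattern into goal positions for any number of robots is a Lloyd-style (centroidal Voronoi) iteration over points sampled from the pattern. The sketch below is a discrete variant, essentially k-means. It is offered as an assumption about how such a deployment step can work, not as the authors' exact formulation. It leaves out goal assignment and the distributed collision-free controller.

```python
from typing import List, Tuple

Point = Tuple[float, float]


def lloyd_goals(samples: List[Point], goals: List[Point], iterations: int = 20) -> List[Point]:
    """Discrete Lloyd iteration: assign each pattern sample to its nearest goal,
    then move every goal to the centroid of its assigned samples."""
    goals = list(goals)
    for _ in range(iterations):
        acc = [[0.0, 0.0, 0] for _ in goals]  # running x-sum, y-sum, count per goal
        for x, y in samples:
            k = min(range(len(goals)),
                    key=lambda i: (x - goals[i][0]) ** 2 + (y - goals[i][1]) ** 2)
            acc[k][0] += x
            acc[k][1] += y
            acc[k][2] += 1
        # goals with no assigned samples stay where they are
        goals = [(sx / n, sy / n) if n else g for (sx, sy, n), g in zip(acc, goals)]
    return goals


if __name__ == "__main__":
    pattern = [(0, 0), (0, 1), (1, 0), (1, 1), (10, 10), (10, 11), (11, 10), (11, 11)]
    print(lloyd_goals(pattern, [(0.0, 0.0), (10.0, 10.0)]))
```

The number of robots enters only through the length of the initial `goals` list. The same pattern samples can therefore serve 2 robots or 50.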

Wednesday, July 24, 2013

Lab meeting July 25th 2013 (Benny): Learning to segment and track in RGBD

Presented by: Benny

From: IEEE Transactions on Automation Science and Engineering 2013

Authors: Alex Teichman and Jake Lussier and Sebastian Thrun

Link: Paper

Abstract: We consider the problem of segmenting and tracking deformable objects in color video with depth (RGBD) data available from commodity sensors such as the Asus Xtion Pro Live or Microsoft Kinect. We frame this problem with very few assumptions - no prior object model, no stationary sensor, no prior 3D map - thus making a solution potentially useful for a large number of applications, including semi-supervised learning, 3D model capture, and object recognition.

Our approach makes use of a rich feature set, including local image appearance, depth discontinuities, optical flow, and surface normals to inform the segmentation decision in a conditional random field model. In contrast to previous work in this field, the proposed method learns how to best make use of these features from ground-truth segmented sequences. We provide qualitative and quantitative analyses which demonstrate substantial improvement over the state of the art.

This paper is an extended version of our previous work [29]. Building on this, we show that it is possible to achieve an order of magnitude speedup and thus real-time performance (~20 FPS) on a laptop computer by applying simple algorithmic optimizations to the original work. This speedup comes at only a minor cost in overall accuracy and thus makes this approach applicable to a broader range of tasks. We demonstrate one such task: real-time, online, interactive segmentation to efficiently collect training data for an off-the-shelf object detector.

Tuesday, July 09, 2013

Lab Meeting July 10th, 2013 (Andi): Probabilistic Models for 3D Urban Scene Understanding from Movable Platforms

Title: Probabilistic Models for 3D Urban Scene Understanding from Movable Platforms

PhD Thesis, Andreas Geiger (KIT)

Abstract:

Visual 3D scene understanding is an important component in autonomous driving and robot navigation. Intelligent vehicles, for example, often base their decisions on observations obtained from video cameras, as they are cheap and easy to employ. Inner-city intersections represent an interesting but also very challenging scenario in this context: the road layout may be very complex and observations are often noisy or even missing due to heavy occlusions. While highway navigation (e.g., Dickmanns et al. [49]) and autonomous driving on simple and annotated intersections (e.g., DARPA Urban Challenge [30]) have already been demonstrated successfully, understanding and navigating general inner-city crossings with little prior knowledge remains an unsolved problem. This thesis is a contribution to understanding multi-object traffic scenes from video sequences. All data is provided by a camera system which is mounted on top of the autonomous driving platform AnnieWAY [103]. The proposed probabilistic generative model reasons jointly about the 3D scene layout as well as the 3D location and orientation of objects in the scene. In particular, the scene topology, geometry and traffic activities are inferred from short video sequences. The model takes advantage of monocular information in the form of vehicle tracklets, vanishing lines and semantic labels. Additionally, the benefit of stereo features such as 3D scene flow and occupancy grids is investigated.

Motivated by the impressive driving capabilities of humans, the approach requires no further information such as GPS, lidar, radar or map knowledge. Experiments conducted on 113 representative intersection sequences show that the developed approach successfully infers the correct layout in a variety of difficult scenarios. To evaluate the importance of each feature cue, experiments with different feature combinations are conducted. Additionally, the proposed method is shown to improve object detection and object orientation estimation performance.

Based primarily on the following two papers (CVPR '11 and NIPS '11):
http://ttic.uchicago.edu/~rurtasun/publications/geiger_etal_cvpr11.pdf
http://machinelearning.wustl.edu/mlpapers/paper_files/NIPS2011_0842.pdf

Thursday, June 27, 2013

Lab Meeting July 3rd, 2013 (Jeff): Switchable Constraints vs. Max-Mixture Models vs. RRR - A Comparison of Three Approaches to Robust Pose Graph SLAM

Title: Switchable Constraints vs. Max-Mixture Models vs. RRR - A Comparison of Three Approaches to Robust Pose Graph SLAM

Authors: Niko Sünderhauf and Peter Protzel

Abstract:

SLAM algorithms that can infer a correct map despite the presence of outliers have recently attracted increasing attention. In the context of SLAM, outlier constraints are typically caused by a failed place recognition due to perceptual aliasing. If not handled correctly, they can have catastrophic effects on the inferred map. Since robust robotic mapping and SLAM are among the key requirements for autonomous long-term operation, inference methods that can cope with such data association failures are a hot topic in current research. Our paper compares three very recently published approaches to robust pose graph SLAM, namely switchable constraints, max-mixture models and the RRR algorithm. All three methods were developed as extensions to existing factor graph-based SLAM back-ends and aim at improving the overall system's robustness to false positive loop closure constraints. Due to the novelty of the three proposed algorithms, no direct comparison has been conducted so far.

IEEE International Conference on Robotics and Automation (ICRA), 2013

Link:
LocalLink
http://www.tu-chemnitz.de/etit/proaut/rsrc/ICRA12-comparisonRobustSLAM.pdf

Reference Link:
Switchable Constraints
http://www.tu-chemnitz.de/etit/proaut/mitarbeiter/rsrc/IROS12-switchableConstraints.pdf
Max-Mixture
http://www.roboticsproceedings.org/rss08/p40.pdf
RRR
http://www.roboticsproceedings.org/rss08/p30.pdf
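The max-mixture idea can be shown at a small scale. Instead of a single Gaussian per loop closure, a constraint is modeled by several components, for example a tight inlier component and a broad outlier component. The optimizer then uses whichever component currently explains the residual best. In the sketch below, the scalar residuals and the weights and sigmas are hypothetical. A real back-end evaluates full covariance matrices inside the factor graph.

```python
import math
from typing import Sequence, Tuple


def max_mixture_cost(residual: float,
                     components: Sequence[Tuple[float, float]]) -> Tuple[float, int]:
    """Negative log-likelihood of a scalar residual under a max-mixture of
    zero-mean Gaussians given as (weight, sigma) pairs.

    Returns (cost, index of the component that attains the maximum)."""
    best_cost, best_i = math.inf, -1
    for i, (w, sigma) in enumerate(components):
        cost = (-math.log(w)
                + 0.5 * math.log(2 * math.pi * sigma ** 2)
                + residual ** 2 / (2 * sigma ** 2))
        if cost < best_cost:
            best_cost, best_i = cost, i
    return best_cost, best_i


if __name__ == "__main__":
    mixture = [(0.9, 0.1), (0.1, 10.0)]  # hypothetical inlier / outlier model
    for r in (0.05, 5.0):
        print(r, max_mixture_cost(r, mixture))
```

A large residual falls back to the broad component, so its cost grows far more slowly than under the inlier Gaussian alone. This keeps a single false loop closure from dominating the optimization.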

Monday, June 17, 2013

Lab Meeting Jun. 19, 2013 (Alan): Dense Variational Reconstruction of Non-Rigid Surfaces from Monocular Video

Title: Dense Variational Reconstruction of Non-Rigid Surfaces from Monocular Video (CVPR 2013 Oral)
Authors: Ravi Garg, Anastasios Roussos, Lourdes Agapito

Abstract
This paper offers the first variational approach to the problem of dense 3D reconstruction of non-rigid surfaces from a monocular video sequence. We formulate non-rigid structure from motion (NRSfM) as a global variational energy minimization problem to estimate dense low-rank smooth 3D shapes for every frame along with the camera motion matrices, given dense 2D correspondences.
Unlike traditional factorization based approaches to NRSfM, which model the low-rank non-rigid shape using a fixed number of basis shapes and corresponding coefficients, we minimize the rank of the matrix of time-varying shapes directly via trace norm minimization. In conjunction with this low-rank constraint, we use an edge preserving total-variation regularization term to obtain spatially smooth shapes for every frame. Thanks to proximal splitting techniques the optimization problem can be decomposed into many point-wise sub-problems and simple linear systems which can be easily solved on GPU hardware. We show results on real sequences of different objects (face, torso, beating heart) where, despite challenges in tracking, illumination changes and occlusions, our method reconstructs highly deforming smooth surfaces densely and accurately directly from video, without the need for any prior models or shape templates.

Link

Monday, May 27, 2013

Lab meeting May 29th 2013 (Jim): Reciprocal collision avoidance

I'm going to present the idea of "reciprocal collision avoidance": each moving agent shares responsibility for avoiding collisions with the others during navigation. Building on the velocity obstacle model, the "reciprocal velocity obstacle" and its variants were developed for multi-agent navigation. The main references and materials are the following papers:

Reciprocal Velocity Obstacles for Real-time Multi-agent Navigation
Jur van den Berg, Ming C. Lin, Dinesh Manocha
IEEE International Conference on Robotics and Automation (ICRA), 2008
website

The Hybrid Reciprocal Velocity Obstacle
Jamie Snape, Jur van den Berg, Stephen J. Guy, Dinesh Manocha
IEEE Transactions on Robotics (T-RO), vol. 27, pp. 696-706, 2011

website

Reciprocal n-body Collision Avoidance
Jur van den Berg, Stephen J. Guy, Ming C. Lin, Dinesh Manocha
Robotics Research: The 14th International Symposium (ISRR), Springer Tracts in Advanced Robotics (STAR), vol. 70, pp. 3-19, 2011

website
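The reciprocal velocity obstacle for disc-shaped agents can be tested directly. A candidate velocity v' for agent A lies in RVO^A_B exactly when 2v' - v_A lies in the ordinary velocity obstacle VO^A_B(v_B). Below is a minimal sketch under that definition. The velocity selection is a simplification: it samples candidates and keeps the free one closest to the preferred velocity. The sampling pattern, `v_max` and the stop fallback are all hypothetical choices.

```python
import math
from typing import List, Sequence, Tuple

Vec = Tuple[float, float]


def in_velocity_obstacle(rel_pos: Vec, rel_vel: Vec, radius: float) -> bool:
    """True if moving from the origin with rel_vel eventually enters the disc of
    the given radius centred at rel_pos (the classic velocity obstacle test)."""
    px, py = rel_pos
    ux, uy = rel_vel
    dist_sq = px * px + py * py
    if dist_sq <= radius * radius:
        return True  # the agents already overlap
    uu = ux * ux + uy * uy
    up = ux * px + uy * py
    if uu == 0.0 or up <= 0.0:
        return False  # not moving, or moving away from the other agent
    # squared distance between the disc centre and the closest point of the ray
    return dist_sq - up * up / uu <= radius * radius


def in_rvo(p_a: Vec, v_a: Vec, r_a: float, p_b: Vec, v_b: Vec, r_b: float, cand: Vec) -> bool:
    """cand is in RVO^A_B iff 2*cand - v_a lies in VO^A_B(v_b)."""
    rel_pos = (p_b[0] - p_a[0], p_b[1] - p_a[1])
    rel_vel = (2 * cand[0] - v_a[0] - v_b[0], 2 * cand[1] - v_a[1] - v_b[1])
    return in_velocity_obstacle(rel_pos, rel_vel, r_a + r_b)


def choose_velocity(
    p_a: Vec, v_a: Vec, r_a: float, v_pref: Vec,
    others: Sequence[Tuple[Vec, Vec, float]],
    v_max: float = 1.0, samples: int = 16, rings: int = 4,
) -> Vec:
    """Pick the sampled velocity closest to v_pref that lies outside every RVO."""
    candidates: List[Vec] = [v_pref]
    for k in range(1, rings + 1):
        speed = v_max * k / rings
        for s in range(samples):
            ang = 2 * math.pi * s / samples
            candidates.append((speed * math.cos(ang), speed * math.sin(ang)))
    free = [c for c in candidates
            if not any(in_rvo(p_a, v_a, r_a, p_b, v_b, r_b, c) for p_b, v_b, r_b in others)]
    if not free:
        return (0.0, 0.0)  # hypothetical fallback: stop
    return min(free, key=lambda c: (c[0] - v_pref[0]) ** 2 + (c[1] - v_pref[1]) ** 2)


if __name__ == "__main__":
    # Two agents heading straight at each other.
    print(choose_velocity((0.0, 0.0), (1.0, 0.0), 0.5, (1.0, 0.0),
                          [((4.0, 0.0), (-1.0, 0.0), 0.5)]))
```

If agent B runs the same rule, each agent takes only about half of the avoiding manoeuvre. This is the "reciprocal" part that removes the oscillations seen with plain velocity obstacles.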

Tuesday, May 21, 2013

Lab meeting May 22nd 2013 (Tom Hsu): Incorporating User Interaction and Topological Constraints within Contour Completion via Discrete Calculus

Presented by: Tom Hsu

From: Proc. of the Computer Vision and Pattern Recognition (CVPR'13), Portland, Oregon 2013.

Authors: Jia Xu, Maxwell D. Collins, Vikas Singh (University of Wisconsin-Madison)

Link: Paper

Abstract:

We study the problem of interactive segmentation and contour completion for multiple objects. The form of constraints our model incorporates are those coming from user scribbles (interior or exterior constraints) as well as information regarding the topology of the 2-D space after partitioning (number of closed contours desired). We discuss how concepts from discrete calculus and a simple identity using the Euler characteristic of a planar graph can be utilized to derive a practical algorithm for this problem. We also present specialized branch and bound methods for the case of single contour completion under such constraints. On an extensive dataset of ~1000 images, our experiments suggest that a small amount of side knowledge can give strong improvements over fully unsupervised contour completion methods. We show that by interpreting user indications topologically, user effort is substantially reduced.
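The Euler characteristic identity behind the topological constraint can be checked in a few lines. For a planar graph with V vertices, E edges and C connected components, V - E + F = 1 + C, where F includes the unbounded face. The number of closed regions is therefore E - V + C. The sketch below only illustrates this counting identity. It is not the paper's discrete-calculus optimization, and it assumes the input graph really is planar.

```python
from typing import List, Tuple


def count_bounded_faces(num_vertices: int, edges: List[Tuple[int, int]]) -> int:
    """Bounded faces of a planar graph via Euler's formula V - E + F = 1 + C,
    where F counts the unbounded face and C is the number of components."""
    parent = list(range(num_vertices))

    def find(x: int) -> int:
        # union-find root lookup with path halving
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    components = num_vertices
    for a, b in edges:
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[ra] = rb
            components -= 1
    faces_including_outer = 1 + components - num_vertices + len(edges)
    return faces_including_outer - 1


if __name__ == "__main__":
    square = [(0, 1), (1, 2), (2, 3), (3, 0)]
    print(count_bounded_faces(4, square))
```

Fixing the desired number of closed contours is thus equivalent to a linear constraint on how many edges, vertices and components the selected subgraph may have.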

Monday, May 06, 2013

Lab meeting May 8th 2013 (Gene): Lost! Leveraging the Crowd for Probabilistic Visual Self-Localization

Presented by: Gene

From: CVPR 2013

Authors: Marcus A. Brubaker, Andreas Geiger, Raquel Urtasun

Abstract:

In this paper we propose an affordable solution to self-localization, which utilizes visual odometry and road maps as the only inputs. To this end, we present a probabilistic model as well as an efficient approximate inference algorithm, which is able to utilize distributed computation to meet the real-time requirements of autonomous systems. Because of the probabilistic nature of the model we are able to cope with uncertainty due to noisy visual odometry and inherent ambiguities in the map (e.g., in a Manhattan world). By exploiting freely available, community developed maps and visual odometry measurements, we are able to localize a vehicle to within 3 m after only a few seconds of driving on maps which contain more than 2,150 km of drivable roads.

Link
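The core intuition is that a sequence of odometry-derived turns quickly singles out one place on a road map. A toy discrete Bayes (histogram) filter on a circular road of labelled cells illustrates it. This is a drastic simplification and not the paper's model or inference algorithm. The cell labels, the one-cell-per-step motion and `p_correct` are all assumptions.

```python
from typing import List, Sequence


def histogram_filter(road: Sequence[str], observations: Sequence[str],
                     p_correct: float = 0.8) -> List[float]:
    """Discrete Bayes filter for a vehicle that advances one cell per step around
    a circular road whose cells are labelled 'S' (straight), 'L' or 'R' (turns).

    Each observation is the turn type inferred from odometry at the current cell.
    Returns the belief over cells after the final motion step."""
    n = len(road)
    belief = [1.0 / n] * n  # no prior knowledge of the starting cell
    p_wrong = (1.0 - p_correct) / 2  # the two other labels share the error mass
    for obs in observations:
        # measurement update: weight cells by how well their label explains obs
        belief = [b * (p_correct if road[i] == obs else p_wrong) for i, b in enumerate(belief)]
        total = sum(belief)
        belief = [b / total for b in belief]
        # motion update: the vehicle moves exactly one cell forward
        belief = [belief[(i - 1) % n] for i in range(n)]
    return belief


if __name__ == "__main__":
    road = "SSLSSRSS"
    belief = histogram_filter(road, "SLSS")  # vehicle actually started at cell 1
    print(belief.index(max(belief)))
```

A straight-only road would stay ambiguous forever. This mirrors the abstract's remark about inherent ambiguities such as a Manhattan world.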

Tuesday, April 23, 2013

Lab meeting Apr 24th 2013 (Hank Lin): Scene Parsing with Multiscale Feature Learning, Purity Trees, and Optimal Covers

Presented by: Hank Lin

From: Proc. of the International Conference on Machine Learning (ICML'12), Edinburgh, Scotland, 2012.

Authors: C. Farabet, C. Couprie, L. Najman, Y. LeCun

Link: Paper Video

Abstract:

Scene parsing, or semantic segmentation, consists in labeling each pixel in an image with the category of the object it belongs to. It is a challenging task that involves the simultaneous detection, segmentation and recognition of all the objects in the image.

The scene parsing method proposed here starts by computing a tree of segments from a graph of pixel dissimilarities. Simultaneously, a set of dense feature vectors is computed which encodes regions of multiple sizes centered on each pixel. The feature extractor is a multiscale convolutional network trained from raw pixels. The feature vectors associated with the segments covered by each node in the tree are aggregated and fed to a classifier which produces an estimate of the distribution of object categories contained in the segment. A subset of tree nodes that cover the image are then selected so as to maximize the average "purity" of the class distributions, hence maximizing the overall likelihood that each segment will contain a single object. The convolutional network feature extractor is trained end-to-end from raw pixels, alleviating the need for engineered features. After training, the system is parameter free.

The system yields record accuracies on the Stanford Background Dataset (8 classes), the Sift Flow Dataset (33 classes) and the Barcelona Dataset (170 classes) while being an order of magnitude faster than competing approaches, producing a 320 × 240 image labeling in less than 1 second.
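Selecting a cover from a segmentation tree can be done exactly with a bottom-up recursion when the cost is additive over the chosen segments. Each node either keeps itself or hands over to the best covers of its children. In the sketch below, an additive impurity, size times the entropy of the predicted class distribution, stands in for the paper's "purity" criterion. That cost, the `Node` structure and the assumption that children partition their parent's region are illustrative choices.

```python
import math
from dataclasses import dataclass, field
from typing import List, Sequence, Tuple


@dataclass
class Node:
    name: str
    cost: float  # impurity of this segment (hypothetical additive measure)
    children: List["Node"] = field(default_factory=list)


def impurity(class_probs: Sequence[float], size: int) -> float:
    """Size-weighted entropy of a segment's predicted class distribution."""
    return -size * sum(p * math.log(p) for p in class_probs if p > 0)


def optimal_cover(node: Node) -> Tuple[float, List[str]]:
    """Minimum-cost set of tree nodes whose segments exactly cover node's region,
    assuming each node's children partition its region."""
    if not node.children:
        return node.cost, [node.name]
    child_cost, child_names = 0.0, []
    for child in node.children:
        c, names = optimal_cover(child)
        child_cost += c
        child_names += names
    if node.cost <= child_cost:
        return node.cost, [node.name]  # keeping this whole segment is cheaper
    return child_cost, child_names


if __name__ == "__main__":
    tree = Node("root", impurity([0.5, 0.5], 100), [
        Node("A", impurity([1.0], 50)),
        Node("B", impurity([0.9, 0.1], 50), [
            Node("B1", impurity([1.0], 25)),
            Node("B2", impurity([0.8, 0.2], 25)),
        ]),
    ])
    print(optimal_cover(tree)[1])
```

Each node is visited once, so the selection is linear in the size of the tree.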

Wednesday, April 17, 2013

Lab meeting Apr 17th 2013 (Bang-Cheng Wang): Biped Walking Pattern Generation by using Preview Control of Zero-Moment Point

Presented by Bang-Cheng Wang

From Proceedings of the 2003 IEEE International Conference on Robotics & Automation, Taipei, Taiwan, September 14-19, 2003.

Authors:
Shuuji KAJITA, Fumio KANEHIRO, Kenji KANEKO, Kiyoshi FUJIWARA,
Kensuke HARADA, Kazuhito YOKOI and Hirohisa HIRUKAWA

Abstract:
We introduce a new method for biped walking pattern generation using preview control of the zero-moment point (ZMP). First, the dynamics of a biped robot are modeled as a running cart on a table, which gives a convenient representation for treating the ZMP. After reviewing conventional methods of ZMP-based pattern generation, we formalize the problem as the design of a ZMP tracking servo controller. It is shown that we can realize such a controller by adopting preview control theory, which uses the future reference. It is also shown that a preview controller can be used to compensate for the ZMP error caused by the difference between a simple model and the precise multibody model. The effectiveness of the proposed method is demonstrated by a simulation of walking on spiral stairs.

Link
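In the cart-table model, a center of mass (CoM) kept at constant height z_c gives a ZMP of p = x - (z_c / g) * x_ddot along each horizontal axis. The sketch below evaluates that relation for a sampled CoM trajectory, using central differences for the acceleration. It does not implement the preview controller itself, which additionally needs a Riccati solution and a window of future ZMP references. The sample period and CoM height in the usage example are hypothetical.

```python
from typing import List, Sequence

GRAVITY = 9.81  # m/s^2


def zmp_from_com(com_x: Sequence[float], dt: float, z_c: float,
                 g: float = GRAVITY) -> List[float]:
    """ZMP of the cart-table model, p = x - (z_c / g) * x_ddot, evaluated at the
    interior samples of a CoM trajectory using central differences."""
    zmp = []
    for k in range(1, len(com_x) - 1):
        x_ddot = (com_x[k + 1] - 2 * com_x[k] + com_x[k - 1]) / dt ** 2
        zmp.append(com_x[k] - z_c / g * x_ddot)
    return zmp


if __name__ == "__main__":
    dt, accel, z_c = 0.01, 0.5, 0.8  # hypothetical sample period, CoM accel, CoM height
    com = [0.5 * accel * (k * dt) ** 2 for k in range(10)]
    print(zmp_from_com(com, dt, z_c)[:3])
```

A constant forward acceleration shifts the ZMP behind the CoM by (z_c / g) * accel. The preview controller's job is to plan the CoM motion so that this shifted point follows the footstep-based ZMP reference.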

Tuesday, April 09, 2013

Lab Meeting April 10, 2013 (Jimmy): Geodesic Flow Kernel for Unsupervised Domain Adaptation

Title: Geodesic Flow Kernel for Unsupervised Domain Adaptation
Authors: Boqing Gong, Yuan Shi, Fei Sha, Kristen Grauman
In: CVPR 2012

Abstract
In real-world applications of visual recognition, many factors, such as pose, illumination, or image quality, can cause a significant mismatch between the source domain on which classifiers are trained and the target domain to which those classifiers are applied. As such, the classifiers often perform poorly on the target domain. Domain adaptation techniques aim to correct the mismatch. Existing approaches have concentrated on learning feature representations that are invariant across domains, and they often do not directly exploit low-dimensional structures that are intrinsic to many vision datasets. In this paper, we propose a new kernel-based method that takes advantage of such structures. Our geodesic flow kernel models domain shift by integrating an infinite number of subspaces that characterize changes in geometric and statistical properties from the source to the target domain. Our approach is computationally advantageous, automatically inferring important algorithmic parameters without requiring extensive cross-validation or labeled data from either domain. We also introduce a metric that reliably measures the adaptability between a pair of source and target domains. For a given target domain and several source domains, the metric can be used to automatically select the optimal source domain to adapt and avoid less desirable ones. Empirical studies on standard datasets demonstrate the advantages of our approach over competing methods.

[link]

Wednesday, March 27, 2013

Lab Meeting, March 28, 2013 (Chiang Yi): Efficient Model-based 3D Tracking of Hand Articulations using Kinect (BMVC 2011)

Authors: Iason Oikonomidis, Nikolaos Kyriazis, Antonis A. Argyros

Abstract:
We present a novel solution to the problem of recovering and tracking the 3D position, orientation and full articulation of a human hand from markerless visual observations obtained by a Kinect sensor. We treat this as an optimization problem, seeking the hand model parameters that minimize the discrepancy between the appearance and 3D structure of hypothesized instances of a hand model and actual hand observations. This optimization problem is effectively solved using a variant of Particle Swarm Optimization (PSO). The proposed method does not require special markers and/or a complex image acquisition setup. Being model based, it provides continuous solutions to the problem of tracking hand articulations. Extensive experiments with a prototype GPU-based implementation of the proposed method demonstrate that accurate and robust 3D tracking of hand articulations can be achieved in near real-time (15Hz).

LINK

extended work: Tracking the articulated motion of two strongly interacting hands
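Particle Swarm Optimization searches a parameter space with a population of candidate solutions. Each particle is pulled toward its own best position so far and toward the best position found by the swarm. Below is a minimal sketch of standard global-best PSO using commonly cited inertia and acceleration constants. It is not the paper's variant, which optimizes a hand-pose vector against a rendering-based discrepancy on the GPU. The toy quadratic objective in the usage example is a hypothetical stand-in.

```python
import random
from typing import Callable, List, Sequence, Tuple


def pso_minimize(
    f: Callable[[List[float]], float],
    lower: Sequence[float],
    upper: Sequence[float],
    n_particles: int = 20,
    iterations: int = 100,
    w: float = 0.72,   # inertia weight
    c1: float = 1.49,  # pull toward each particle's own best
    c2: float = 1.49,  # pull toward the swarm's best
    seed: int = 0,
) -> Tuple[List[float], float]:
    """Global-best PSO inside the box [lower, upper]; returns (best point, value)."""
    rng = random.Random(seed)
    dim = len(lower)
    pos = [[rng.uniform(lower[d], upper[d]) for d in range(dim)] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    best_pos = [p[:] for p in pos]
    best_val = [f(p) for p in pos]
    g = min(range(n_particles), key=lambda i: best_val[i])
    g_pos, g_val = best_pos[g][:], best_val[g]
    for _ in range(iterations):
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                vel[i][d] = (w * vel[i][d]
                             + c1 * r1 * (best_pos[i][d] - pos[i][d])
                             + c2 * r2 * (g_pos[d] - pos[i][d]))
                # move and clamp the particle to the search box
                pos[i][d] = min(max(pos[i][d] + vel[i][d], lower[d]), upper[d])
            val = f(pos[i])
            if val < best_val[i]:
                best_val[i], best_pos[i] = val, pos[i][:]
                if val < g_val:
                    g_val, g_pos = val, pos[i][:]
    return g_pos, g_val


if __name__ == "__main__":
    def toy(p: List[float]) -> float:
        return (p[0] - 1.0) ** 2 + (p[1] + 2.0) ** 2

    print(pso_minimize(toy, [-5.0, -5.0], [5.0, 5.0]))
```

PSO needs only objective evaluations and no gradients. This suits an objective built from rendered-versus-observed depth maps, where gradients are unavailable.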

Tuesday, March 19, 2013

Lab Meeting, March 21, 2013 (Yen-Ting): Extracting 3D Scene-Consistent Object Proposals and Depth from Stereo Images (ECCV 2012)

Authors: Michael Bleyer, Christoph Rhemann, and Carsten Rother

Abstract: This work combines two active areas of research in computer vision: unsupervised object extraction from a single image, and depth estimation from a stereo image pair. A recent, successful trend in unsupervised object extraction is to exploit so-called “3D scene-consistency”, that is enforcing that objects obey underlying physical constraints of the 3D scene, such as occupancy of 3D space and gravity of objects. Our main contribution is to introduce the concept of 3D scene-consistency into stereo matching. We show that this concept is beneficial for both tasks, object extraction and depth estimation. In particular, we demonstrate that our approach is able to create a large set of 3D scene-consistent object proposals, by varying e.g. the prior on the number of objects...

Link

Thursday, March 14, 2013

Lab Meeting, March 14, 2013 (Channing): The Design of LEO: a 2D Bipedal Walking Robot for Online Autonomous Reinforcement Learning (IROS 2010)

Authors: Erik Schuitema, Martijn Wisse, Thijs Ramakers and Pieter Jonker

Abstract: Real robots demonstrating online Reinforcement Learning (RL) to learn new tasks are hard to find. The specific properties and limitations of real robots have a large impact on their suitability for RL experiments. In this work, we derive the main hardware and software requirements that an RL robot should fulfill, and present our biped robot LEO that was specifically designed to meet these requirements. We verify its aptitude in autonomous walking experiments using a pre-programmed controller. Although there is room for improvement in the design, the robot was able to walk, fall and stand up without human intervention for 8 hours, during which it made over 43,000 footsteps.

Link

Wednesday, March 13, 2013

Lab Meeting, March 7, 2013 (Benny): A Segmentation and Data Association Annotation System for Laser-based Multi-Target Tracking Evaluation

Authors: Chien-Chen Weng, Chieh-Chih Wang and Jennifer Healey

Abstract: 2D laser scanners are now widely used to accomplish robot perception tasks such as SLAM and multi-target tracking (MTT). While a number of SLAM benchmarking datasets are available, only a few works have discussed the issues of collecting multi-target tracking benchmarking datasets.

In this work, a segmentation and data association annotation system is proposed for evaluating multi-target tracking using 2D laser scanners. The proposed annotation system uses an existing MTT algorithm to generate initial annotation results and uses camera images as strong hints to help annotators recognize moving objects in laser scans. The annotators can draw an object's shape and future trajectory to automate segmentation and data association and reduce the annotation workload. The user study results show that the proposed annotation system is superior both in the V-measure vs. annotation speed tests and in the false positive and false negative rates.

Link
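The V-measure mentioned in the abstract scores a clustering against ground truth. It is the weighted harmonic mean of homogeneity (each cluster holds one class) and completeness (each class lands in one cluster). Below is a generic implementation of the standard definition. Treating annotated laser segments as class labels and the produced segments as cluster labels is an assumption about how it would be applied here.

```python
import math
from collections import Counter
from typing import Hashable, Iterable, Sequence


def _entropy(counts: Iterable[int], n: int) -> float:
    return -sum(c / n * math.log(c / n) for c in counts if c)


def v_measure(true_labels: Sequence[Hashable], pred_labels: Sequence[Hashable],
              beta: float = 1.0) -> float:
    """V-measure: weighted harmonic mean of homogeneity and completeness."""
    n = len(true_labels)
    if n == 0 or n != len(pred_labels):
        raise ValueError("label sequences must be non-empty and of equal length")
    classes = Counter(true_labels)
    clusters = Counter(pred_labels)
    joint = Counter(zip(true_labels, pred_labels))
    h_c = _entropy(classes.values(), n)
    h_k = _entropy(clusters.values(), n)
    # conditional entropies H(C|K) and H(K|C)
    h_c_given_k = -sum(m / n * math.log(m / clusters[k]) for (c, k), m in joint.items())
    h_k_given_c = -sum(m / n * math.log(m / classes[c]) for (c, k), m in joint.items())
    homogeneity = 1.0 if h_c == 0 else 1.0 - h_c_given_k / h_c
    completeness = 1.0 if h_k == 0 else 1.0 - h_k_given_c / h_k
    if homogeneity + completeness == 0:
        return 0.0
    return (1 + beta) * homogeneity * completeness / (beta * homogeneity + completeness)


if __name__ == "__main__":
    print(v_measure([0, 0, 1, 1], [5, 5, 7, 7]))  # same partition, relabelled
    print(v_measure([0, 0, 1, 1], [0, 0, 0, 1]))  # one point mis-segmented
```

The score ignores the actual label values and compares only the partitions. That is why the first example scores a perfect 1 despite different IDs.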

Wednesday, February 20, 2013

Lab meeting Feb. 21, 2013 (ChihChung): A Tensor-Based Algorithm for High-Order Graph Matching (PAMI 2010)

Authors: Olivier Duchenne, Francis Bach, In-So Kweon, and Jean Ponce

Abstract: This paper addresses the problem of establishing correspondences between two sets of visual features using higher-order constraints instead of the unary or pairwise ones used in classical methods. Concretely, the corresponding hypergraph matching problem is formulated as the maximization of a multi-linear objective function over all permutations of the features. This function is defined by a tensor representing the affinity between feature tuples. It is maximized using a generalization of spectral techniques where a relaxed problem is first solved by a multi-dimensional power method, and the solution is then projected onto the closest assignment matrix. The proposed approach has been implemented, and it is compared to state-of-the-art algorithms on both synthetic and real data.

Link
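The relaxed step can be illustrated on a tiny sparse third-order tensor over candidate assignments. The higher-order power iteration repeats v_i <- sum_{j,k} H_ijk v_j v_k followed by normalisation. The sketch below uses plain L2 normalisation and then discretizes greedily. The paper instead projects onto the closest assignment matrix, so the greedy step is a swapped-in simplification. The candidate list and affinities in the usage example are hypothetical.

```python
import itertools
import math
from typing import Dict, List, Sequence, Tuple


def tensor_power_iteration(triples: Dict[Tuple[int, int, int], float], n: int,
                           iterations: int = 50) -> List[float]:
    """Higher-order power iteration v_i <- sum_{j,k} H_ijk v_j v_k with L2
    normalisation; each given triple is expanded to all its index permutations
    so that the affinity tensor is symmetric."""
    entries: Dict[Tuple[int, int, int], float] = {}
    for t, w in triples.items():
        for perm in set(itertools.permutations(t)):
            entries[perm] = w
    v = [1.0 / math.sqrt(n)] * n
    for _ in range(iterations):
        new_v = [0.0] * n
        for (i, j, k), w in entries.items():
            new_v[i] += w * v[j] * v[k]
        norm = math.sqrt(sum(x * x for x in new_v))
        if norm == 0.0:
            break
        v = [x / norm for x in new_v]
    return v


def greedy_assignment(candidates: Sequence[Tuple[int, int]],
                      scores: Sequence[float]) -> List[Tuple[int, int]]:
    """Pick candidates by decreasing score, enforcing one-to-one matching."""
    used_src, used_dst, chosen = set(), set(), []
    for a in sorted(range(len(candidates)), key=lambda a: -scores[a]):
        s, d = candidates[a]
        if s not in used_src and d not in used_dst:
            chosen.append((s, d))
            used_src.add(s)
            used_dst.add(d)
    return chosen


if __name__ == "__main__":
    # candidate assignments (source feature, target feature); index = tensor index
    cands = [(0, 0), (1, 1), (2, 2), (1, 0)]
    v = tensor_power_iteration({(0, 1, 2): 1.0, (1, 2, 3): 0.1}, len(cands))
    print(greedy_assignment(cands, v))
```

Here the strong triple affinity among candidates 0, 1 and 2 dominates the weak triple that involves the conflicting candidate 3.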

Tuesday, January 22, 2013

Lab meeting Jan. 23, 2013 (Gene): Fully Distributed Scalable Smoothing and Mapping with Robust Multi-robot Data Association (IEEE 2012)

Title: Fully Distributed Scalable Smoothing and Mapping with Robust Multi-robot Data Association (IEEE 2012)
Authors: Alexander Cunningham, Kai M. Wurm, Wolfram Burgard, and Frank Dellaert

Abstract:

In this paper we focus on the multi-robot perception problem, and present an experimentally validated end-to-end multi-robot mapping framework, enabling individual robots in a team to see beyond their individual sensor horizons. The inference part of our system is the DDF-SAM algorithm [1], which provides a decentralized communication and inference scheme, but did not address the crucial issue of data association.

One key contribution is a novel RANSAC-based approach for performing between-robot data association and initializing the relative frames of reference. We demonstrate this system with data collected from real robot experiments as well as in a large-scale simulated experiment demonstrating the scalability of the proposed approach.

Link
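RANSAC-based data association between robots can be pictured as follows. Given putative landmark matches between two robots' local maps, repeatedly hypothesize the relative frame from a minimal sample, keep the hypothesis with the most inliers, and refit on those inliers. The sketch below does this for a 2D rigid transform, for which the minimal sample is two matches. It is a generic illustration, not the DDF-SAM implementation. The threshold, iteration count and seed are hypothetical.

```python
import math
import random
from typing import List, Optional, Sequence, Tuple

Point = Tuple[float, float]
Rigid = Tuple[float, float, float]  # (theta, tx, ty)


def fit_rigid(src: Sequence[Point], dst: Sequence[Point]) -> Rigid:
    """Closed-form least-squares 2D rotation + translation mapping src onto dst."""
    n = len(src)
    sx, sy = sum(p[0] for p in src) / n, sum(p[1] for p in src) / n
    dx, dy = sum(p[0] for p in dst) / n, sum(p[1] for p in dst) / n
    s_cos = s_sin = 0.0
    for (ax, ay), (bx, by) in zip(src, dst):
        ax, ay, bx, by = ax - sx, ay - sy, bx - dx, by - dy
        s_cos += ax * bx + ay * by
        s_sin += ax * by - ay * bx
    theta = math.atan2(s_sin, s_cos)
    c, s = math.cos(theta), math.sin(theta)
    return theta, dx - (c * sx - s * sy), dy - (s * sx + c * sy)


def apply_rigid(model: Rigid, p: Point) -> Point:
    theta, tx, ty = model
    c, s = math.cos(theta), math.sin(theta)
    return (c * p[0] - s * p[1] + tx, s * p[0] + c * p[1] + ty)


def ransac_rigid(src: List[Point], dst: List[Point], threshold: float = 0.1,
                 iterations: int = 200, seed: int = 0) -> Tuple[Optional[Rigid], List[int]]:
    """Robustly estimate the relative frame from putative matches src[i] <-> dst[i]."""
    rng = random.Random(seed)
    idx = list(range(len(src)))
    best_inliers: List[int] = []
    for _ in range(iterations):
        i, j = rng.sample(idx, 2)  # minimal sample for a 2D rigid transform
        model = fit_rigid([src[i], src[j]], [dst[i], dst[j]])
        inliers = [k for k in idx if math.dist(apply_rigid(model, src[k]), dst[k]) < threshold]
        if len(inliers) > len(best_inliers):
            best_inliers = inliers
    if len(best_inliers) < 2:
        return None, []
    # refit on all inliers of the best hypothesis
    return fit_rigid([src[k] for k in best_inliers], [dst[k] for k in best_inliers]), best_inliers
```

The returned inlier set doubles as the accepted between-robot data association. The refitted transform initializes the relative frame of reference.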

Tuesday, January 08, 2013

Lab meeting Jan 9th 2013 (Bang-Cheng Wang): Kicking a Ball – Modeling Complex Dynamic Motions for Humanoid Robots

Presented by Bang-Cheng Wang

From RoboCup 2010: Robot Soccer World Cup XIV, ser. Lecture Notes in Artificial Intelligence, E. Chown, A. Matsumoto, P. Plöger, and J. R. del Solar, Eds. Springer, to appear in 2011.

Authors:
Judith Müller, Tim Laue, and Thomas Röfer

Abstract:
Complex motions like kicking a ball into the goal are becoming more important in RoboCup leagues such as the Standard Platform League. Thus, there is a need for motion sequences that can be parameterized and changed dynamically. This paper presents a motion engine that translates motions into joint angles by using trajectories. These motions are defined as a set of Bezier curves that can be changed online to allow adjusting, for example, a kicking motion precisely to the actual position of the ball. During the execution, motions are stabilized by the combination of center of mass balancing and a gyro feedback-based closed-loop PID controller.
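The two building blocks named in the abstract can be sketched separately. A Bézier curve, evaluated here with de Casteljau's algorithm, gives a smooth trajectory whose control points can be moved online. A PID controller turns a feedback error, such as a gyro reading, into a correction. Treating each trajectory coordinate as a scalar curve, the gains and the gyro-error interpretation are assumptions for illustration, not the paper's engine.

```python
from typing import Optional, Sequence


def bezier(points: Sequence[float], t: float) -> float:
    """Evaluate a Bézier curve of any degree at t in [0, 1] (de Casteljau)."""
    pts = list(points)
    while len(pts) > 1:
        # repeated linear interpolation between neighbouring control values
        pts = [(1 - t) * a + t * b for a, b in zip(pts, pts[1:])]
    return pts[0]


class PID:
    """Discrete PID controller with a fixed time step."""

    def __init__(self, kp: float, ki: float, kd: float, dt: float) -> None:
        self.kp, self.ki, self.kd, self.dt = kp, ki, kd, dt
        self.integral = 0.0
        self.prev_error: Optional[float] = None

    def step(self, error: float) -> float:
        self.integral += error * self.dt
        derivative = 0.0 if self.prev_error is None else (error - self.prev_error) / self.dt
        self.prev_error = error
        return self.kp * error + self.ki * self.integral + self.kd * derivative


if __name__ == "__main__":
    # hypothetical foot-height profile of a kick: lift and return
    print([round(bezier([0.0, 1.0, 1.0, 0.0], t / 4), 4) for t in range(5)])
    pid = PID(kp=2.0, ki=0.1, kd=0.05, dt=0.01)  # hypothetical gains
    print(pid.step(0.1))  # correction for a hypothetical gyro error
```

Moving an inner control point, for example to match the ball position, reshapes the whole curve smoothly while its endpoints stay fixed.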