Sunday, December 29, 2013

Lab Meeting, January 2nd, 2014 (Gene Chang): Zhou, Feng, and Fernando De la Torre. "Deformable Graph Matching." IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2013

Deformable Graph Matching

Feng Zhou and Fernando De la Torre
Robotics Institute, Carnegie Mellon University, Pittsburgh, PA 15213

Graph matching (GM) is a fundamental problem in computer science, and it has been successfully applied to many problems in computer vision. Although widely used, existing GM algorithms cannot incorporate global consistency among nodes, which is a natural constraint in computer vision problems. This paper proposes deformable graph matching (DGM), an extension of GM for matching graphs subject to global rigid and non-rigid geometric constraints. The key idea of this work is a new factorization of the pair-wise affinity matrix. This factorization decouples the affinity matrix into the local structure of each graph and the pair-wise affinity edges. Besides the ability to incorporate global geometric transformations, this factorization offers three more benefits. First, there is no need to compute the costly (in space and time) pair-wise affinity matrix. Second, it provides a unified view of many GM methods and extends the standard iterative closest point algorithm. Third, it allows the use of a path-following optimization algorithm that leads to improved optimization strategies and matching performance. Experimental results on synthetic and real databases illustrate how DGM outperforms state-of-the-art algorithms for GM. The code is available at

IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2013


Thursday, December 26, 2013

Lab Meeting, December 26, 2013 (Tom Hsu): An Efficient Motion Segmentation Algorithm for Multibody RGB-D SLAM, (Proceedings of Australasian Conference on Robotics and Automation, 2-4 Dec 2013)

An Efficient Motion Segmentation Algorithm for Multibody RGB-D SLAM

Youbing Wang, Shoudong Huang
Faculty of Engineering and IT, University of Technology, Sydney, Australia

A simple motion segmentation algorithm using only two frames of RGB-D data is proposed, and both simulated and experimental segmentation results show its efficiency and reliability. To further verify its usability in multi-body SLAM scenarios, we first apply it to a simulated typical multi-body SLAM problem with only an RGB-D camera, and then use it to segment a real RGB-D dataset that we collected ourselves. Based on the good results of our motion segmentation algorithm, we obtain satisfactory SLAM results for the simulated problem, and the segmentation results on real data also enable us to compute visual odometry for each motion group, which facilitates the subsequent steps toward solving practical multi-body RGB-D SLAM problems.

Proceedings of Australasian Conference on Robotics and Automation, 2-4 Dec 2013, University of New South Wales, Sydney Australia


Monday, December 16, 2013

Lab Meeting, December 19, 2013 (Yen-Ting): Deformable Spatial Pyramid Matching for Fast Dense Correspondences

Title: Deformable Spatial Pyramid Matching for Fast Dense Correspondences

Authors: Jaechul Kim, Ce Liu, Fei Sha and Kristen Grauman

Abstract: We introduce a fast deformable spatial pyramid (DSP) matching algorithm for computing dense pixel correspondences. Dense matching methods typically enforce both appearance agreement between matched pixels as well as geometric smoothness between neighboring pixels. Whereas the prevailing approaches operate at the pixel level, we propose a pyramid graph model that simultaneously regularizes match consistency at multiple spatial extents—ranging from an entire image, to coarse grid cells, to every single pixel. This novel regularization substantially improves pixel-level matching in the face of challenging image variations, while the “deformable” aspect of our model overcomes the strict rigidity of traditional spatial pyramids. Results on LabelMe and Caltech show our approach outperforms state-of-the-art methods (SIFT Flow [15] and PatchMatch [2]), both in terms of accuracy and run time.

[2] C. Barnes, E. Shechtman, D. Goldman, and A. Finkelstein. The Generalized PatchMatch Correspondence Algorithm. In ECCV, 2010.
[15] C. Liu, J. Yuen, and A. Torralba. SIFT Flow: Dense Correspondence across Different Scenes and Its Applications. PAMI, 33(5), 2011.

From: IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2013


Wednesday, November 27, 2013

Lab Meeting Nov. 28, (Yi) Coherent Motion Segmentation in Moving Camera Videos using Optical Flow Orientations

Authors: Manjunath Narayana, Allen Hanson, Erik Learned-Miller


In moving camera videos, motion segmentation is commonly performed using the image plane motion of pixels, or optical flow. However, objects that are at different depths from the camera can exhibit different optical flows even if they share the same real-world motion. This can cause a depth-dependent segmentation of the scene. Our goal is to develop a segmentation algorithm that clusters pixels that have similar real-world motion irrespective of their depth in the scene. Our solution uses optical flow orientations instead of the complete vectors and exploits the well-known property that under camera translation, optical flow orientations are independent of object depth. We introduce a probabilistic model that automatically estimates the number of observed independent motions and results in a labeling that is consistent with real-world motion in the scene. The result of our system is that static objects are correctly identified as one segment, even if they are at different depths. Color features and information from previous frames in the video sequence are used to correct occasional errors due to the orientation-based segmentation. We present results on more than thirty videos from different benchmarks. The system is particularly robust on complex background scenes containing objects at significantly different depths.
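The depth-independence property mentioned in the abstract can be checked numerically. The following is a minimal sketch, not the authors' code: it assumes a pinhole camera with pure translation (no rotation), a static scene point, and one common sign convention for the motion field. The function names and the sample values are hypothetical.

```python
import math

def translational_flow(x, y, depth, t, focal=1.0):
    """Image-plane flow at (x, y) for a static point at `depth`, assuming
    the camera translates by t = (tx, ty, tz) and does not rotate."""
    tx, ty, tz = t
    u = (-focal * tx + x * tz) / depth
    v = (-focal * ty + y * tz) / depth
    return u, v

def orientation(u, v):
    """Direction of the flow vector in radians."""
    return math.atan2(v, u)

# Same pixel and same camera translation, two different depths (assumed values).
translation = (0.1, 0.0, 0.5)
near = translational_flow(0.3, -0.2, depth=2.0, t=translation)
far = translational_flow(0.3, -0.2, depth=20.0, t=translation)

# Depth scales the flow magnitude but leaves its direction unchanged.
print(math.hypot(*near), math.hypot(*far))
print(orientation(*near), orientation(*far))
```

Depth only divides both flow components by the same positive number, so the magnitudes differ while the orientations agree. This is why a segmentation based on orientation alone can group static objects at different depths together.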


Wednesday, November 20, 2013

Lab Meeting Nov. 21, (Channing) Where Do I Look Now? Gaze Allocation During Visually Guided Manipulation (ICRA 2012)

Title: Where Do I Look Now? Gaze Allocation During Visually Guided Manipulation (ICRA 2012)
Authors: Jose Nunez-Varela, B. Ravindran, Jeremy L. Wyatt

ABSTRACT - In this work we present principled methods for the coordination of a robot's oculomotor system with the rest of its body motor systems. The problem is to decide which physical actions to perform next and where the robot's gaze should be directed in order to gain information that is relevant to the success of its physical actions. Previous work on this problem has shown that a reward-based coordination mechanism provides an efficient solution. However, that approach does not allow the robot to move its gaze to different parts of the scene, it considers the robot to have only one motor system, and assumes that the actions have the same duration. The main contributions of our work are to extend that previous reward-based approach by making decisions about where to fixate the robot's gaze, handling multiple motor systems, and handling actions of variable duration. We compare our approach against two common baselines: random and round robin gaze allocation. We show how our method provides a more effective strategy to allocate gaze where it is needed the most.


Extension of the above work:
Title: Gaze Allocation Analysis for a Visually Guided Manipulation Task (SAB 2012)
Authors: Jose Nunez-Varela, B. Ravindran, Jeremy L. Wyatt

ABSTRACT - Findings from eye movement research in humans have demonstrated that the task determines where to look. One hypothesis is that the purpose of looking is to reduce uncertainty about properties relevant to the task. Following this hypothesis, we define a model that poses the problem of where to look as one of maximising task performance by reducing task-relevant uncertainty. We implement and test our model on a simulated humanoid robot which has to move objects from a table into containers. Our model outperforms and is more robust than two other baseline schemes in terms of task performance whilst varying three environmental conditions: reach/grasp sensitivity, observation noise and the camera's field of view.

Thursday, November 14, 2013

Lab meeting Nov. 14, (Benny) Detection- and Trajectory-Level Exclusion in Multiple Object Tracking (CVPR 2013)

Authors: Anton Milan, Konrad Schindler, Stefan Roth

When tracking multiple targets in crowded scenarios, modeling mutual exclusion between distinct targets becomes important at two levels: (1) in data association, each target observation should support at most one trajectory and each trajectory should be assigned at most one observation per frame; (2) in trajectory estimation, two trajectories should remain spatially separated at all times to avoid collisions. Yet, existing trackers often sidestep these important constraints. We address this using a mixed discrete-continuous conditional random field (CRF) that explicitly models both types of constraints: Exclusion between conflicting observations with supermodular pairwise terms, and exclusion between trajectories by generalizing global label costs to suppress the co-occurrence of incompatible labels (trajectories). We develop an expansion move-based MAP estimation scheme that handles both non-submodular constraints and pairwise global label costs. Furthermore, we perform a statistical analysis of ground-truth trajectories to derive appropriate CRF potentials for modeling data fidelity, target dynamics, and inter-target occlusion.
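The detection-level exclusion rule in (1) can be stated as a simple validity check. This is only an illustrative sketch: the paper encodes exclusion through CRF potentials, whereas the hypothetical helper below merely tests whether a given hard assignment satisfies the two conditions. The tuple layout and identifiers are assumptions made for the example.

```python
from collections import Counter

def violates_exclusion(assignments):
    """Return True if any detection supports more than one trajectory, or
    any trajectory receives more than one detection in the same frame.

    `assignments` is an iterable of (frame, detection_id, trajectory_id)
    tuples; exact duplicate tuples are assumed not to occur.
    """
    assignments = list(assignments)
    per_detection = Counter((frame, det) for frame, det, _ in assignments)
    per_trajectory = Counter((frame, traj) for frame, _, traj in assignments)
    return (any(count > 1 for count in per_detection.values())
            or any(count > 1 for count in per_trajectory.values()))

# Hypothetical examples.
valid = [(0, "d1", "A"), (0, "d2", "B"), (1, "d1", "A")]
shared_detection = [(0, "d1", "A"), (0, "d1", "B")]   # one detection, two trajectories
double_assignment = [(0, "d1", "A"), (0, "d2", "A")]  # one trajectory, two detections

print(violates_exclusion(valid))
print(violates_exclusion(shared_detection))
print(violates_exclusion(double_assignment))
```

Trajectory-level exclusion (2) is a continuous, spatial condition and is not covered by this check.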

Wednesday, November 06, 2013

Lab meeting Nov. 7, (ChihChung) MAV Urban Localization from Google Street View Data (IROS 2013)

Authors: András L. Majdik, Yves Albers-Schoenberg, Davide Scaramuzza

Abstract: We tackle the problem of globally localizing a camera-equipped micro aerial vehicle flying within urban environments for which a Google Street View image database exists. To avoid the caveats of current image-search algorithms in case of severe viewpoint changes between the query and the database images, we propose to generate virtual views of the scene, which exploit the air-ground geometry of the system. To limit the computational complexity of the algorithm, we rely on a histogram-voting scheme to select the best putative image correspondences. The proposed approach is tested on a 2 km image dataset captured with a small quadrocopter flying in the streets of Zurich. The success of our approach shows that our new air-ground matching algorithm can robustly handle extreme changes in viewpoint, illumination, perceptual aliasing, and over-season variations, thus outperforming conventional visual place-recognition approaches.


Wednesday, October 30, 2013

Lab meeting Oct. 31, (Andi) Non-rigid metric reconstruction from perspective cameras (IVCJ 2010)

Title: Non-rigid metric reconstruction from perspective cameras

Authors: Xavier Lladó, Alessio Del Bue, Lourdes Agapito

Abstract: The metric reconstruction of a non-rigid object viewed by a generic camera poses new challenges since current approaches for Structure from Motion assume the rigidity constraint of a shape as an essential condition. In this work, we focus on the estimation of the 3-D Euclidean shape and motion of a non-rigid shape observed by a perspective camera. In such a case deformation and perspective effects are difficult to decouple – the parametrization of the 3-D non-rigid body may mistakenly account for the perspective distortion. Our method relies on the fact that it is often a reasonable assumption that some of the points on the object’s surface are deforming throughout the sequence while others remain rigid. Thus, relying on the rigidity constraints of a subset of rigid points, we estimate the perspective to metric upgrade transformation. First, we use an automatic segmentation algorithm to identify the set of rigid points. These are then used to estimate the internal camera calibration parameters and the overall rigid motion. Finally, we formulate the problem of non-rigid shape and motion estimation as a non-linear optimization where the objective function to be minimized is the image reprojection error. The prior information that some of the points in the object are rigid can also be added as a constraint to the non-linear minimization scheme in order to avoid ambiguous configurations. We perform experiments on different synthetic and real data sets which show that even when using a minimal set of rigid points and when varying the intrinsic camera parameters it is possible to obtain reliable metric information.


Wednesday, October 23, 2013

Lab Meeting Oct. 24, 2013 (Alan): Optimal Metric Projections for Deformable and Articulated Structure-from-Motion (IJCV 2012)

Title: Optimal Metric Projections for Deformable and Articulated Structure-from-Motion (IJCV 2012)
Authors: Marco Paladini, Alessio Del Bue, João Xavier, Lourdes Agapito, Marko Stošić, Marija Dodig

This paper describes novel algorithms for recovering the 3D shape and motion of deformable and articulated objects purely from uncalibrated 2D image measurements using a factorisation approach. Most approaches to deformable and articulated structure from motion require upgrading an initial affine solution to Euclidean space by imposing metric constraints on the motion matrix. While in the case of rigid structure the metric upgrade step is simple since the constraints can be formulated as linear, deformability in the shape introduces non-linearities. In this paper we propose an alternating bilinear approach to solve for non-rigid 3D shape and motion, associated with a globally optimal projection step of the motion matrices onto the manifold of metric constraints. Our novel optimal projection step combines into a single optimisation the computation of the orthographic projection matrix and the configuration weights that give the closest motion matrix that satisfies the correct block structure with the additional constraint that the projection matrix is guaranteed to have orthonormal rows (i.e. its transpose lies on the Stiefel manifold). This constraint turns out to be non-convex. The key contribution of this work is to introduce an efficient convex relaxation for the non-convex projection step. Efficient in the sense that, for both the cases of deformable and articulated motion, the proposed relaxations turned out to be exact (i.e. tight) in all our numerical experiments. The convex relaxations are semi-definite (SDP) or second-order cone (SOCP) programs which can be readily tackled by popular solvers. An important advantage of these new algorithms is their ability to handle missing data which becomes crucial when dealing with real video sequences with self-occlusions. We show successful results of our algorithms on synthetic and real sequences of both deformable and articulated data. We also show comparative results with state of the art algorithms which reveal that our new methods outperform existing ones.


Wednesday, October 16, 2013

Lab Meeting October 17th, 2013 (Jeff): Temporally Scalable Visual SLAM using a Reduced Pose Graph

Title: Temporally Scalable Visual SLAM using a Reduced Pose Graph

Authors: Hordur Johannsson, Michael Kaess, Maurice Fallon, and John J. Leonard


In this paper, we demonstrate a system for temporally scalable visual SLAM using a reduced pose graph representation. Unlike previous visual SLAM approaches that maintain static keyframes, our approach uses new measurements to continually improve the map, yet achieves efficiency by avoiding adding redundant frames and not using marginalization to reduce the graph. To evaluate our approach, we present results using an online binocular visual SLAM system that uses place recognition for both robustness and multi-session operation. Additionally, to enable large-scale indoor mapping, our system automatically detects elevator rides based on accelerometer data. We demonstrate long-term mapping in a large multi-floor building, using approximately nine hours of data collected over the course of six months. Our results illustrate the capability of our visual SLAM system to map a large area over an extended period of time.

IEEE International Conference on Robotics and Automation (ICRA), 2013


Reference Link:
Another paper with the same title:
In RSS Workshop on Long-term Operation of Autonomous Robotic Systems in Changing Environments, 2012.

Monday, September 30, 2013

Lab Meeting Oct. 3rd (Jim): Robot Navigation in Dense Human Crowds: the Case for Cooperation

Title: Robot Navigation in Dense Human Crowds: the Case for Cooperation
Authors: Pete Trautman, Jeremy Ma, Richard M. Murray, and Andreas Krause
In: Proceedings of the IEEE International Conference on Robotics and Automation (ICRA 2013)

... we explore two questions. Can we design a navigation algorithm that encourages humans to cooperate with a robot? Would such cooperation improve navigation performance? We address the first question by developing a probabilistic predictive model of cooperative collision avoidance and goal-oriented behavior. ... We answer the second question by empirically validating our model in a natural environment (a university cafeteria), and in the process, carry out the first extensive quantitative study of robot navigation in dense human crowds (completing 488 runs). The “multiple goal” interacting Gaussian processes algorithm performs comparably with human teleoperators in crowd densities near 1 person/m², while a state of the art noncooperative planner exhibits unsafe behavior more than 3 times as often as our planner. ... We conclude that a cooperation model is critical for safe and efficient robot navigation in dense human crowds.


Tuesday, September 24, 2013

Lab Meeting September 26, 2013 (Gene): Track-to-Track Fusion With Asynchronous Sensors Using Information Matrix Fusion for Surround Environment Perception

Title: Track-to-Track Fusion With Asynchronous Sensors Using Information Matrix Fusion for Surround Environment Perception
Authors: Michael Aeberhard, Stefan Schlichthärle, Nico Kaempchen, and Torsten Bertram


Abstract: Driver-assistance systems and automated driving applications in the future will require reliable and flexible surround environment perception. Sensor data fusion is typically used to increase reliability and the observable field of view. In this paper, a novel approach to track-to-track fusion in a high-level sensor data fusion architecture for automotive surround environment perception using information matrix fusion (IMF) is presented. It is shown that IMF produces the same good accuracy in state estimation as a low-level centralized Kalman filter, which is widely known to be the most accurate method of fusion. Additionally, as opposed to state-of-the-art track-to-track fusion algorithms, the presented approach guarantees a globally maintained track over time as an object passes in and out of the field of view of several sensors, as required in surround environment perception. As opposed to the often-used cascaded Kalman filter for track-to-track fusion, it is shown that the IMF algorithm has a smaller error and maintains consistency in the state estimation. The proposed approach using IMF is compared with other track-to-track fusion algorithms in simulation and is shown to perform well using real sensor data in a prototype vehicle with a 12-sensor configuration for surround environment perception in highly automated driving applications.
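The claim that IMF matches a centralized Kalman filter can be illustrated in the simplest possible setting. The sketch below is not taken from the paper: it assumes a scalar, static state that is observed directly, no prediction step between updates, and the commonly used information-form update in which the global track adds only the new information a sensor track gained since its last report. All function names and numbers are hypothetical.

```python
def kalman_update(x, p, z, r):
    """Scalar Kalman measurement update for a directly observed state."""
    gain = p / (p + r)
    return x + gain * (z - x), (1.0 - gain) * p

def imf_fuse(xg, pg, xs_prior, ps_prior, xs_post, ps_post):
    """Fuse a sensor-level track into the global track by adding the
    difference between the sensor's posterior and prior information."""
    info = 1.0 / pg + (1.0 / ps_post - 1.0 / ps_prior)
    info_state = xg / pg + (xs_post / ps_post - xs_prior / ps_prior)
    p = 1.0 / info
    return p * info_state, p

# Assumed numbers: global track, sensor-level track, and one new measurement.
xg, pg = 0.0, 4.0                  # global estimate and variance
xs_prior, ps_prior = 1.0, 9.0      # sensor track before the measurement
z, r = 2.0, 1.0                    # measurement and its noise variance

# The sensor runs its own filter, then reports prior and posterior.
xs_post, ps_post = kalman_update(xs_prior, ps_prior, z, r)
x_imf, p_imf = imf_fuse(xg, pg, xs_prior, ps_prior, xs_post, ps_post)

# A centralized filter applies the raw measurement to the global track directly.
x_central, p_central = kalman_update(xg, pg, z, r)

print(x_imf, p_imf)
print(x_central, p_central)
```

In this toy case the sensor's own prior cancels out of the fused result, so the global track ends up with exactly the information contributed by the measurement, which is what the centralized filter computes.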


Wednesday, September 11, 2013

Lab Meeting September 12, 2013 (Jimmy): Indoor Tracking and Navigation Using Received Signal Strength and Compressive Sensing on a Mobile Device

Title: Indoor Tracking and Navigation Using Received Signal Strength and Compressive Sensing on a Mobile Device
Authors: Anthea Wain Sy Au, Chen Feng, Shahrokh Valaee, Sophia Reyes, Sameh Sorour, Samuel N. Markowitz, Deborah Gold, Keith Gordon, and Moshe Eizenman
In: IEEE Transactions on Mobile Computing 2013

An indoor tracking and navigation system based on measurements of received signal strength (RSS) in wireless local area network (WLAN) is proposed. In the system, the location determination problem is solved by first applying a proximity constraint to limit the distance between a coarse estimate of the current position and a previous estimate. Then, a Compressive Sensing-based (CS-based) positioning scheme, proposed in our previous work [1], [2], is applied to obtain a refined position estimate. The refined estimate is used with a map-adaptive Kalman filter, which assumes a linear motion between intersections on a map that describes the user’s path, to obtain a more robust position estimate. Experimental results with the system that is implemented on a PDA with limited resources (HP iPAQ hx2750 PDA) show that the proposed tracking system outperforms the widely used traditional positioning and tracking systems. Meanwhile, the tracking system leads to a 12.6 percent reduction in the mean position error compared to the CS-based stationary positioning system when three APs are used. A navigation module that is integrated with the tracking system provides users with instructions to guide them to predefined destinations. Thirty visually impaired subjects from the Canadian National Institute for the Blind (CNIB) were invited to further evaluate the performance of the navigation system. Testing results suggest that the proposed system can be used to guide visually impaired subjects to their desired destinations.


Tuesday, September 03, 2013

Lab Meeting Sep 5th 2013 (Tom Hsu): Efficient Dense 3D Rigid-Body Motion Segmentation in RGB-D Video

Title: Efficient Dense 3D Rigid-Body Motion Segmentation in RGB-D Video

Authors: Jörg Stückler, Sven Behnke

From: British Machine Vision Conference (BMVC), Bristol, UK, 2013

Motion is a fundamental segmentation cue in video. Many current approaches segment 3D motion in monocular or stereo image sequences, mostly relying on sparse interest points or being dense but computationally demanding. We propose an efficient expectation-maximization (EM) framework for dense 3D segmentation of moving rigid parts in RGB-D video. Our approach segments two images into pixel regions that undergo coherent 3D rigid-body motion. Our formulation treats background and foreground objects equally and poses no further assumptions on the motion of the camera or the objects than rigidness. While our EM-formulation is not restricted to a specific image representation, we supplement it with efficient image representation and registration for rapid segmentation of RGB-D video. In experiments we demonstrate that our approach recovers segmentation and 3D motion at good precision.

Tuesday, August 27, 2013

Lab Meeting, August 29, 2013 (Chiang Yi): Pose Estimation using Local Structure-Specific Shape and Appearance Context

Title: Pose Estimation using Local Structure-Specific Shape and Appearance Context

Authors: Anders Glent Buch, Dirk Kraft, Joni-Kristian Kamarainen, Henrik Gordon Petersen and Norbert Krüger

Abstract: We address the problem of estimating the alignment pose between two models using structure-specific local descriptors. Our descriptors are generated using a combination of 2D image data and 3D contextual shape data, resulting in a set of semi-local descriptors containing rich appearance and shape information for both edge and texture structures. This is achieved by defining feature space relations which describe the neighborhood of a descriptor. By quantitative evaluations, we show that our descriptors provide high discriminative power compared to state of the art approaches. In addition, we show how to utilize this for the estimation of the alignment pose between two point sets. We present experiments both in controlled and real-life scenarios to validate our approach.

From: ICRA 2013


Monday, August 19, 2013

Lab Meeting, August 22, 2013 (Yen-Ting): Joint Optimization for Object Class Segmentation and Dense Stereo Reconstruction

Title: Joint Optimization for Object Class Segmentation and Dense Stereo Reconstruction

Authors: Lubor Ladický, Paul Sturgess, Chris Russell, Sunando Sengupta, Yalin Bastanlar, William Clocksin and Philip H.S. Torr

Abstract: The problems of dense stereo reconstruction and object class segmentation can both be formulated as Random Field labeling problems, in which every pixel in the image is assigned a label corresponding to either its disparity, or an object class such as road or building. While these two problems are mutually informative, no attempt has been made to jointly optimize their labelings. In this work we provide a flexible framework configured via cross-validation that unifies the two problems and demonstrate that, by resolving ambiguities, which would be present in real world data if the two problems were considered separately, joint optimization of the two problems substantially improves performance. To evaluate our method, we augment the Leuven data set, which is a stereo video shot from a car driving around the streets of Leuven, with 70 hand labeled object class and disparity maps. We hope that the release of these annotations will stimulate further work in the challenging domain of street view analysis. Complete source code is publicly available.

From: International Journal of Computer Vision (IJCV), 2012


Wednesday, August 07, 2013

Lab Meeting, August 8, 2013 (Channing): Multi-Robot System for Artistic Pattern Formation

Title: Multi-Robot System for Artistic Pattern Formation (ICRA 2011)
Authors: Javier Alonso-Mora, Andreas Breitenmoser, Martin Rufli, Roland Siegwart and Paul Beardsley

Abstract: This paper describes work on multi-robot pattern formation. Arbitrary target patterns are represented with an optimal robot deployment, using a method that is independent of the number of robots. Furthermore, the trajectories are visually appealing in the sense of being smooth, oscillation free, and showing fast convergence. A distributed controller guarantees collision free trajectories while taking into account the kinematics of differentially driven robots. Experimental results are provided for a representative set of patterns, for a swarm of up to ten physical robots, and for fifty virtual robots in simulation.


Wednesday, July 24, 2013

Lab meeting July 25th 2013 (Benny): Learning to segment and track in RGBD

Presented by: Benny
From: IEEE Transactions on Automation Science and Engineering 2013
Authors: Alex Teichman, Jake Lussier, and Sebastian Thrun
Abstract: We consider the problem of segmenting and tracking deformable objects in color video with depth (RGBD) data available from commodity sensors such as the Asus Xtion Pro Live or Microsoft Kinect. We frame this problem with very few assumptions - no prior object model, no stationary sensor, no prior 3D map - thus making a solution potentially useful for a large number of applications, including semi-supervised learning, 3D model capture, and object recognition.
Our approach makes use of a rich feature set, including local image appearance, depth discontinuities, optical flow, and surface normals to inform the segmentation decision in a conditional random field model. In contrast to previous work in this field, the proposed method learns how to best make use of these features from ground-truth segmented sequences. We provide qualitative and quantitative analyses which demonstrate substantial improvement over the state of the art.
This paper is an extended version of our previous work [29]. Building on this, we show that it is possible to achieve an order of magnitude speedup and thus real-time performance (20 FPS) on a laptop computer by applying simple algorithmic optimizations to the original work. This speedup comes at only a minor cost in overall accuracy and thus makes this approach applicable to a broader range of tasks. We demonstrate one such task: real-time, online, interactive segmentation to efficiently collect training data for an off-the-shelf object detector.

Tuesday, July 09, 2013

Lab Meeting July 10th, 2013 (Andi): Probabilistic Models for 3D Urban Scene Understanding from Movable Platforms

Title: Probabilistic Models for 3D Urban Scene Understanding from Movable Platforms
PhD Thesis, Andreas Geiger (KIT)

Visual 3D scene understanding is an important component in autonomous driving and robot navigation. Intelligent vehicles, for example, often base their decisions on observations obtained from video cameras as they are cheap and easy to employ. Inner-city intersections represent an interesting but also very challenging scenario in this context: The road layout may be very complex and observations are often noisy or even missing due to heavy occlusions. While highway navigation (e.g., Dickmanns et al. [49]) and autonomous driving on simple and annotated intersections (e.g., DARPA Urban Challenge [30]) have already been demonstrated successfully, understanding and navigating general inner-city crossings with little prior knowledge remains an unsolved problem. This thesis is a contribution to understanding multi-object traffic scenes from video sequences. All data is provided by a camera system which is mounted on top of the autonomous driving platform AnnieWAY [103]. The proposed probabilistic generative model reasons jointly about the 3D scene layout as well as the 3D location and orientation of objects in the scene. In particular, the scene topology, geometry as well as traffic activities are inferred from short video sequences. The model takes advantage of monocular information in the form of vehicle tracklets, vanishing lines and semantic labels. Additionally, the benefit of stereo features such as 3D scene flow and occupancy grids is investigated.
Motivated by the impressive driving capabilities of humans, no further information such as GPS, lidar, radar or map knowledge is required. Experiments conducted on 113 representative intersection sequences show that the developed approach successfully infers the correct layout in a variety of difficult scenarios. To evaluate the importance of each feature cue, experiments with different feature combinations are conducted. Additionally, the proposed method is shown to improve object detection and object orientation estimation performance.

Based primarily on the following two papers: (CVPR + NIPS '11)

Thursday, June 27, 2013

Lab Meeting July 3rd, 2013 (Jeff): Switchable Constraints vs. Max-Mixture Models vs. RRR - A Comparison of Three Approaches to Robust Pose Graph SLAM

Title: Switchable Constraints vs. Max-Mixture Models vs. RRR - A Comparison of Three Approaches to Robust Pose Graph SLAM

Authors: Niko Sünderhauf and Peter Protzel


SLAM algorithms that can infer a correct map despite the presence of outliers have recently attracted increasing attention. In the context of SLAM, outlier constraints are typically caused by a failed place recognition due to perceptual aliasing. If not handled correctly, they can have catastrophic effects on the inferred map. Since robust robotic mapping and SLAM are among the key requirements for autonomous long-term operation, inference methods that can cope with such data association failures are a hot topic in current research. Our paper compares three very recently published approaches to robust pose graph SLAM, namely switchable constraints, max-mixture models and the RRR algorithm. All three methods were developed as extensions to existing factor graph-based SLAM back-ends and aim at improving the overall system’s robustness to false positive loop closure constraints. Due to the novelty of the three proposed algorithms, no direct comparison has been conducted so far.
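To give a feel for how one of the compared approaches handles a false loop closure, here is a toy sketch of the switchable-constraints idea. It is an illustration under strong assumptions, not the authors' implementation: a single 1-D loop closure with fixed poses, a linear switch function, a switch prior centred at 1, and hypothetical parameter names `sigma` and `xi`. In the real method the switch variables are optimized jointly with the poses.

```python
def switched_cost(s, residual, sigma=1.0, xi=1.0):
    """Cost of one loop closure whose residual is scaled by a switch s,
    plus a prior that penalizes switching the constraint off (s < 1)."""
    return (s * residual / sigma) ** 2 + ((1.0 - s) / xi) ** 2

def optimal_switch(residual, sigma=1.0, xi=1.0):
    """Closed-form minimizer of switched_cost over s for a fixed residual."""
    w_loop = (residual / sigma) ** 2
    w_prior = 1.0 / xi ** 2
    return w_prior / (w_loop + w_prior)

# Hypothetical residuals: a consistent loop closure and a gross outlier.
s_inlier = optimal_switch(0.1)
s_outlier = optimal_switch(10.0)

print(s_inlier, s_outlier)
print(switched_cost(1.0, 10.0), switched_cost(s_outlier, 10.0))
```

A small residual leaves the switch close to 1, so the constraint stays active. A large residual drives the switch toward 0, which caps the outlier's contribution to the total cost at roughly the size of the prior penalty.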

IEEE International Conference on Robotics and Automation (ICRA), 2013


Reference Link:
Switchable Constraints

Monday, June 17, 2013

Lab Meeting Jun. 19, 2013 (Alan): Dense Variational Reconstruction of Non-Rigid Surfaces from Monocular Video

Title: Dense Variational Reconstruction of Non-Rigid Surfaces from Monocular Video (CVPR 2013 Oral)
Authors: Ravi Garg, Anastasios Roussos, Lourdes Agapito

This paper offers the first variational approach to the problem of dense 3D reconstruction of non-rigid surfaces from a monocular video sequence. We formulate nonrigid structure from motion (NRSfM) as a global variational energy minimization problem to estimate dense low-rank smooth 3D shapes for every frame along with the camera motion matrices, given dense 2D correspondences.
Unlike traditional factorization based approaches to NRSfM, which model the low-rank non-rigid shape using a fixed number of basis shapes and corresponding coefficients, we minimize the rank of the matrix of time-varying shapes directly via trace norm minimization. In conjunction with this low-rank constraint, we use an edge preserving total-variation regularization term to obtain spatially smooth shapes for every frame. Thanks to proximal splitting techniques the optimization problem can be decomposed into many point-wise sub-problems and simple linear systems which can be easily solved on GPU hardware. We show results on real sequences of different objects (face, torso, beating heart) where, despite challenges in tracking, illumination changes and occlusions, our method reconstructs highly deforming smooth surfaces densely and accurately directly from video, without the need for any prior models or shape templates.


Monday, May 27, 2013

Lab meeting May 29th 2013 (Jim): Reciprocal collision avoidance

I'm going to present the idea of "reciprocal collision avoidance": each moving agent should take its share of the responsibility for avoiding collisions with the other agents during navigation. Building on the velocity obstacle model, the "reciprocal velocity obstacle" and its variations were developed for multi-agent navigation. The main references are the following papers:

Reciprocal Velocity Obstacles for Real-time Multi-agent Navigation
Jur van den Berg, Ming C. Lin, Dinesh Manocha
IEEE International Conference on Robotics and Automation (ICRA), 2008


The Hybrid Reciprocal Velocity Obstacle
Jamie Snape, Jur van den Berg, Stephen J. Guy, Dinesh Manocha
IEEE Transactions on Robotics (T-RO), vol. 27, pp. 696-706, 2011

Reciprocal n-body Collision Avoidance
Jur van den Berg, Stephen J. Guy, Ming C. Lin, Dinesh Manocha
Robotics Research: The 14th International Symposium (ISRR), Springer Tracts in Advanced Robotics (STAR), vol. 70, pp. 3-19, 2011
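
To make the velocity obstacle idea concrete, here is a minimal sketch for two disc-shaped agents in the plane. The function names and the tuple-based interface are assumptions made for illustration and do not come from the papers' code. The first function tests whether a relative velocity leads to a collision at some future time; the second applies the reciprocal rule from the 2008 paper, under which a candidate velocity v_new is inadmissible exactly when 2*v_new - v_a lies in the ordinary velocity obstacle.

```python
def in_velocity_obstacle(p_a, p_b, r_sum, v_rel):
    """Return True if agent A, moving with velocity v_rel relative to B,
    would eventually collide with B. r_sum is the sum of the two radii."""
    dx, dy = p_b[0] - p_a[0], p_b[1] - p_a[1]
    if dx * dx + dy * dy <= r_sum ** 2:
        return True  # the discs already overlap
    speed_sq = v_rel[0] ** 2 + v_rel[1] ** 2
    if speed_sq == 0.0:
        return False  # no relative motion, so the gap never closes
    # Time of closest approach along the ray p_a + t * v_rel.
    t = (dx * v_rel[0] + dy * v_rel[1]) / speed_sq
    if t <= 0.0:
        return False  # the agents are moving apart
    cx, cy = dx - t * v_rel[0], dy - t * v_rel[1]
    return cx * cx + cy * cy < r_sum ** 2


def in_reciprocal_velocity_obstacle(p_a, p_b, r_sum, v_a, v_b, v_new):
    """Return True if v_new is inside the reciprocal velocity obstacle of B
    for A, i.e. if 2*v_new - v_a is inside the velocity obstacle."""
    v_rel = (2.0 * v_new[0] - v_a[0] - v_b[0],
             2.0 * v_new[1] - v_a[1] - v_b[1])
    return in_velocity_obstacle(p_a, p_b, r_sum, v_rel)
```

For two agents approaching head-on, a small sideways adjustment that would still collide under the plain velocity obstacle can be admissible under the reciprocal rule, because the other agent is assumed to make the matching half of the correction.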

Tuesday, May 21, 2013

Lab meeting May 22nd 2013 (Tom Hsu): Incorporating User Interaction and Topological Constraints within Contour Completion via Discrete Calculus

Presented by: Tom Hsu

From: Proc. of the Computer Vision and Pattern Recognition (CVPR'13), Portland, Oregon 2013.

Authors: Jia Xu, Maxwell D. Collins, Vikas Singh (University of Wisconsin-Madison)

Link: Paper

We study the problem of interactive segmentation and contour completion for multiple objects. The form of constraints our model incorporates are those coming from user scribbles (interior or exterior constraints) as well as information regarding the topology of the 2-D space after partitioning (number of closed contours desired). We discuss how concepts from discrete calculus and a simple identity using the Euler characteristic of a planar graph can be utilized to derive a practical algorithm for this problem. We also present specialized branch and bound methods for the case of single contour completion under such constraints. On an extensive dataset of ~1000 images, our experiments suggest that a small amount of side knowledge can give strong improvements over fully unsupervised contour completion methods. We show that by interpreting user indications topologically, user effort is substantially reduced.
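
The Euler characteristic identity mentioned in the abstract can be illustrated with a small sketch. For a planar graph with V vertices, E edges, F faces (counting the unbounded outer face) and C connected components, V - E + F = 1 + C, so the number of bounded faces, which is what "number of closed contours" corresponds to here, is E - V + C. The function name and the edge-list input are assumptions for illustration; this is not the paper's algorithm, only the counting identity it builds on.

```python
def count_closed_contours(num_vertices, edges):
    """Count bounded faces of a planar graph given as a list of (a, b)
    vertex-index pairs, using Euler's formula V - E + F = 1 + C."""
    parent = list(range(num_vertices))

    def find(v):
        # Union-find root lookup with path halving.
        while parent[v] != v:
            parent[v] = parent[parent[v]]
            v = parent[v]
        return v

    components = num_vertices
    for a, b in edges:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_a] = root_b
            components -= 1

    faces = 1 + components - num_vertices + len(edges)  # includes outer face
    return faces - 1
```

A single square loop gives one closed contour, while an open chain of edges gives none, which is the kind of topological constraint the user can specify.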

Monday, May 06, 2013

Lab meeting May 8th 2013 (Gene): Lost! Leveraging the Crowd for Probabilistic Visual Self-Localization

Presented by: Gene

From: CVPR2013

Authors: Marcus A. Brubaker, Andreas Geiger, Raquel Urtasun


In this paper we propose an affordable solution to self-localization, which utilizes visual odometry and road maps as the only inputs. To this end, we present a probabilistic model as well as an efficient approximate inference algorithm, which is able to utilize distributed computation to meet the real-time requirements of autonomous systems. Because of the probabilistic nature of the model we are able to cope with uncertainty due to noisy visual odometry and inherent ambiguities in the map (e.g., in a Manhattan world). By exploiting freely available, community developed maps and visual odometry measurements, we are able to localize a vehicle to an accuracy of 3 m after only a few seconds of driving on maps which contain more than 2,150 km of drivable roads.


Tuesday, April 23, 2013

Lab meeting Apr 24th 2013 (Hank Lin): Scene Parsing with Multiscale Feature Learning, Purity Trees, and Optimal Covers

Presented by: Hank Lin

From: Proc. of the International Conference on Machine Learning (ICML'12), Edinburgh, Scotland, 2012.

Authors: C. Farabet, C. Couprie, L. Najman, Y. LeCun

Link: Paper Video

Scene parsing, or semantic segmentation, consists in labeling each pixel in an image with the category of the object it belongs to. It is a challenging task that involves the simultaneous detection, segmentation and recognition of all the objects in the image.

The scene parsing method proposed here starts by computing a tree of segments from a graph of pixel dissimilarities. Simultaneously, a set of dense feature vectors is computed which encodes regions of multiple sizes centered on each pixel. The feature extractor is a multiscale convolutional network trained from raw pixels. The feature vectors associated with the segments covered by each node in the tree are aggregated and fed to a classifier which produces an estimate of the distribution of object categories contained in the segment. A subset of tree nodes that cover the image are then selected so as to maximize the average “purity” of the class distributions, hence maximizing the overall likelihood that each segment will contain a single object. The convolutional network feature extractor is trained end-to-end from raw pixels, alleviating the need for engineered features. After training, the system is parameter free.

The system yields record accuracies on the Stanford Background Dataset (8 classes), the Sift Flow Dataset (33 classes) and the Barcelona Dataset (170 classes) while being an order of magnitude faster than competing approaches, producing a 320 × 240 image labeling in less than 1 second.

Wednesday, April 17, 2013

Lab meeting Apr 17th 2013 (Bang-Cheng Wang): Biped Walking Pattern Generation by using Preview Control of Zero-Moment Point

Presented by Bang-Cheng Wang

From Proceedings of the 2003 IEEE
International Conference on Robotics & Automation
Taipei, Taiwan, September 14-19, 2003.

Authors: Shuuji KAJITA, Fumio KANEHIRO, Kenji KANEKO, Kiyoshi FUJIWARA, Kensuke HARADA, Kazuhito YOKOI and Hirohisa HIRUKAWA

We introduce a new method of biped walking pattern generation by using a preview control of the zero-moment point (ZMP). First, the dynamics of a biped robot is modeled as a running cart on a table, which gives a convenient representation to treat ZMP. After reviewing conventional methods of ZMP based pattern generation, we formalize the problem as the design of a ZMP tracking servo controller. It is shown that we can realize such a controller by adopting the preview control theory that uses the future reference. It is also shown that a preview controller can be used to compensate the ZMP error caused by the difference between a simple model and the precise multibody model. The effectiveness of the proposed method is demonstrated by a simulation of walking on spiral stairs.
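
As a rough illustration of the cart-table model, the ZMP along one horizontal axis is p = x - (z_c / g) * x'', where x is the center-of-mass (CoM) position, x'' its acceleration and z_c the constant CoM height. The sketch below only evaluates this relation on a sampled CoM trajectory; it does not implement the preview controller itself. The function names, the sampling interface and the finite-difference acceleration are assumptions made for illustration.

```python
GRAVITY = 9.81  # m/s^2


def cart_table_zmp(x, x_ddot, z_c):
    """ZMP of the cart-table model: p = x - (z_c / g) * x_ddot."""
    return x - (z_c / GRAVITY) * x_ddot


def zmp_from_com_samples(xs, dt, z_c):
    """Estimate the ZMP at the interior samples of a CoM trajectory xs
    (sampled every dt seconds), using central differences for acceleration."""
    zmps = []
    for k in range(1, len(xs) - 1):
        acc = (xs[k + 1] - 2.0 * xs[k] + xs[k - 1]) / dt ** 2
        zmps.append(cart_table_zmp(xs[k], acc, z_c))
    return zmps
```

When the CoM moves at constant velocity the ZMP sits directly below it; accelerating the CoM forward shifts the ZMP backward, which is the coupling the ZMP tracking controller has to invert.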


Tuesday, April 09, 2013

Lab Meeting April 10, 2013 (Jimmy): Geodesic Flow Kernel for Unsupervised Domain Adaptation

Title: Geodesic Flow Kernel for Unsupervised Domain Adaptation
Authors: Boqing Gong, Yuan Shi, Fei Sha, Kristen Grauman
In: CVPR2012

In real-world applications of visual recognition, many factors—such as pose, illumination, or image quality—can cause a significant mismatch between the source domain on which classifiers are trained and the target domain to which those classifiers are applied. As such, the classifiers often perform poorly on the target domain. Domain adaptation techniques aim to correct the mismatch. Existing approaches have concentrated on learning feature representations that are invariant across domains, and they often do not directly exploit low-dimensional structures that are intrinsic to many vision datasets. In this paper, we propose a new kernel-based method that takes advantage of such structures. Our geodesic flow kernel models domain shift by integrating an infinite number of subspaces that characterize changes in geometric and statistical properties from the source to the target domain. Our approach is computationally advantageous, automatically inferring important algorithmic parameters without requiring extensive crossvalidation or labeled data from either domain. We also introduce a metric that reliably measures the adaptability between a pair of source and target domains. For a given target domain and several source domains, the metric can be used to automatically select the optimal source domain to adapt and avoid less desirable ones. Empirical studies on standard datasets demonstrate the advantages of our approach over competing methods.


Wednesday, March 27, 2013

Lab Meeting, March 28, 2013 (Chiang Yi): Efficient Model-based 3D Tracking of Hand Articulations using Kinect (BMVC 2011)

Authors: Iason Oikonomidis, Nikolaos Kyriazis, Antonis A. Argyros

We present a novel solution to the problem of recovering and tracking the 3D position, orientation and full articulation of a human hand from markerless visual observations obtained by a Kinect sensor. We treat this as an optimization problem, seeking for the hand model parameters that minimize the discrepancy between the appearance and 3D structure of hypothesized instances of a hand model and actual hand observations. This optimization problem is effectively solved using a variant of Particle Swarm Optimization (PSO). The proposed method does not require special markers and/or a complex image acquisition setup. Being model based, it provides continuous solutions to the problem of tracking hand articulations. Extensive experiments with a prototype GPU-based implementation of the proposed method demonstrate that accurate and robust 3D tracking of hand articulations can be achieved in near real-time (15Hz).
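
For readers unfamiliar with PSO, the following is a minimal sketch of the canonical algorithm on a generic objective. It is not the variant used in the paper, and the hand-model discrepancy is replaced by an arbitrary function supplied by the caller. The function name, parameter names and default coefficients are assumptions chosen for illustration.

```python
import random


def particle_swarm_minimize(objective, bounds, n_particles=30, n_iters=100,
                            inertia=0.7, c_personal=1.5, c_global=1.5, seed=0):
    """Minimize objective(list_of_floats) inside box bounds [(lo, hi), ...]."""
    rng = random.Random(seed)
    dim = len(bounds)
    positions = [[rng.uniform(lo, hi) for lo, hi in bounds]
                 for _ in range(n_particles)]
    velocities = [[0.0] * dim for _ in range(n_particles)]
    best_pos = [p[:] for p in positions]           # per-particle best
    best_val = [objective(p) for p in positions]
    g = min(range(n_particles), key=best_val.__getitem__)
    global_pos, global_val = best_pos[g][:], best_val[g]

    for _ in range(n_iters):
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                # Pull toward the particle's own best and the swarm's best.
                velocities[i][d] = (
                    inertia * velocities[i][d]
                    + c_personal * r1 * (best_pos[i][d] - positions[i][d])
                    + c_global * r2 * (global_pos[d] - positions[i][d]))
                lo, hi = bounds[d]
                positions[i][d] = min(hi, max(lo, positions[i][d] + velocities[i][d]))
            value = objective(positions[i])
            if value < best_val[i]:
                best_val[i], best_pos[i] = value, positions[i][:]
                if value < global_val:
                    global_val, global_pos = value, positions[i][:]
    return global_pos, global_val
```

In the tracking setting, each particle would be one hypothesized set of hand model parameters and the objective would score its discrepancy against the observed frame; because the objective is only evaluated and never differentiated, the evaluations can be run in parallel, for example on a GPU.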


extended work: Tracking the articulated motion of two strongly interacting hands

Tuesday, March 19, 2013

Lab Meeting, March 21, 2013 (Yen-Ting): Extracting 3D Scene-Consistent Object Proposals and Depth from Stereo Images (ECCV 2012)

Authors: Michael Bleyer, Christoph Rhemann, and Carsten Rother

Abstract: This work combines two active areas of research in computer vision: unsupervised object extraction from a single image, and depth estimation from a stereo image pair. A recent, successful trend in unsupervised object extraction is to exploit so-called “3D scene-consistency”, that is enforcing that objects obey underlying physical constraints of the 3D scene, such as occupancy of 3D space and gravity of objects. Our main contribution is to introduce the concept of 3D scene-consistency into stereo matching. We show that this concept is beneficial for both tasks, object extraction and depth estimation. In particular, we demonstrate that our approach is able to create a large set of 3D scene-consistent object proposals, by varying e.g. the prior on the number of objects...


Thursday, March 14, 2013

Lab Meeting, March 14, 2013 (Channing): The Design of LEO: a 2D Bipedal Walking Robot for Online Autonomous Reinforcement Learning (IROS 2010)

Authors: Erik Schuitema, Martijn Wisse, Thijs Ramakers and Pieter Jonker

Abstract: Real robots demonstrating online Reinforcement Learning (RL) to learn new tasks are hard to find. The specific properties and limitations of real robots have a large impact on their suitability for RL experiments. In this work, we derive the main hardware and software requirements that an RL robot should fulfill, and present our biped robot LEO that was specifically designed to meet these requirements. We verify its aptitude in autonomous walking experiments using a pre-programmed controller. Although there is room for improvement in the design, the robot was able to walk, fall and stand up without human intervention for 8 hours, during which it made over 43,000 footsteps.


Wednesday, March 13, 2013

Lab Meeting, March 7, 2013 (Benny): A Segmentation and Data Association Annotation System for Laser-based Multi-Target Tracking Evaluation

Authors: Chien-Chen Weng, Chieh-Chih Wang and Jennifer Healey

Abstract: 2D laser scanners are now widely used to accomplish robot perception tasks such as SLAM and multi-target tracking (MTT). While a number of SLAM benchmarking datasets are available, only a few works have discussed the issues of collecting multi-target tracking benchmarking datasets.
In this work, a segmentation and data association annotation system is proposed for evaluating multi-target tracking using 2D laser scanners. The proposed annotation system uses the existing MTT algorithm to generate initial annotation results and uses camera images as the strong hints to assist annotators to recognize moving objects in laser scans. The annotators can draw the object’s shape and future trajectory to automate segmentation and data association and reduce the annotation task loading. The user study results show that the performance of the proposed annotation system is superior in the V-measure vs. annotation speed tests and the false positive and false negative rates.

Wednesday, February 20, 2013

Lab meeting Feb. 21, 2013 (ChihChung): A Tensor-Based Algorithm for High-Order Graph Matching (PAMI 2010)

Authors: Olivier Duchenne, Francis Bach, In-So Kweon, and Jean Ponce

Abstract: This paper addresses the problem of establishing correspondences between two sets of visual features using higher-order constraints instead of the unary or pairwise ones used in classical methods. Concretely, the corresponding hypergraph matching problem is formulated as the maximization of a multi-linear objective function over all permutations of the features. This function is defined by a tensor representing the affinity between feature tuples. It is maximized using a generalization of spectral techniques where a relaxed problem is first solved by a multi-dimensional power method, and the solution is then projected onto the closest assignment matrix. The proposed approach has been implemented, and it is compared to state-of-the-art algorithms on both synthetic and real data.
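
The multi-dimensional power method can be sketched for a third-order tensor as follows. This is a simplified illustration that assumes a plain unit-norm constraint on the vector and leaves out the final projection onto an assignment matrix; the function name and the nested-list tensor representation are assumptions, not the authors' implementation.

```python
import math


def tensor_power_iteration(tensor, n_iters=50):
    """Approximately maximize sum_{ijk} T[i][j][k] * v[i] * v[j] * v[k]
    over unit vectors v by repeating v <- T(., v, v) / ||T(., v, v)||."""
    n = len(tensor)
    v = [1.0 / math.sqrt(n)] * n  # uniform starting vector
    for _ in range(n_iters):
        w = [sum(tensor[i][j][k] * v[j] * v[k]
                 for j in range(n) for k in range(n))
             for i in range(n)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            break  # degenerate tensor, keep the current vector
        v = [x / norm for x in w]
    return v
```

In the matching problem each index would stand for one candidate correspondence, so an entry T[i][j][k] scores how compatible three correspondences are, and large entries of the resulting v indicate correspondences that are mutually consistent.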


Tuesday, January 22, 2013

Lab meeting Jan. 23, 2013 (Gene): Fully Distributed Scalable Smoothing and Mapping with Robust Multi-robot Data Association (IEEE 2012)

Title: Fully Distributed Scalable Smoothing and Mapping with Robust Multi-robot Data Association (IEEE 2012)
Authors: Alexander Cunningham, Kai M. Wurm, Wolfram Burgard, and Frank Dellaert


In this paper we focus on the multi-robot perception problem, and present an experimentally validated end-to-end multi-robot mapping framework, enabling individual robots in a team to see beyond their individual sensor horizons. The inference part of our system is the DDF-SAM algorithm [1], which provides a decentralized communication and inference scheme, but did not address the crucial issue of data association.

One key contribution is a novel, RANSAC-based, approach for performing the between-robot data associations and initialization of relative frames of reference. We demonstrate this system with both data collected from real robot experiments, as well as in a large scale simulated experiment demonstrating the scalability of the proposed approach.


Tuesday, January 08, 2013

Lab meeting Jan 9th 2013 (Bang-Cheng Wang): Kicking a Ball – Modeling Complex Dynamic Motions for Humanoid Robots

Presented by Bang-Cheng Wang

From RoboCup 2010: Robot Soccer World Cup XIV, ser. Lecture Notes
in Artificial Intelligence, E. Chown, A. Matsumoto, P. Plöger,
and J. R. del Solar, Eds. Springer, to appear in 2011.

Judith Müller, Tim Laue, and Thomas Röfer

Complex motions like kicking a ball into the goal are becoming
more important in RoboCup leagues such as the Standard Platform
League. Thus, there is a need for motion sequences that can be parameterized
and changed dynamically. This paper presents a motion engine
that translates motions into joint angles by using trajectories. These
motions are defined as a set of Bezier curves that can be changed online
to allow adjusting, for example, a kicking motion precisely to the actual
position of the ball. During the execution, motions are stabilized by
the combination of center of mass balancing and a gyro feedback-based
closed-loop PID controller.
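
To show why Bezier curves suit motions that must be adjusted online, here is a minimal sketch that evaluates a curve with de Casteljau's algorithm. The function name and the tuple-based control points are assumptions for illustration and are not taken from the paper's motion engine.

```python
def bezier_point(control_points, t):
    """Evaluate a Bezier curve of any degree at parameter t in [0, 1]
    by repeated linear interpolation (de Casteljau's algorithm)."""
    points = [tuple(p) for p in control_points]
    while len(points) > 1:
        points = [tuple((1.0 - t) * a + t * b for a, b in zip(p, q))
                  for p, q in zip(points, points[1:])]
    return points[0]
```

The curve always ends at its last control point, so moving that point to the currently observed ball position retargets the end of the trajectory without redefining the rest of the motion.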