Robot Perception and Learning

Tuesday, April 15, 2014

Lab meeting April 17, 2014 (Yen-Ting): Dense correspondence and annotation systems

I will present several state-of-the-art annotation systems and their relationship with dense correspondences. I will also try to compare them with my current work.

Reference papers:

LabelMe video: Building a Video Database with Human Annotations (ICCV 2009)
link

Efficiently Scaling up Crowdsourced Video Annotation (IJCV 2013)
link

Human-Assisted Motion Annotation (CVPR 2008)
link

Annotation Propagation in Large Image Databases via Dense Image Correspondence (ECCV 2012)
link

Wednesday, April 09, 2014

Lab meeting April 10, 2014 (Channing): "CAPT: Concurrent assignment and planning of trajectories for multiple robots"

Title: CAPT: Concurrent assignment and planning of trajectories for multiple robots

Authors: Matthew Turpin, Nathan Michael, and Vijay Kumar
GRASP Laboratory, University of Pennsylvania, Philadelphia, USA

In: The International Journal of Robotics Research (IJRR), January 2014, 33: 98-112

Abstract:
In this paper, we consider the problem of concurrent assignment and planning of trajectories (which we denote CAPT) for a team of robots. This problem involves simultaneously addressing two challenges: (1) the combinatorially complex problem of finding a suitable assignment of robots to goal locations, and (2) the generation of collision-free, time-parameterized trajectories for every robot. We consider the CAPT problem for unlabeled (interchangeable) robots and propose algorithmic solutions to two variations of the CAPT problem. The first algorithm, C-CAPT, is a provably correct, complete, centralized algorithm which guarantees collision-free optimal solutions to the CAPT problem in an obstacle-free environment. To achieve these strong claims, C-CAPT exploits the synergy obtained by combining the two subproblems of assignment and trajectory generation to provide computationally tractable solutions for large numbers of robots. We then propose a decentralized solution to the CAPT problem through d-CAPT, a decentralized algorithm that provides suboptimal results compared to C-CAPT. We illustrate the algorithms and resulting performance through simulation and experimentation.
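To make the centralized idea a little more concrete, here is a minimal Python sketch for a tiny obstacle-free instance, assuming unlabeled point robots in the plane. It picks the assignment that minimizes the sum of squared start-to-goal distances (by brute force over permutations, standing in for a polynomial-time assignment solver such as the Hungarian algorithm) and then moves every robot along a straight line with a shared start and arrival time. The function names are hypothetical, and the sketch leaves out the robot-radius and start/goal separation conditions under which the paper proves collision freedom.

```python
# Illustrative sketch only; function names are hypothetical, not from the paper.
from itertools import permutations
from math import dist


def min_squared_distance_assignment(starts, goals):
    """Return phi such that robot i is sent to goals[phi[i]], minimizing the
    sum of squared start-to-goal distances. Brute force: only for tiny teams."""
    best_cost, best_phi = float("inf"), None
    for phi in permutations(range(len(goals))):
        cost = sum(dist(starts[i], goals[j]) ** 2 for i, j in enumerate(phi))
        if cost < best_cost:
            best_cost, best_phi = cost, phi
    return best_phi


def synchronized_positions(starts, goals, phi, t, t_final):
    """Straight-line positions at time t: every robot leaves at t = 0 and
    arrives at its assigned goal at t = t_final (time clipped to [0, t_final])."""
    alpha = min(max(t / t_final, 0.0), 1.0)
    return [
        ((1 - alpha) * sx + alpha * goals[j][0], (1 - alpha) * sy + alpha * goals[j][1])
        for (sx, sy), j in zip(starts, phi)
    ]


starts = [(0.0, 0.0), (4.0, 0.0)]
goals = [(4.0, 1.0), (0.0, 1.0)]
phi = min_squared_distance_assignment(starts, goals)  # (1, 0): the paths do not cross
print(phi, synchronized_positions(starts, goals, phi, t=5.0, t_final=10.0))
```

Minimizing squared (rather than plain) distances is the choice that makes the straight-line paths well separated in the paper's analysis; for real team sizes the permutation loop would be replaced by an O(N^3) assignment solver.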

Download link: http://ijr.sagepub.com/content/33/1/98.full.pdf

Related Media link: http://www.seas.upenn.edu/~mturpin/summary.html

Wednesday, March 19, 2014

Lab meeting Mar. 20, 2014 (Andi): Multiview Structure from Motion in Trajectory Space

Authors: Aamer Zaheer, Ijaz Akhter, Mohammad Haris Baig, Shabbir Marzban, Sohaib Khan

Abstract:
Most non-rigid objects exhibit temporal regularities in their deformations. Recently it was proposed that these regularities can be parameterized by assuming that the non-rigid structure lies in a low-dimensional trajectory space. In this paper, we propose a factorization approach for 3D reconstruction from multiple static cameras under the compact trajectory subspace representation. The proposed factorization is analogous to the rank-3 factorization of the rigid structure-from-motion problem, carried out in a transformed space. The benefit of our approach is that the 3D trajectory basis can be directly learned from the image observations. This also allows us to impute missing observations and denoise tracking errors without explicit estimation of the 3D structure. In contrast to standard triangulation-based methods, which require points to be visible in at least two cameras, our approach can reconstruct points that remain occluded in all cameras for long periods. This makes our solution especially suitable for occlusion handling in motion capture systems. We demonstrate the robustness of our method on challenging real and synthetic scenarios.
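The imputation idea can be illustrated on a single 1D coordinate track. The Python sketch below assumes a fixed DCT basis (a common choice for trajectory bases) rather than the basis the paper learns from image observations: it fits the first K basis coefficients to the observed frames by least squares and then reads off the missing frames. `dct_basis`, `solve`, and `impute_track` are hypothetical helpers.

```python
# Illustrative sketch only: fixed DCT basis assumed, not the paper's learned basis.
from math import cos, pi, sqrt


def dct_basis(num_frames, k):
    """First k orthonormal DCT-II basis vectors, each a list of length num_frames."""
    basis = []
    for j in range(k):
        scale = sqrt(1.0 / num_frames) if j == 0 else sqrt(2.0 / num_frames)
        basis.append([scale * cos(pi * (2 * t + 1) * j / (2 * num_frames))
                      for t in range(num_frames)])
    return basis


def solve(a, b):
    """Solve the square system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def impute_track(track, k):
    """track: floats with None for missing frames. Returns the full track
    rebuilt from k DCT coefficients fitted to the observed frames only."""
    n = len(track)
    basis = dct_basis(n, k)
    observed = [t for t, v in enumerate(track) if v is not None]
    # Normal equations (B^T B) c = B^T y, restricted to the observed frames.
    gram = [[sum(basis[i][t] * basis[j][t] for t in observed) for j in range(k)]
            for i in range(k)]
    rhs = [sum(basis[i][t] * track[t] for t in observed) for i in range(k)]
    coeffs = solve(gram, rhs)
    return [sum(coeffs[j] * basis[j][t] for j in range(k)) for t in range(n)]


basis = dct_basis(20, 3)
true_track = [2 * basis[0][t] - basis[1][t] + 0.5 * basis[2][t] for t in range(20)]
partial = [None if 5 <= t < 10 else v for t, v in enumerate(true_track)]  # frames 5-9 occluded
recovered = impute_track(partial, k=3)
```

Because K is much smaller than the number of frames, the fit stays well-posed even when a point is hidden for a long stretch, which is roughly the intuition behind recovering points that stay occluded for a while.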

In: Proceedings of the 13th International Conference on Computer Vision (ICCV), Barcelona, Spain, Nov 2011

download paper

Wednesday, March 12, 2014

Lab meeting Mar. 13, 2014 (ChihChung): Matching two scene images with large changes in distance and viewing angle

In this talk, I will present recent state-of-the-art approaches to scene image matching and then discuss several new ideas of my own.

The references are:

Algorithms:

Affine-invariant SIFT:
link1
link2
link3

ORSA (Optimized RANSAC):
link

Virtual-line descriptor:
link

1-point RANSAC:
link

Implementation:

Using an MAV and Google Street View for visual localization:
link

Monday, March 03, 2014

Lab Meeting March 6th, 2014 (Jeff): Simultaneous Parameter Calibration, Localization, and Mapping

Title: Simultaneous Parameter Calibration, Localization, and Mapping

Authors: Rainer Kümmerle, Giorgio Grisetti, and Wolfram Burgard

Abstract:

The calibration parameters of a mobile robot play a substantial role in navigation tasks. Often these parameters are subject to variations that depend either on changes in the environment or on the load of the robot. In this paper, we propose an approach to simultaneously estimate a map of the environment, the position of the on-board sensors of the robot, and its kinematic parameters. Our method requires no prior knowledge about the environment and relies only on a rough initial guess of the parameters of the platform. The proposed approach estimates the parameters online and it is able to adapt to non-stationary changes of the configuration. We tested our approach in simulated environments and on a wide range of real-world data using different types of robotic platforms.

Advanced Robotics, Vol. 26, 2012

Link:

http://www.tandfonline.com/doi/full/10.1080/01691864.2012.728694

Reference Links:
Simultaneous Parameter Calibration, Localization, and Mapping for Robust Service Robotics.
ARSO 2011.
http://europa.informatik.uni-freiburg.de/files/kuemmerle11arso.pdf
Simultaneous Calibration, Localization, and Mapping.
IROS 2011.
http://ais.informatik.uni-freiburg.de/publications/papers/kuemmerle11iros.pdf

Wednesday, February 26, 2014

Lab Meeting February 27, 2014 (Jimmy): System-Level Performance Analysis for Bayesian Cooperative Positioning: From Global to Local

Title: System-Level Performance Analysis for Bayesian Cooperative Positioning: From Global to Local
Authors: Zhang Siwei, Ronald Raulefs, Armin Dammann, and Stephan Sand

In: IEEE International Conference on Indoor Positioning and Indoor Navigation 2013

Abstract
Cooperative positioning (CP) can be used either to calibrate the accumulated error from inertial navigation or as a stand-alone navigation system. Though intensive research has been conducted on CP, there is a need to further investigate the joint impact of system-level factors on accuracy. We derive a posterior Cramér-Rao bound (PCRB) considering both the physical layer (PHY) signal structure and the asynchronous latency from the multiple access control layer (MAC). The PCRB shows an immediate relationship between the theoretical accuracy limit and the effective factors, e.g., geometry, node dynamics, latency, signal structure, and power, which is useful for assessing a cooperative system. However, for a large-scale decentralized cooperative network, calculating the PCRB becomes difficult due to the high state dimension and the absence of global information. We propose an equivalent ranging variance (ERV) scheme, which projects the neighbor's positioning uncertainty onto the distance measurement inaccuracy. With this, the effects of the interaction among the mobile terminals (MTs), e.g., measurement and communication, can be decoupled. We use the ERV to derive a local PCRB (L-PCRB), which approximates the PCRB locally at each MT with low complexity. We further propose combining the ERV and L-PCRB to improve the precision of Bayesian localization algorithms. Simulations with an L-PCRB-aided distributed particle filter (DPF) in two typical cooperative positioning scenarios show a significant improvement compared with the non-cooperative or standard DPF.
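As a rough illustration of the ERV idea, the Python sketch below (2D, static snapshot) inflates each range measurement's variance by the neighbor's position covariance projected onto the line of sight, then inverts the resulting range-only Fisher information to get a CRB-style position-error bound. This assumes Gaussian ranging noise and a simple additive form for the ERV; the paper's PCRB also accounts for dynamics, latency, and signal structure, and its exact ERV expression may differ. All names are hypothetical.

```python
# Illustrative sketch only: the additive ERV form below is an assumption.
from math import hypot


def equivalent_ranging_variance(p, q, range_var, neighbor_cov):
    """Range noise variance plus the neighbor's 2x2 position covariance
    projected onto the unit line-of-sight vector u from neighbor q to p."""
    dx, dy = p[0] - q[0], p[1] - q[1]
    d = hypot(dx, dy)
    ux, uy = dx / d, dy / d
    (sxx, sxy), (syx, syy) = neighbor_cov
    projected = ux * (sxx * ux + sxy * uy) + uy * (syx * ux + syy * uy)
    return range_var + projected


def position_error_bound(p, neighbors):
    """neighbors: list of (position, range_var, cov). Returns the 2x2 inverse
    of the range-only Fisher information at p (assumes non-collinear neighbors)."""
    j = [[0.0, 0.0], [0.0, 0.0]]
    for q, range_var, cov in neighbors:
        dx, dy = p[0] - q[0], p[1] - q[1]
        d = hypot(dx, dy)
        u = (dx / d, dy / d)
        weight = 1.0 / equivalent_ranging_variance(p, q, range_var, cov)
        for a in range(2):
            for b in range(2):
                j[a][b] += weight * u[a] * u[b]
    det = j[0][0] * j[1][1] - j[0][1] * j[1][0]
    return [[j[1][1] / det, -j[0][1] / det], [-j[1][0] / det, j[0][0] / det]]


neighbors = [((5.0, 0.0), 1.0, [[3.0, 0.0], [0.0, 3.0]]),  # uncertain neighbor
             ((0.0, 5.0), 1.0, [[0.0, 0.0], [0.0, 0.0]])]  # perfectly known anchor
bound = position_error_bound((0.0, 0.0), neighbors)
```

Here the uncertain neighbor's ranging variance grows from 1 to 4, so the bound along its line of sight is four times looser than along the well-known anchor's, which is the decoupling effect the abstract describes: each MT only needs its neighbors' covariances, not the global state.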

[Link]

Thursday, February 20, 2014

Lab Meeting, February 20, 2014 (Jim): Learning Monocular Reactive UAV Control in Cluttered Natural Environments

Title: Learning Monocular Reactive UAV Control in Cluttered Natural Environments

Authors:
Stephane Ross, Narek Melik-Barkhudarov, Kumar Shaurya Shankar, Andreas Wendel, Debadeepta Dey, J. Andrew (Drew) Bagnell, and Martial Hebert

IEEE International Conference on Robotics and Automation (ICRA), May 2013.

Abstract:
... Unlike large vehicles, MAVs can only carry very light sensors, such as cameras, making autonomous navigation through obstacles much more challenging. In this paper, we describe a system that navigates a small quadrotor helicopter autonomously at low altitude through natural forest environments. Using only a single cheap camera to perceive the environment, we are able to maintain a constant velocity of up to 1.5 m/s. Given a small set of human pilot demonstrations, we use recent state-of-the-art imitation learning techniques to train a controller that can avoid trees by adapting the MAV's heading. We demonstrate the performance of our system in a more controlled environment indoors, and in real natural forest environments outdoors.

Link

Tuesday, February 11, 2014

Lab Meeting, February 13, 2014 (Hung-Chih Lu): Zhaoyin Jia, Andrew Gallagher, Ashutosh Saxena. "3D-Based Reasoning with Blocks, Support, and Stability." CVPR 2013

Title:

3D-Based Reasoning with Blocks, Support, and Stability

Author:

Zhaoyin Jia, Andrew Gallagher, Ashutosh Saxena.

Abstract:

3D volumetric reasoning is important for truly understanding a scene. Humans are able to both segment each object in an image, and perceive a rich 3D interpretation of the scene, e.g., the space an object occupies, which objects support other objects, and which objects would, if moved, cause other objects to fall. We propose a new approach for parsing RGB-D images using 3D block units for volumetric reasoning. The algorithm fits image segments with 3D blocks, and iteratively evaluates the scene based on block interaction properties. We produce a 3D representation of the scene based on jointly optimizing over segmentations, block fitting, supporting relations, and object stability. Our algorithm incorporates the intuition that a good 3D representation of the scene is the one that fits the data well, and is a stable, self-supporting (i.e., one that does not topple) arrangement of objects. We experiment on several datasets including controlled and real indoor scenarios. Results show that our stability-reasoning framework improves RGB-D segmentation and scene volumetric representation.
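The "does not topple" intuition has a simple static core: a rigid block resting on a single support stays put if its center of mass projects inside the contact region. The Python sketch below applies that check in a 2D side view with axis-aligned, uniform-density blocks. It is a hypothetical simplification for illustration (the `Block` class and helper names are made up), not the paper's stability reasoning, which operates on fitted 3D blocks jointly with segmentation.

```python
# Illustrative sketch only: 2D side view, one support, uniform density assumed.
from dataclasses import dataclass


@dataclass
class Block:
    """Axis-aligned block in a side view: x-extent [left, right], base at `bottom`."""
    left: float
    right: float
    bottom: float
    height: float

    @property
    def center_x(self):
        return 0.5 * (self.left + self.right)


def contact_interval(top_block, support):
    """Horizontal overlap where top_block rests on support, or None if not touching."""
    if abs(top_block.bottom - (support.bottom + support.height)) > 1e-9:
        return None  # no vertical contact
    lo, hi = max(top_block.left, support.left), min(top_block.right, support.right)
    return (lo, hi) if lo < hi else None


def is_statically_stable(top_block, support):
    """Stable if the center of mass lies within the contact interval."""
    contact = contact_interval(top_block, support)
    return contact is not None and contact[0] <= top_block.center_x <= contact[1]


base = Block(left=0.0, right=4.0, bottom=0.0, height=1.0)
centered = Block(left=1.0, right=3.0, bottom=1.0, height=1.0)
overhanging = Block(left=3.0, right=7.0, bottom=1.0, height=1.0)
```

With several supports, a fuller version would test the center of mass against the hull of all contact regions, and for stacks it would use the combined center of mass of everything above a contact.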

From
IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2013

Link

Wednesday, January 22, 2014

Lab Meeting, January 23, 2014 (Yun-Jun Shen): Delaitre, Vincent, et al. "Scene semantics from long-term observation of people." Computer Vision–ECCV 2012

Title:

Scene semantics from long-term observation of people

Author:
Vincent Delaitre, David F. Fouhey, Ivan Laptev, Josef Sivic, Abhinav Gupta, and Alexei A. Efros.

Abstract:

Our everyday objects support various tasks and can be used by people for different purposes. While object classification is a widely studied topic in computer vision, recognition of object function, i.e., what people can do with an object and how they do it, is rarely addressed. In this paper we construct a functional object description with the aim to recognize objects by the way people interact with them. We describe scene objects (sofas, tables, chairs) by associated human poses and object appearance. Our model is learned discriminatively from automatically estimated body poses in many realistic scenes. In particular, we make use of time-lapse videos from YouTube providing a rich source of common human-object interactions and minimizing the effort of manual object annotation. We show how the models learned from human observations significantly improve object recognition and enable prediction of characteristic human poses in new scenes. Results are shown on a dataset of more than 400,000 frames obtained from 146 time-lapse videos of challenging and realistic indoor scenes.

From:

12th European Conference on Computer Vision (ECCV), 2012

Link

Wednesday, January 15, 2014

Lab Meeting, January 16th, 2014 (Henry Lu): Jaeyong Sung, Colin Ponce, Bart Selman and Ashutosh Saxena. "Unstructured Human Activity Detection from RGBD Images" IEEE International Conference on Robotics and Automation (ICRA), 2012

Title:
Unstructured Human Activity Detection from RGBD Images
Authors:
Jaeyong Sung, Colin Ponce, Bart Selman, and Ashutosh Saxena
Dept. of Comput. Sci., Cornell Univ., Ithaca, NY, USA
Abstract:

Being able to detect and recognize human activities is essential for several applications, including personal assistive robotics. In this paper, we perform detection and recognition of unstructured human activity in unstructured environments. We use an RGBD sensor (Microsoft Kinect) as the input sensor, and compute a set of features based on human pose and motion, as well as on image and point-cloud information. Our algorithm is based on a hierarchical maximum entropy Markov model (MEMM), which considers a person's activity as composed of a set of sub-activities. We infer the two-layered graph structure using a dynamic programming approach. We test our algorithm on detecting and recognizing twelve different activities performed by four people in different environments, such as a kitchen, a living room, and an office, and achieve good performance even when the person has not been seen before in the training set.
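For readers new to MEMMs, the sketch below shows Viterbi-style dynamic programming in a single-layer MEMM, where each step supplies a locally normalized distribution P(s_t | s_{t-1}, o_t). It assumes those conditional probabilities are already given as tables (in practice they would come from a maximum-entropy classifier over the pose and image features) and that they are strictly positive. The paper's two-layer activity/sub-activity structure is not modeled here, and `memm_viterbi` and the state names are hypothetical.

```python
# Illustrative sketch only: single-layer MEMM decoding with given probability tables.
from math import log


def memm_viterbi(initial, transitions):
    """initial: dict state -> P(s_1 | o_1).
    transitions: list over t = 2..T of dicts (prev, cur) -> P(cur | prev, o_t).
    Returns the most probable state sequence (probabilities must be > 0)."""
    states = list(initial)
    score = {s: log(initial[s]) for s in states}
    backpointers = []
    for table in transitions:
        new_score, back = {}, {}
        for cur in states:
            prev = max(states, key=lambda p: score[p] + log(table[(p, cur)]))
            new_score[cur] = score[prev] + log(table[(prev, cur)])
            back[cur] = prev
        score = new_score
        backpointers.append(back)
    best = max(states, key=score.get)
    path = [best]
    for back in reversed(backpointers):
        path.append(back[path[-1]])
    return path[::-1]


initial = {"reach": 0.9, "place": 0.1}
step2 = {("reach", "reach"): 0.2, ("reach", "place"): 0.8,
         ("place", "reach"): 0.5, ("place", "place"): 0.5}
step3 = {("reach", "reach"): 0.3, ("reach", "place"): 0.7,
         ("place", "reach"): 0.9, ("place", "place"): 0.1}
print(memm_viterbi(initial, [step2, step3]))
```

Working in log space avoids underflow on long sequences; the hierarchical model in the paper would run a similar recursion over the joint activity and sub-activity labels.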

From:

2012 IEEE International Conference on Robotics and Automation (ICRA)

Tuesday, January 07, 2014

Lab Meeting, January 9th, 2014 (Zhi-qiang): Jiang Wang, Zicheng Liu, Ying Wu, Junsong Yuan. "Mining actionlet ensemble for action recognition with depth cameras." IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2012

Title:
Mining actionlet ensemble for action recognition with depth cameras

Author:
Jiang Wang, Zicheng Liu, Ying Wu, Junsong Yuan

Abstract:
Human action recognition is an important yet challenging task. The recently developed commodity depth sensors open up new possibilities of dealing with this problem but also present some unique challenges. The depth maps captured by the depth cameras are very noisy and the 3D positions of the tracked joints may be completely wrong if serious occlusions occur, which increases the intra-class variations in the actions. In this paper, an actionlet ensemble model is learnt to represent each action and to capture the intra-class variance. In addition, novel features that are suitable for depth data are proposed. They are robust to noise, invariant to translational and temporal misalignments, and capable of characterizing both the human motion and the human-object interactions. The proposed approach is evaluated on two challenging action recognition datasets captured by commodity depth cameras, and another dataset captured by a MoCap system. The experimental evaluations show that the proposed approach achieves superior performance to state-of-the-art algorithms.
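One simple way to get the translation invariance mentioned above is to describe a skeleton frame by pairwise differences between joint positions, so that any global offset cancels. The Python sketch below computes such relative-position features for one frame. It is an illustrative assumption about the feature design rather than the paper's exact features, it omits the temporal modeling used for temporal misalignment, and `relative_joint_features` is a hypothetical name.

```python
# Illustrative sketch only: pairwise relative joint positions for one frame.
from itertools import combinations


def relative_joint_features(joints):
    """joints: list of (x, y, z) tuples for one frame.
    Returns the flattened differences p_i - p_j for every pair i < j."""
    features = []
    for (xi, yi, zi), (xj, yj, zj) in combinations(joints, 2):
        features.extend((xi - xj, yi - yj, zi - zj))
    return features


frame = [(0, 0, 0), (1, 2, 3), (4, 0, 1)]
shifted = [(x + 10, y - 5, z + 2) for x, y, z in frame]  # same pose, moved in space
print(relative_joint_features(frame) == relative_joint_features(shifted))  # True
```

For J joints this yields 3 * J * (J - 1) / 2 numbers per frame, which is one reason a selection step (such as mining a small subset of discriminative joints) becomes attractive.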
From:
IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2012
Link

Sunday, December 29, 2013

Lab Meeting, January 2nd, 2014 (Gene Chang): Zhou, Feng, and Fernando De la Torre. "Deformable Graph Matching." IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2013

Title:
Deformable Graph Matching

Author:
Feng Zhou and Fernando De la Torre
Robotics Institute, Carnegie Mellon University, Pittsburgh, PA 15213

Abstract:
Graph matching (GM) is a fundamental problem in computer science, and it has been successfully applied to many problems in computer vision. Although widely used, existing GM algorithms cannot incorporate global consistency among nodes, which is a natural constraint in computer vision problems. This paper proposes deformable graph matching (DGM), an extension of GM for matching graphs subject to global rigid and non-rigid geometric constraints. The key idea of this work is a new factorization of the pair-wise affinity matrix. This factorization decouples the affinity matrix into the local structure of each graph and the pair-wise affinity edges. Besides the ability to incorporate global geometric transformations, this factorization offers three more benefits. First, there is no need to compute the costly (in space and time) pair-wise affinity matrix. Second, it provides a unified view of many GM methods and extends the standard iterative closest point algorithm. Third, it allows the use of the path-following optimization algorithm, which leads to improved optimization strategies and matching performance. Experimental results on synthetic and real databases illustrate how DGM outperforms state-of-the-art algorithms for GM. The code is available at http://humansensing.cs.cmu.edu/fgm.

From:
IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2013

Link

Thursday, December 26, 2013

Lab Meeting, December 26, 2013 (Tom Hsu): An Efficient Motion Segmentation Algorithm for Multibody RGB-D SLAM, (Proceedings of Australasian Conference on Robotics and Automation, 2-4 Dec 2013)

Title:
An Efficient Motion Segmentation Algorithm for Multibody RGB-D SLAM

Author:
Youbing Wang, Shoudong Huang
Faculty of Engineering and IT, University of Technology, Sydney, Australia

Abstract:
A simple motion segmentation algorithm using only two frames of RGB-D data is proposed, and both simulated and experimental segmentation results show its efficiency and reliability. To further verify its usability in multi-body SLAM scenarios, we first apply it to a typical simulated multi-body SLAM problem with only an RGB-D camera, and then use it to segment a real RGB-D dataset that we collected ourselves. Based on the good results of our motion segmentation algorithm, we obtain satisfactory SLAM results for the simulated problem, and the segmentation results on real data also enable us to compute visual odometry for each motion group, thus facilitating the subsequent steps toward solving practical multi-body RGB-D SLAM problems.

From:
Proceedings of Australasian Conference on Robotics and Automation, 2-4 Dec 2013, University of New South Wales, Sydney, Australia

Link:
paper

Monday, December 16, 2013

Lab Meeting, December 19, 2013 (Yen-Ting): Deformable Spatial Pyramid Matching for Fast Dense Correspondences

Title: Deformable Spatial Pyramid Matching for Fast Dense Correspondences

Authors: Jaechul Kim, Ce Liu, Fei Sha and Kristen Grauman

Abstract: We introduce a fast deformable spatial pyramid (DSP) matching algorithm for computing dense pixel correspondences. Dense matching methods typically enforce both appearance agreement between matched pixels as well as geometric smoothness between neighboring pixels. Whereas the prevailing approaches operate at the pixel level, we propose a pyramid graph model that simultaneously regularizes match consistency at multiple spatial extents—ranging from an entire image, to coarse grid cells, to every single pixel. This novel regularization substantially improves pixel-level matching in the face of challenging image variations, while the “deformable” aspect of our model overcomes the strict rigidity of traditional spatial pyramids. Results on LabelMe and Caltech show our approach outperforms state-of-the-art methods (SIFT Flow [15] and PatchMatch [2]), both in terms of accuracy and run time.

P.S.
[2] C. Barnes, E. Shechtman, D. Goldman, and A. Finkelstein. The Generalized PatchMatch Correspondence Algorithm. In ECCV, 2010.
[15] C. Liu, J. Yuen, and A. Torralba. SIFT Flow: Dense Correspondence across Different Scenes and Its Applications. PAMI, 33(5), 2011.

From: IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2013

Link: Click here

Wednesday, November 27, 2013

Lab Meeting Nov. 28, 2013 (Yi): Coherent Motion Segmentation in Moving Camera Videos using Optical Flow Orientations

Authors: Manjunath Narayana, Allen Hanson, Erik Learned-Miller

Abstract

In moving camera videos, motion segmentation is commonly performed using the image plane motion of pixels, or optical flow. However, objects that are at different depths from the camera can exhibit different optical flows even if they share the same real-world motion. This can cause a depth-dependent segmentation of the scene. Our goal is to develop a segmentation algorithm that clusters pixels that have similar real-world motion irrespective of their depth in the scene. Our solution uses optical flow orientations instead of the complete vectors and exploits the well-known property that under camera translation, optical flow orientations are independent of object depth. We introduce a probabilistic model that automatically estimates the number of observed independent motions and results in a labeling that is consistent with real-world motion in the scene. The result of our system is that static objects are correctly identified as one segment, even if they are at different depths. Color features and information from previous frames in the video sequence are used to correct occasional errors due to the orientation-based segmentation. We present results on more than thirty videos from different benchmarks. The system is particularly robust on complex background scenes containing objects at significantly different depths.
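The depth-independence property follows directly from the motion field of a purely translating camera. Under one common sign convention, a static point at depth Z with image coordinates (x, y), seen by a camera with focal length f translating with velocity T = (Tx, Ty, Tz), has flow u = (-f Tx + x Tz) / Z and v = (-f Ty + y Tz) / Z, so Z only scales the vector. The short Python check below, with made-up numbers and a hypothetical helper name, confirms that two depths give the same orientation but different magnitudes.

```python
# Illustrative check with made-up numbers; translational_flow is a hypothetical helper.
from math import atan2, hypot


def translational_flow(x, y, depth, f, t):
    """Motion-field vector of a static point for a camera translating with t = (tx, ty, tz)."""
    tx, ty, tz = t
    return ((-f * tx + x * tz) / depth, (-f * ty + y * tz) / depth)


near = translational_flow(40.0, -25.0, depth=2.0, f=500.0, t=(0.3, 0.1, 1.0))
far = translational_flow(40.0, -25.0, depth=10.0, f=500.0, t=(0.3, 0.1, 1.0))

print(atan2(near[1], near[0]), atan2(far[1], far[0]))  # same angle
print(hypot(*near) / hypot(*far))                      # magnitude ratio 5 = 10 / 2
```

The property only holds for pure translation: camera rotation adds a flow component that does not depend on depth, so once rotation is present the orientation of the sum does vary with Z.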

link

Wednesday, November 20, 2013

Lab Meeting Nov. 21, 2013 (Channing): Where Do I Look Now? Gaze Allocation During Visually Guided Manipulation (ICRA 2012)

Title: Where Do I Look Now? Gaze Allocation During Visually Guided Manipulation (ICRA 2012)
Authors: Jose Nunez-Varela, B. Ravindran, Jeremy L. Wyatt

ABSTRACT - In this work we present principled methods for the coordination of a robot's oculomotor system with the rest of its body motor systems. The problem is to decide which physical actions to perform next and where the robot's gaze should be directed in order to gain information that is relevant to the success of its physical actions. Previous work on this problem has shown that a reward-based coordination mechanism provides an efficient solution. However, that approach does not allow the robot to move its gaze to different parts of the scene, it considers the robot to have only one motor system, and assumes that the actions have the same duration. The main contributions of our work are to extend that previous reward-based approach by making decisions about where to fixate the robot's gaze, handling multiple motor systems, and handling actions of variable duration. We compare our approach against two common baselines: random and round robin gaze allocation. We show how our method provides a more effective strategy to allocate gaze where it is needed the most.

Link

An extension of the above work:

Title: Gaze Allocation Analysis for a Visually Guided Manipulation Task (SAB 2012)

Authors: Jose Nunez-Varela, B. Ravindran, Jeremy L. Wyatt

ABSTRACT - Findings from eye movement research in humans have demonstrated that the task determines where to look. One hypothesis is that the purpose of looking is to reduce uncertainty about properties relevant to the task. Following this hypothesis, we define a model that poses the problem of where to look as one of maximising task performance by reducing task-relevant uncertainty. We implement and test our model on a simulated humanoid robot which has to move objects from a table into containers. Our model outperforms, and is more robust than, two other baseline schemes in terms of task performance whilst varying three environmental conditions: reach/grasp sensitivity, observation noise, and the camera's field of view.

Link

Thursday, November 14, 2013

Lab meeting Nov. 14, 2013 (Benny): Detection- and Trajectory-Level Exclusion in Multiple Object Tracking (CVPR 2013)

Authors: Anton Milan, Konrad Schindler, Stefan Roth

Abstract
When tracking multiple targets in crowded scenarios, modeling mutual exclusion between distinct targets becomes important at two levels: (1) in data association, each target observation should support at most one trajectory and each trajectory should be assigned at most one observation per frame; (2) in trajectory estimation, two trajectories should remain spatially separated at all times to avoid collisions. Yet, existing trackers often sidestep these important constraints. We address this using a mixed discrete-continuous conditional random field (CRF) that explicitly models both types of constraints: Exclusion between conflicting observations with supermodular pairwise terms, and exclusion between trajectories by generalizing global label costs to suppress the co-occurrence of incompatible labels (trajectories). We develop an expansion move-based MAP estimation scheme that handles both non-submodular constraints and pairwise global label costs. Furthermore, we perform a statistical analysis of ground-truth trajectories to derive appropriate CRF potentials for modeling data fidelity, target dynamics, and inter-target occlusion.
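The two exclusion constraints are easy to state as a feasibility check on a candidate tracking solution. The Python sketch below verifies them for a hypothetical representation in which each trajectory maps frame indices to (detection id, x, y); the `min_separation` threshold is an assumed parameter. It only tests a finished solution, whereas the paper encodes the same constraints as CRF potentials and label costs during MAP inference.

```python
# Illustrative sketch only: hypothetical data layout and threshold.
from itertools import combinations
from math import dist


def exclusion_violations(trajectories, min_separation):
    """trajectories: dict track_id -> dict frame -> (detection_id, x, y).
    One dict entry per frame already gives each track at most one observation
    per frame. Returns human-readable violations (empty list if none)."""
    violations = []
    # (1) Detection-level exclusion: a detection may support only one trajectory.
    owner = {}
    for track_id, states in trajectories.items():
        for frame, (det_id, _, _) in states.items():
            key = (frame, det_id)
            if key in owner and owner[key] != track_id:
                violations.append(f"detection {det_id} in frame {frame} "
                                  f"shared by {owner[key]} and {track_id}")
            owner.setdefault(key, track_id)
    # (2) Trajectory-level exclusion: tracks must stay apart in every shared frame.
    for (a, states_a), (b, states_b) in combinations(trajectories.items(), 2):
        for frame in states_a.keys() & states_b.keys():
            if dist(states_a[frame][1:], states_b[frame][1:]) < min_separation:
                violations.append(f"tracks {a} and {b} closer than "
                                  f"{min_separation} in frame {frame}")
    return violations


ok = {"A": {0: (0, 0.0, 0.0), 1: (0, 1.0, 0.0)},
      "B": {0: (1, 5.0, 0.0), 1: (1, 5.0, 1.0)}}
bad = {"A": {0: (0, 0.0, 0.0)}, "B": {0: (0, 0.5, 0.0)}}
print(exclusion_violations(bad, min_separation=1.0))
```

In the `bad` example both constraints fail at once: the two tracks claim the same detection and also sit 0.5 units apart.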

link

Wednesday, November 06, 2013

Lab meeting Nov. 7, 2013 (ChihChung): MAV Urban Localization from Google Street View Data (IROS 2013)

Authors: András L. Majdik, Yves Albers-Schoenberg, Davide Scaramuzza

Abstract: We tackle the problem of globally localizing a camera-equipped micro aerial vehicle flying within urban environments for which a Google Street View image database exists. To avoid the caveats of current image-search algorithms in case of severe viewpoint changes between the query and the database images, we propose to generate virtual views of the scene, which exploit the air-ground geometry of the system. To limit the computational complexity of the algorithm, we rely on a histogram-voting scheme to select the best putative image correspondences. The proposed approach is tested on a 2 km image dataset captured with a small quadrocopter flying in the streets of Zurich. The success of our approach shows that our new air-ground matching algorithm can robustly handle extreme changes in viewpoint, illumination, perceptual aliasing, and seasonal variations, thus outperforming conventional visual place-recognition approaches.

[link]

Wednesday, October 30, 2013

Lab meeting Oct. 31, 2013 (Andi): Non-rigid metric reconstruction from perspective cameras (IVCJ 2010)

Title: Non-rigid metric reconstruction from perspective cameras

Authors: Xavier Lladó, Alessio Del Bue, Lourdes Agapito

Abstract: The metric reconstruction of a non-rigid object viewed by a generic camera poses new challenges since current approaches for Structure from Motion assume the rigidity constraint of a shape as an essential condition. In this work, we focus on the estimation of the 3-D Euclidean shape and motion of a non-rigid shape observed by a perspective camera. In such case deformation and perspective effects are difficult to decouple – the parametrization of the 3-D non-rigid body may mistakenly account for the perspective distortion. Our method relies on the fact that it is often a reasonable assumption that some of the points on the object’s surface are deforming throughout the sequence while others remain rigid. Thus, relying on the rigidity constraints of a subset of rigid points, we estimate the perspective to metric upgrade transformation. First, we use an automatic segmentation algorithm to identify the set of rigid points. These are then used to estimate the internal camera calibration parameters and the overall rigid motion. Finally, we formulate the problem of non-rigid shape and motion estimation as a non-linear optimization where the objective function to be minimized is the image reprojection error. The prior information that some of the points in the object are rigid can also be added as a constraint to the non-linear minimization scheme in order to avoid ambiguous configurations. We perform experiments on different synthetic and real data sets which show that even when using a minimal set of rigid points and when varying the intrinsic camera parameters it is possible to obtain reliable metric information.
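The final step minimizes the image reprojection error. As a minimal sketch of that objective, assuming a pinhole camera with focal length f, principal point (cx, cy), no skew, rotation R, and translation t, the Python functions below sum squared pixel residuals for one frame. In the non-rigid setting the 3D points would themselves change from frame to frame, and a real solver would minimize this sum over all frames and unknowns; the function and parameter names are illustrative, not the paper's notation.

```python
# Illustrative sketch only: pinhole model without skew; names are hypothetical.
def project(point, rotation, translation, f, cx, cy):
    """Pinhole projection: X_c = R X + t, then (f X_c / Z_c + cx, f Y_c / Z_c + cy)."""
    xc, yc, zc = (sum(rotation[r][c] * point[c] for c in range(3)) + translation[r]
                  for r in range(3))
    return (f * xc / zc + cx, f * yc / zc + cy)


def reprojection_error(points_3d, observations, rotation, translation, f, cx, cy):
    """Sum of squared pixel distances between observations and projected points."""
    total = 0.0
    for point, (u, v) in zip(points_3d, observations):
        pu, pv = project(point, rotation, translation, f, cx, cy)
        total += (pu - u) ** 2 + (pv - v) ** 2
    return total


identity = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]
print(project((1, 0, 0), identity, (0, 0, 5), f=100, cx=50, cy=50))  # (70.0, 50.0)
print(reprojection_error([(1, 0, 0)], [(73.0, 54.0)], identity, (0, 0, 5), 100, 50, 50))  # 25.0
```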

Link

Wednesday, October 23, 2013

Lab Meeting Oct. 24, 2013 (Alan): Optimal Metric Projections for Deformable and Articulated Structure-from-Motion (IJCV 2012)

Title: Optimal Metric Projections for Deformable and Articulated Structure-from-Motion (IJCV 2012)
Authors: Marco Paladini, Alessio Del Bue, João Xavier, Lourdes Agapito, Marko Stošić, Marija Dodig

Abstract:
This paper describes novel algorithms for recovering the 3D shape and motion of deformable and articulated objects purely from uncalibrated 2D image measurements using a factorisation approach. Most approaches to deformable and articulated structure from motion require upgrading an initial affine solution to Euclidean space by imposing metric constraints on the motion matrix. While in the case of rigid structure the metric upgrade step is simple since the constraints can be formulated as linear, deformability in the shape introduces non-linearities. In this paper we propose an alternating bilinear approach to solve for non-rigid 3D shape and motion, associated with a globally optimal projection step of the motion matrices onto the manifold of metric constraints. Our novel optimal projection step combines into a single optimisation the computation of the orthographic projection matrix and the configuration weights that give the closest motion matrix that satisfies the correct block structure with the additional constraint that the projection matrix is guaranteed to have orthonormal rows (i.e. its transpose lies on the Stiefel manifold). This constraint turns out to be non-convex. The key contribution of this work is to introduce an efficient convex relaxation for the non-convex projection step. Efficient in the sense that, for both the cases of deformable and articulated motion, the proposed relaxations turned out to be exact (i.e. tight) in all our numerical experiments. The convex relaxations are semi-definite (SDP) or second-order cone (SOCP) programs which can be readily tackled by popular solvers. An important advantage of these new algorithms is their ability to handle missing data, which becomes crucial when dealing with real video sequences with self-occlusions. We show successful results of our algorithms on synthetic and real sequences of both deformable and articulated data. We also show comparative results with state-of-the-art algorithms, which reveal that our new methods outperform existing ones.
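The rigid special case of the projection step has a closed form that helps build intuition: the 2x3 matrix with orthonormal rows closest (in Frobenius norm) to a full-rank 2x3 matrix A is (A A^T)^(-1/2) A, equivalently U V^T from the thin SVD of A. The pure-Python sketch below implements that special case. It does not include the configuration weights or the SDP/SOCP relaxation that the paper introduces for the deformable and articulated cases, and the helper names are hypothetical.

```python
# Illustrative sketch only: rigid-case projection onto matrices with orthonormal rows.
from math import sqrt


def inverse_sqrt_2x2(m):
    """Inverse square root of a symmetric positive-definite 2x2 matrix, using
    sqrt(M) = (M + s I) / t with s = sqrt(det M) and t = sqrt(trace M + 2 s)."""
    (a, b), (c, d) = m
    s = sqrt(a * d - b * c)
    t = sqrt(a + d + 2 * s)
    ra, rb, rc, rd = (a + s) / t, b / t, c / t, (d + s) / t  # entries of sqrt(M)
    det = ra * rd - rb * rc
    return [[rd / det, -rb / det], [-rc / det, ra / det]]


def nearest_orthonormal_rows(a):
    """Project a full-rank 2x3 matrix onto {R : R R^T = I} via R = (A A^T)^(-1/2) A."""
    gram = [[sum(a[i][k] * a[j][k] for k in range(3)) for j in range(2)] for i in range(2)]
    w = inverse_sqrt_2x2(gram)
    return [[sum(w[i][j] * a[j][k] for j in range(2)) for k in range(3)] for i in range(2)]


A = [[1.0, 0.2, 0.1], [0.3, 0.9, -0.2]]
R = nearest_orthonormal_rows(A)
print([[sum(R[i][k] * R[j][k] for k in range(3)) for j in range(2)] for i in range(2)])
```

The printed R R^T should be the 2x2 identity up to rounding. Once configuration weights enter, the objective couples R with the weights and no longer has this closed form, which is where the paper's convex relaxations come in.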

Link