Monocular 3D Pose Estimation and Tracking by Detection
Mykhaylo Andriluka
Stefan Roth
Bernt Schiele
Abstract
Automatic recovery of 3D human pose from monocular image sequences is a challenging and important research topic with numerous applications. Although current methods are able to recover 3D pose for a single person in controlled environments, they are severely challenged by real-world scenarios, such as crowded street scenes. To address this problem, we propose a three-stage process building on a number of recent advances. The first stage obtains an initial estimate of the 2D articulation and viewpoint of the person from single frames. The second stage allows early data association across frames based on tracking-by-detection. The third and final stage uses those tracklet-based estimates as robust image observations to reliably recover 3D pose. We demonstrate state-of-the-art performance on the HumanEva II benchmark, and also show the applicability of our approach to articulated 3D tracking in realistic street conditions.
Paper Link
This Blog is maintained by the Robot Perception and Learning lab at CSIE, NTU, Taiwan. Our scientific interests are driven by the desire to build intelligent robots and computers, which are capable of servicing people more efficiently than equivalent manned systems in a wide variety of dynamic and unstructured environments.
Monday, September 27, 2010
Sunday, September 19, 2010
Lab Meeting September 20, 2010 (Kuen-Han): Scale Drift-Aware Large Scale Monocular SLAM (RSS 2010)
Title: Scale Drift-Aware Large Scale Monocular SLAM
Author: Hauke Strasdat, J.M.M. Montiel, Andrew J. Davison
Abstract—State-of-the-art visual SLAM systems have recently been presented which are capable of accurate, large-scale and real-time performance, but most of these require stereo vision. Important application areas in robotics and beyond open up if similar performance can be demonstrated using monocular vision, since a single camera will always be cheaper, more compact and easier to calibrate than a multi-camera rig. With high-quality estimation, a single camera moving through a static scene of course effectively provides its own stereo geometry via frames distributed over time. However, a classic issue with monocular visual SLAM is that, due to the purely projective nature of a single camera, motion estimates and map structure can only be recovered up to scale. Without the known inter-camera distance of a stereo rig to serve as an anchor, the scale of locally constructed map portions and the corresponding motion estimates is therefore liable to drift over time.

In this paper we describe a new near real-time visual SLAM system which adopts the continuous keyframe optimisation approach of the best current stereo systems, but accounts for the additional challenges presented by monocular input. In particular, we present a new pose-graph optimisation technique which allows for the efficient correction of rotation, translation and scale drift at loop closures. To this end, we describe the Lie group of similarity transformations and its relation to the corresponding Lie algebra. We also present in detail the system's new image processing front-end, which is able to accurately track hundreds of features per frame, and a filter-based approach for feature initialisation within keyframe-based SLAM. Our approach is proven via large-scale simulation and real-world experiments where a camera completes large looped trajectories.
link
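As a quick illustration of the similarity transformations at the heart of the pose-graph correction: an element of Sim(3) is a scale s, rotation R and translation t acting on a point as p ↦ sRp + t, and composing two such transforms shows how scale accumulates along a loop. A minimal sketch (illustrative only, not the authors' implementation):

```python
import numpy as np

def sim3_apply(s, R, t, p):
    """Apply a similarity transform (scale s, rotation R, translation t) to point p."""
    return s * R @ p + t

def sim3_compose(T1, T2):
    """Compose two similarity transforms so the result maps p -> T1(T2(p)).
    Note how the scales multiply: this is the drift that loop closure must correct."""
    s1, R1, t1 = T1
    s2, R2, t2 = T2
    return (s1 * s2, R1 @ R2, s1 * R1 @ t2 + t1)

# Example: a 90-degree yaw with scale 2, followed by a pure translation.
Rz = np.array([[0.0, -1.0, 0.0],
               [1.0,  0.0, 0.0],
               [0.0,  0.0, 1.0]])
T_a = (2.0, Rz, np.array([1.0, 0.0, 0.0]))
T_b = (1.0, np.eye(3), np.array([0.0, 1.0, 0.0]))
T_ab = sim3_compose(T_a, T_b)

p = np.array([1.0, 0.0, 0.0])
# Composition must agree with sequential application.
assert np.allclose(sim3_apply(*T_ab, p), sim3_apply(*T_a, sim3_apply(*T_b, p)))
```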
Lab Meeting September 20, 2010 (Alan): Probabilistic Surveillance with Multiple Active Cameras (ICRA 2010)
Title: Probabilistic Surveillance with Multiple Active Cameras (ICRA 2010)
Authors: Eric Sommerlade and Ian Reid
Abstract:
In this work we present a consistent probabilistic approach to control multiple, but diverse, pan-tilt-zoom cameras concertedly observing a scene. This control has disparate goals: the cameras must not only react to objects moving about, arbitrating the conflicting interests of target resolution and trajectory accuracy, but also anticipate the appearance of new targets.
We base our control function on the maximisation of expected mutual information gain, which to our knowledge is novel in the context of multiple pan-tilt-zoom camera control in computer vision. This information-theoretic measure yields a utility for each goal and parameter setting, making the use of physical or computational resources comparable. Weighting this utility allows us to prioritise certain objectives or targets in the control.
The resulting behaviours in typical situations for multi-camera systems, such as camera hand-off, acquisition of close-ups, and scene exploration, are emergent but intuitive. We show quantitatively that they address the given objectives without the need for hand-crafted rules.
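For intuition on the expected-information objective: under a linear-Gaussian measurement model, the mutual information between target state and a camera's measurement is half the log-ratio of prior to posterior covariance determinants, so a camera setting with lower measurement noise (e.g. zooming in) scores higher. A hedged sketch, not the authors' code; the function and parameter names are illustrative:

```python
import numpy as np

def gaussian_info_gain(P_prior, H, R):
    """Mutual information I(x; z) for a linear-Gaussian measurement z = H x + v,
    v ~ N(0, R): half the log-ratio of prior to posterior covariance determinants."""
    S = H @ P_prior @ H.T + R                    # innovation covariance
    K = P_prior @ H.T @ np.linalg.inv(S)         # Kalman gain
    P_post = (np.eye(P_prior.shape[0]) - K @ H) @ P_prior
    return 0.5 * np.log(np.linalg.det(P_prior) / np.linalg.det(P_post))

# Zooming in (smaller measurement noise R) yields a larger expected gain --
# the kind of trade-off the controller arbitrates between cameras and targets.
P = np.diag([4.0, 4.0])
H = np.eye(2)
gain_wide = gaussian_info_gain(P, H, np.diag([2.0, 2.0]))
gain_zoom = gaussian_info_gain(P, H, np.diag([0.5, 0.5]))
assert gain_zoom > gain_wide
```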
Saturday, September 18, 2010
Monday, September 13, 2010
Lab Meeting September 13th, 2010 (fish60): progress report
I will briefly present my recent progress, along with a review of the LEARCH algorithm.
Saturday, September 11, 2010
Lab Meeting September 13th, 2010 (Gary): AAM based Face Tracking with Temporal Matching and Face Segmentation (CVPR 2010)
Title:
AAM based Face Tracking with Temporal Matching and Face Segmentation
Authors:
Mingcai Zhou, Lin Liang, Jian Sun, Yangsheng Wang
Abstract:
Active Appearance Model (AAM) based face tracking has advantages of accurate alignment, high efficiency, and effectiveness for handling face deformation. However, AAM suffers from the generalization problem and has difficulties in images with cluttered backgrounds. In this paper, we introduce two novel constraints into AAM fitting to address the above problems. We first introduce a temporal matching constraint in AAM fitting. In the proposed fitting scheme, the temporal matching enforces an inter-frame local appearance constraint between frames. The resulting model takes advantage of temporal matching's good generalizability, but does not suffer from the mismatched points. To make AAM more stable for cluttered backgrounds, we introduce a color-based face segmentation as a soft constraint. Both constraints effectively improve the AAM tracker's performance, as demonstrated with experiments on various challenging real-world videos.
link
Wednesday, September 08, 2010
PhD Thesis Defense: David Silver [Learning Preference Models for Autonomous Mobile Robots in Complex Domains]
PhD Thesis Defense: David Silver
Learning Preference Models for Autonomous Mobile Robots in Complex Domains
Carnegie Mellon University
September 13, 2010, 12:30 p.m., NSH 1507
Abstract
Achieving robust and reliable autonomous operation even in complex unstructured environments is a central goal of field robotics. ...
This thesis presents the development and application of machine learning techniques that automate the construction and tuning of preference models within complex mobile robotic systems. Utilizing the framework of inverse optimal control, expert examples of robot behavior can be used to construct models that generalize demonstrated preferences and reproduce similar behavior. Novel learning from demonstration approaches are developed that offer the possibility of significantly reducing the amount of human interaction necessary to tune a system, while also improving its final performance. Techniques to account for the inevitability of noisy and imperfect demonstration are presented, along with additional methods for improving the efficiency of expert demonstration and feedback.
The effectiveness of these approaches is confirmed through application to several real world domains, such as the interpretation of static and dynamic perceptual data in unstructured environments and the learning of human driving styles and maneuver preferences. ... These experiments validate the potential applicability of the developed algorithms to a large variety of future mobile robotic systems.
Link
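For context on the inverse optimal control framework the thesis builds on: a common recipe (in the spirit of maximum-margin planning and LEARCH; this sketch is illustrative, not the thesis's algorithm) makes the cost linear in perceptual features and nudges the weights by the difference between the planner's and the expert's feature counts along their respective paths:

```python
import numpy as np

def ioc_update(w, expert_features, planned_features, lr=0.1):
    """One subgradient step of maximum-margin-style inverse optimal control:
    raise the cost of features the planner over-uses relative to the expert,
    lower the cost of features the expert prefers."""
    w = w + lr * (planned_features - expert_features)
    return np.maximum(w, 0.0)  # keep per-feature costs non-negative

# Toy example: the expert avoids 'vegetation' (feature 1); the current planner
# drives through it, so that feature's cost weight should increase.
w = np.array([1.0, 1.0])
expert = np.array([5.0, 0.0])   # feature counts along the demonstrated path
planned = np.array([3.0, 4.0])  # feature counts along the planner's best path
w_new = ioc_update(w, expert, planned)
assert w_new[1] > w[1] and w_new[0] < w[0]
```

Iterating this update with replanning drives the planner's behavior toward the demonstrated preferences.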
Monday, September 06, 2010
Lab Meeting September 7th, 2010 (Jimmy): Learning to Recognize Objects from Unseen Modalities
Title: Learning to Recognize Objects from Unseen Modalities
In ECCV 2010
Authors: C. Mario Christoudias, Raquel Urtasun, Mathieu Salzmann and Trevor Darrell
Abstract
In this paper we investigate the problem of exploiting multiple sources of information for object recognition tasks when additional modalities that are not present in the labeled training set are available for inference. This scenario is common to many robotics sensing applications and is in contrast with the assumption made by existing approaches that require at least some labeled examples for each modality. To leverage the previously unseen features, we make use of the unlabeled data to learn a mapping from the existing modalities to the new ones. This allows us to predict the missing data for the labeled examples and exploit all modalities using multiple kernel learning. We demonstrate the effectiveness of our approach on several multi-modal tasks including object recognition from multi-resolution imagery, grayscale and color images, as well as images and text. Our approach outperforms multiple kernel learning on the original modalities, as well as nearest-neighbor and bootstrapping schemes.
[pdf]
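The core idea can be sketched in a few lines: fit a regression from the observed modality to the unseen one on unlabeled data where both are available, hallucinate the missing features for the labeled set, and combine kernels over both. Below is a toy sketch, with a fixed 50/50 kernel mix standing in for learned MKL weights (function names are illustrative, not the authors' code):

```python
import numpy as np

def learn_mapping(X_old_unlab, X_new_unlab, reg=1e-3):
    """Ridge regression from the observed modality to the unseen one,
    fit on unlabeled examples where both modalities are available."""
    d = X_old_unlab.shape[1]
    return np.linalg.solve(X_old_unlab.T @ X_old_unlab + reg * np.eye(d),
                           X_old_unlab.T @ X_new_unlab)

def combined_kernel(X_old, W):
    """Average of linear kernels on the observed modality and the hallucinated
    one (a fixed 50/50 mix stands in for learned MKL weights)."""
    X_new_hat = X_old @ W               # predict the missing modality
    return 0.5 * (X_old @ X_old.T) + 0.5 * (X_new_hat @ X_new_hat.T)

# Toy data: the unseen modality is an exact linear function of the observed one,
# so the mapping should be recovered almost perfectly.
rng = np.random.default_rng(0)
X_old = rng.normal(size=(50, 4))
A = rng.normal(size=(4, 3))
X_new = X_old @ A
W = learn_mapping(X_old, X_new)
assert np.allclose(W, A, atol=1e-2)
K = combined_kernel(X_old, W)
assert np.allclose(K, K.T)            # a valid kernel matrix is symmetric
```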
Sunday, September 05, 2010
Lab Meeting September 7th, 2010 (Will(柏崴)): Efficient Computation of Robust Low-Rank Matrix Approximations in the Presence of Missing Data using the L1 Norm (CVPR2010)
Title: Efficient Computation of Robust Low-Rank Matrix Approximations in the Presence of Missing Data using the L1 Norm
Authors: Anders Eriksson and Anton van den Hengel
Abstract:
The calculation of a low-rank approximation of a matrix is a fundamental operation in many computer vision applications. The workhorse of this class of problems has long been the Singular Value Decomposition. However, in the presence of missing data and outliers this method is not applicable, and unfortunately, this is often the case in practice.
In this paper we present a method for calculating the low-rank factorization of a matrix which minimizes the L1 norm in the presence of missing data. Our approach represents a generalization of the Wiberg algorithm, one of the more convincing methods for factorization under the L2 norm. By utilizing the differentiability of linear programs, we can extend the underlying ideas behind this approach to include this class of L1 problems as well. We show that the proposed algorithm can be efficiently implemented using existing optimization software. We also provide preliminary experiments on synthetic as well as real world data with very convincing results.
[link]
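To make the objective concrete: the goal is a rank-r factorization U V^T minimizing the L1 norm of the residual over observed entries. The sketch below is not the paper's Wiberg-style linear-programming algorithm, but a simple iteratively reweighted least squares (IRLS) baseline for the same objective, alternating weighted least squares over the two factors:

```python
import numpy as np

def l1_factorize(M, mask, rank, iters=20, eps=1e-6):
    """Rank-r factorization M ~ U @ V.T approximately minimizing the L1 norm
    over observed entries (mask == 1), via IRLS: |r| is approximated by
    r^2 / (|r| + eps), giving weighted least-squares subproblems per row."""
    m, n = M.shape
    rng = np.random.default_rng(0)
    U = rng.normal(size=(m, rank))
    V = rng.normal(size=(n, rank))
    W = mask.astype(float)
    ridge = 1e-9 * np.eye(rank)
    for _ in range(iters):
        for i in range(m):             # update each row of U
            Aw = V * W[i][:, None]
            U[i] = np.linalg.solve(Aw.T @ V + ridge, Aw.T @ M[i])
        for j in range(n):             # update each row of V
            Aw = U * W[:, j][:, None]
            V[j] = np.linalg.solve(Aw.T @ U + ridge, Aw.T @ M[:, j])
        resid = np.abs(M - U @ V.T)
        W = mask / (resid + eps)       # small residuals get large weight
    return U, V

# Clean, fully observed rank-1 toy problem: the factorization should be exact.
rng = np.random.default_rng(1)
M0 = np.outer(rng.normal(size=6), rng.normal(size=5))
U, V = l1_factorize(M0, np.ones_like(M0), rank=1, iters=10)
assert np.allclose(U @ V.T, M0, atol=1e-6)
```

Unlike the paper's method, this baseline has no convergence guarantee, but it conveys why the L1 objective with a mask is robust: outlying or missing entries simply receive small or zero weight.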