Robot Perception and Learning

Wednesday, July 23, 2014

Lab Meeting July 24th, 2014 (Jeff): Combining 3D Shape, Color, and Motion for Robust Anytime Tracking

Title: Combining 3D Shape, Color, and Motion for Robust Anytime Tracking

Authors: David Held, Jesse Levinson, Sebastian Thrun, and Silvio Savarese

Abstract:

Although object tracking has been studied for decades, real-time tracking algorithms often suffer from low accuracy and poor robustness when confronted with difficult, real-world data. We present a tracker that combines 3D shape, color (when available), and motion cues to accurately track moving objects in real-time. Our tracker allocates computational effort based on the shape of the posterior distribution. Starting with a coarse approximation to the posterior, the tracker successively refines this distribution, increasing in tracking accuracy over time. The tracker can thus be run for any amount of time, after which the current approximation to the posterior is returned. Even at a minimum runtime of 0.7 milliseconds, our method outperforms all of the baseline methods of similar speed by at least 10%. If our tracker is allowed to run for longer, the accuracy continues to improve, and it continues to outperform all baseline methods. Our tracker is thus anytime, allowing the speed or accuracy to be optimized based on the needs of the application.
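To make the "anytime" idea concrete, here is a minimal sketch of a coarse-to-fine search that can be stopped after any time budget and still returns its current best estimate. It is not the authors' algorithm: the 1D search space, the greedy zoom into the best cell, and the names `anytime_grid_search`, `score`, `budget_s`, and `cells` are all assumptions made for illustration.

```python
import time


def anytime_grid_search(score, lo, hi, budget_s, cells=5, max_levels=20):
    """Refine a 1D estimate coarse-to-fine until the time budget runs out.

    score: function mapping a candidate value to a score (higher is better).
    Illustrative only: a greedy zoom, not the tracker described in the paper.
    """
    deadline = time.perf_counter() + budget_s
    best = (lo + hi) / 2.0
    for _ in range(max_levels):
        step = (hi - lo) / cells
        # Evaluate the centre of every cell at the current resolution.
        centers = [lo + (i + 0.5) * step for i in range(cells)]
        best = max(centers, key=score)
        # Zoom into the best cell; the next level samples it more finely.
        lo, hi = best - step / 2.0, best + step / 2.0
        if time.perf_counter() >= deadline:
            break  # Out of time: return the current approximation.
    return best


def toy_score(x):
    # Hypothetical score function with a single peak at x = 1.3.
    return -(x - 1.3) ** 2


coarse = anytime_grid_search(toy_score, -5.0, 5.0, budget_s=0.0)
fine = anytime_grid_search(toy_score, -5.0, 5.0, budget_s=1.0)
```

With a zero budget the search stops after one coarse level; with a larger budget it keeps refining, so the caller trades runtime for accuracy.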

Robotics: Science and Systems (RSS), 2014

Link:

http://www.roboticsproceedings.org/rss10/p14.html
http://www.roboticsproceedings.org/rss10/p14.pdf

Wednesday, June 25, 2014

Lab Meeting, Jun 25, 2014 (Jimmy): Belief Propagation Based Localization and Mapping Using Sparsely Sampled GNSS SNR Measurements

Title: Belief Propagation Based Localization and Mapping Using Sparsely Sampled GNSS SNR Measurements

In: ICRA 2014

Authors: Andrew T. Irish, Jason T. Isaacs, Francois Quitin, Joao P. Hespanha, and Upamanyu Madhow

Abstract
A novel approach is proposed to achieve simultaneous localization and mapping (SLAM) based on the signal-to-noise ratio (SNR) of global navigation satellite system (GNSS) signals. It is assumed that the environment is unknown and that the receiver location measurements (provided by a GNSS receiver) are noisy. The 3D environment map is decomposed into a grid of binary-state cells (occupancy grid) and the receiver locations are approximated by sets of particles. Using a large number of sparsely sampled GNSS SNR measurements and receiver/satellite coordinates (all available from off-the-shelf GNSS receivers), likelihoods of blockage are associated with every receiver-to-satellite beam. The posterior distribution of the map and poses is shown to represent a factor graph, on which Loopy Belief Propagation is used to efficiently estimate the probabilities of each cell being occupied or empty, along with the probability of the particles for each receiver location. Experimental results demonstrate our algorithm’s ability to coarsely map (in three dimensions) a corner of a university campus, while also correcting for uncertainties in the location of the GNSS receiver.

Link

Tuesday, June 17, 2014

Lab Meeting, Jun 19, 2014 (Jim): Bayesian Exploration and Interactive Demonstration in Continuous State MAXQ-Learning

Title:
Bayesian Exploration and Interactive Demonstration in Continuous State MAXQ-Learning

IEEE International Conference on Robotics and Automation, May, 2014.

Author:
Kathrin Gräve and Sven Behnke

Abstract:
... Inspired by the way humans decompose complex tasks, hierarchical methods for robot learning have attracted significant interest. In this paper, we apply the MAXQ method for hierarchical reinforcement learning to continuous state spaces. By using Gaussian Process Regression for MAXQ value function decomposition, we obtain probabilistic estimates of primitive and completion values for every subtask within the MAXQ hierarchy. ... Based on the expected deviation of these estimates, we devise a Bayesian exploration strategy that balances optimization of expected values and risk from exploring unknown actions. To further reduce risk and to accelerate learning, we complement MAXQ with learning from demonstrations in an interactive way. In every situation and subtask, the system may ask for a demonstration if there is not enough knowledge available to determine a safe action for exploration. We demonstrate the ability of the proposed system to efficiently learn solutions to complex tasks on a box stacking scenario.
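As a rough illustration of the exploration rule described above, the sketch below picks an action from probabilistic value estimates and falls back to asking for a demonstration when no action is known well enough. The scoring rule (mean minus a weighted standard deviation), the uncertainty threshold, and the names `choose_action`, `max_std`, and `risk_weight` are assumptions for illustration, not the criterion used in the paper.

```python
def choose_action(estimates, max_std, risk_weight=1.0):
    """Pick an action from (mean, std) value estimates.

    estimates: dict mapping action name -> (mean, std), e.g. from a GP.
    Returns the chosen action, or None to signal "ask for a demonstration".
    Illustrative rule only; the thresholds are hypothetical.
    """
    # Keep only actions whose value estimate is certain enough to try.
    safe = {a: (m, s) for a, (m, s) in estimates.items() if s <= max_std}
    if not safe:
        return None  # Not enough knowledge: request a demonstration.
    # Trade off expected value against the risk of an uncertain estimate.
    return max(safe, key=lambda a: safe[a][0] - risk_weight * safe[a][1])


# Hypothetical estimates for three actions in a box-stacking subtask.
estimates = {"push": (1.0, 0.1), "lift": (2.0, 0.5), "throw": (5.0, 3.0)}
picked = choose_action(estimates, max_std=1.0)
needs_demo = choose_action(estimates, max_std=0.05)
```

Here `throw` has the highest mean but is excluded as too uncertain, and a very strict threshold leaves no safe action, which triggers the demonstration request.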

Link

Wednesday, June 11, 2014

Lab Meeting, Jun 12, 2014 (Zhi-qiang): Simon Hadfield, Member, IEEE; Richard Bowden, Senior Member, IEEE. "Scene Particles: Unregularized Particle Based Scene Flow Estimation" IEEE TRANSACTIONS PATTERN ANALYSIS AND MACHINE INTELLIGENCE (PAMI), 2014

Title:
Scene Particles: Unregularized Particle Based Scene Flow Estimation

Author:
Simon Hadfield; Richard Bowden

Abstract

In this paper, an algorithm is presented for estimating scene flow, which is a richer, 3D analogue of Optical Flow. The approach operates orders of magnitude faster than alternative techniques, and is well suited to further performance gains through parallelized implementation. The algorithm employs multiple hypotheses to deal with motion ambiguities, rather than the traditional smoothness constraints, removing oversmoothing errors and providing significant performance improvements on benchmark data, over the previous state of the art. The approach is flexible, and capable of operating with any combination of appearance and/or depth sensors, in any setup, simultaneously estimating the structure and motion if necessary. Additionally, the algorithm propagates information over time to resolve ambiguities, rather than performing an isolated estimation at each frame, as in contemporary approaches. Approaches to smoothing the motion field without sacrificing the benefits of multiple hypotheses are explored, and a probabilistic approach to occlusion estimation is demonstrated, leading to 10% and 15% improved performance respectively. Finally, a data driven tracking approach is described, and used to estimate the 3D trajectories of hands during sign language, without the need to model complex appearance variations at each viewpoint.

From:

IEEE Pattern Analysis and Machine Intelligence (PAMI), 2014
Link: http://personal.ee.surrey.ac.uk/Personal/S.Hadfield/papers/Scene%20particles.pdf

Wednesday, June 04, 2014

Lab meeting Jun 5, 2014 (Hung-Chih Lu): Robust Online Multi-Object Tracking based on Tracklet Confidence and Online Discriminative Appearance Learning

Title: Robust Online Multi-Object Tracking based on Tracklet Confidence and Online Discriminative Appearance Learning

Authors: Seung-Hwan Bae and Kuk-Jin Yoon

Abstract:
Online multi-object tracking aims at producing complete tracks of multiple objects using the information accumulated up to the present moment. It still remains a difficult problem in complex scenes, because of frequent occlusion by clutter or other objects, similar appearances of different objects, and other factors. In this paper, we propose a robust online multi-object tracking method that can handle these difficulties effectively.
We first propose the tracklet confidence using the detectability and continuity of a tracklet, and formulate a multi-object tracking problem based on the tracklet confidence. The multi-object tracking problem is then solved by associating tracklets in different ways according to their confidence values. Based on this strategy, tracklets sequentially grow with online-provided detections, and fragmented tracklets are linked up with others without any iterative and expensive associations. Here, for reliable association between tracklets and detections, we also propose a novel online learning method using an incremental linear discriminant analysis for discriminating the appearances of objects. By exploiting the proposed learning method, tracklet association can be successfully achieved even under severe occlusion. Experiments with challenging public datasets show distinct performance improvement over other batch and online tracking methods.
CVPR 2014
Link

Wednesday, May 28, 2014

Lab meeting May 29, 2014 (Kung-Hung Lu): Finding Group Interactions in Social Clutter

Title: Finding Group Interactions in Social Clutter

Authors: Ruonan Li, Parker Porfilio, Todd Zickler

Abstract:

We consider the problem of finding distinctive social interactions involving groups of agents embedded in larger social gatherings. Given a pre-defined gallery of short exemplar interaction videos, and a long input video of a large gathering (with approximately-tracked agents), we identify within the gathering small sub-groups of agents exhibiting social interactions that resemble those in the exemplars. The participants of each detected group interaction are localized in space; the extent of their interaction is localized in time; and when the gallery of exemplars is annotated with group-interaction categories, each detected interaction is classified into one of the pre-defined categories. Our approach represents group behaviors by dichotomous collections of descriptors for (a) individual actions, and (b) pairwise interactions; and it includes efficient algorithms for optimally distinguishing participants from by-standers in every temporal unit and for temporally localizing the extent of the group interaction. Most importantly, the method is generic and can be applied whenever numerous interacting agents can be approximately tracked over time. We evaluate the approach using three different video collections, two that involve humans and one that involves mice.

In: Computer Vision and Pattern Recognition (CVPR), 2013 IEEE Conference on. IEEE, 2013

Link: http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=6619195

Wednesday, May 21, 2014

Lab meeting May 22, 2014 (Chun-Kai Chang): Communication Adaptive Multi-Robot Simultaneous Localization and Tracking via Hybrid Measurement and Belief Sharing

Title: Communication Adaptive Multi-Robot Simultaneous Localization and Tracking via Hybrid Measurement and Belief Sharing

Authors: Chun-Kai Chang, Chun-Hua Chang and Chieh-Chih Wang

Abstract: Existing multi-robot cooperative perception solutions can be mainly classified into two categories, measurement-based and belief-based, according to the information shared among robots. With well-controlled communication, measurement-based approaches are expected to achieve theoretically optimal estimates while belief-based approaches are not because the cross-correlations between beliefs are hard to estimate perfectly in practice. Nevertheless, belief-based approaches perform relatively stably under unstable communication as a belief contains the information of multiple previous measurements. Motivated by the observation that measurement sharing and belief sharing are respectively superior in different conditions, in this paper a hybrid algorithm, communication adaptive multi-robot simultaneous localization and tracking (ComAd MR-SLAT), is proposed to combine the advantages of both. To tackle the unknown or unstable communication conditions, the information to share is decided by maximizing the expected uncertainty reduction online, based on which the algorithm dynamically alternates between measurement sharing and belief-sharing without information loss or reuse. The proposed ComAd MR-SLAT is evaluated in communication conditions with different packet loss rates and bursty loss lengths. In our experiments, ComAd MR-SLAT outperforms measurement-based and belief-based MR-SLAT in accuracy. The experimental results demonstrate the effectiveness of the proposed hybrid algorithm and exhibit that ComAd MR-SLAT is robust under different communication conditions.

In: IEEE International Conference on Robotics and Automation, 2014.

Wednesday, May 14, 2014

Lab meeting May 15, 2014 (Yun-Jun Shen): Robust Monocular Epipolar Flow Estimation

Title: Robust Monocular Epipolar Flow Estimation

Authors: Koichiro Yamaguchi, David McAllester and Raquel Urtasun

Abstract:
We consider the problem of computing optical flow in monocular video taken from a moving vehicle. In this setting, the vast majority of image flow is due to the vehicle’s ego-motion. We propose to take advantage of this fact and estimate flow along the epipolar lines of the egomotion. Towards this goal, we derive a slanted-plane MRF model which explicitly reasons about the ordering of planes and their physical validity at junctions. Furthermore, we present a bottom-up grouping algorithm which produces over-segmentations that respect flow boundaries. We demonstrate the effectiveness of our approach in the challenging KITTI flow benchmark achieving half the error of the best competing general flow algorithm and one third of the error of the best epipolar flow algorithm.

In: Computer Vision and Pattern Recognition (CVPR), 2013 IEEE Conference on. IEEE, 2013

Link: http://ttic.uchicago.edu/~rurtasun/publications/yamaguchi_et_al_cvpr13.pdf

Tuesday, May 06, 2014

Lab meeting May 8, 2014 (Bang-Cheng Wang): Robust feedback control of ZMP-based gait for the humanoid robot Nao

Authors: J.J. Alcaraz-Jiménez, D. Herrero-Pérez and H. Martínez-Barberá

Abstract:
Numerous approaches have been proposed to generate well-balanced gaits in biped robots that show excellent performance in simulated environments. However, in general, the dynamic balance of the robots decreases dramatically when these methods are tested in physical platforms. Since humanoid robots are intended to collaborate with humans and operate in everyday environments, it is of paramount importance to test such approaches both in physical platforms and under severe conditions. In this work, the special characteristics of the Nao humanoid platform are analyzed and a control system that allows robust walking and disturbance rejection is proposed. This approach combines the zero moment point (ZMP) stability criterion with angular momentum suppression and step timing control. The proposed method is especially suitable for platforms with limited computational resources and sensory and sensory-motor capabilities.

In: The International Journal of Robotics Research (IJRR) August/September 2013 vol. 32

Link: http://ijr.sagepub.com/content/32/9-10/1074

Video: http://www.ijrr.org/ijrr_2013/487566.htm

Tuesday, April 15, 2014

Lab meeting April 17, 2014 (Yen-Ting): Dense correspondence and annotation systems

I will present several state-of-the-art annotation systems and their relationship with dense correspondences, and compare them with my own current work.

Reference papers:

LabelMe video: Building a Video Database with Human Annotations (ICCV 2009)
link

Efficiently Scaling up Crowdsourced Video Annotation (IJCV 2013)
link

Human-Assisted Motion Annotation (CVPR 2008)
link

Annotation Propagation in Large Image Databases via Dense Image Correspondence (ECCV 2012)
link

Wednesday, April 09, 2014

Lab meeting April 10, 2014 (Channing): "CAPT: Concurrent assignment and planning of trajectories for multiple robots"

Title: CAPT: Concurrent assignment and planning of trajectories for multiple robots

Authors: Matthew Turpin, Nathan Michael, and Vijay Kumar
GRASP Laboratory, University of Pennsylvania, Philadelphia, USA

In: The International Journal of Robotics Research (IJRR), January 2014, 33: 98-112

Abstract:
In this paper, we consider the problem of concurrent assignment and planning of trajectories (which we denote CAPT) for a team of robots. This problem involves simultaneously addressing two challenges: (1) the combinatorially complex problem of finding a suitable assignment of robots to goal locations, and (2) the generation of collision-free, time parameterized trajectories for every robot. We consider the CAPT problem for unlabeled (interchangeable) robots and propose algorithmic solutions to two variations of the CAPT problem. The first algorithm, C-CAPT, is a provably correct, complete, centralized algorithm which guarantees collision-free optimal solutions to the CAPT problem in an obstacle-free environment. To achieve these strong claims, C-CAPT exploits the synergy obtained by combining the two subproblems of assignment and trajectory generation to provide computationally tractable solutions for large numbers of robots. We then propose a decentralized solution to the CAPT problem through d-CAPT, a decentralized algorithm that provides suboptimal results compared to C-CAPT . We illustrate the algorithms and resulting performance through simulation and experimentation.
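The two subproblems named in the abstract, assignment and trajectory generation, can be sketched together for a handful of robots. This is only an illustration under stated assumptions: the cost is taken to be the sum of squared start-to-goal distances, the assignment is found by brute force over permutations (workable only for very small teams), and each robot follows a straight line at constant speed. The names `assign_goals` and `position` are hypothetical, and the sketch makes no collision-avoidance guarantee.

```python
from itertools import permutations


def assign_goals(starts, goals):
    """Return a tuple g where robot i is assigned goal g[i].

    Assumes len(starts) == len(goals) and a squared-distance cost
    (an assumption of this sketch). Brute force: small teams only.
    """
    def cost(perm):
        return sum((sx - goals[g][0]) ** 2 + (sy - goals[g][1]) ** 2
                   for (sx, sy), g in zip(starts, perm))

    return min(permutations(range(len(starts))), key=cost)


def position(start, goal, t, t_final):
    """Straight-line, constant-speed position at time t in [0, t_final]."""
    a = min(max(t / t_final, 0.0), 1.0)  # Clamp progress to [0, 1].
    return (start[0] + a * (goal[0] - start[0]),
            start[1] + a * (goal[1] - start[1]))


starts = [(0.0, 0.0), (4.0, 0.0)]
goals = [(4.0, 1.0), (0.0, 1.0)]
assignment = assign_goals(starts, goals)
midpoint = position(starts[0], goals[assignment[0]], t=0.5, t_final=1.0)
```

Because the robots are interchangeable, each one is sent to the goal directly above it rather than across the workspace; for larger teams a polynomial-time assignment solver would replace the permutation search.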

Download link: http://ijr.sagepub.com/content/33/1/98.full.pdf

Related Media link: http://www.seas.upenn.edu/~mturpin/summary.html

Wednesday, March 19, 2014

Lab meeting Mar. 20, 2014 (Andi): Multiview Structure from Motion in Trajectory Space

Authors: Aamer Zaheer, Ijaz Akhter, Mohammad Haris Baig, Shabbir Marzban, Sohaib Khan

Abstract:
Most nonrigid objects exhibit temporal regularities in their deformations. Recently it was proposed that these regularities can be parameterized by assuming that the non-rigid structure lies in a small dimensional trajectory space. In this paper, we propose a factorization approach for 3D reconstruction from multiple static cameras under the compact trajectory subspace representation. Proposed factorization is analogous to rank-3 factorization of rigid structure from motion problem, in transformed space. The benefit of our approach is that the 3D trajectory basis can be directly learned from the image observations. This also allows us to impute missing observations and denoise tracking errors without explicit estimation of the 3D structure. In contrast to standard triangulation based methods which require points to be visible in at least two cameras, our approach can reconstruct points, which remain occluded even in all the cameras for quite a long time. This makes our solution especially suitable for occlusion handling in motion capture systems. We demonstrate robustness of our method on challenging real and synthetic scenarios.

In: Proceedings of the 13th International Conference on Computer Vision (ICCV), Barcelona, Spain, Nov 2011

download paper

Wednesday, March 12, 2014

Lab meeting Mar. 13, 2014 (ChihChung): Matching two scene images with large distance and view angle change

In this talk, I will present recent state-of-the-art approaches to scene image matching and then discuss several new ideas of my own.

The references are:

Algorithms:

Affine-invariant SIFT:
link1
link2
link3

ORSA (Optimized RANSAC):
link

Virtual-line descriptor:
link

1-point RANSAC:
link

Implementation:

Using MAV and Google street map for visual localization:
link

Monday, March 03, 2014

Lab Meeting March 6th, 2014 (Jeff): Simultaneous Parameter Calibration, Localization, and Mapping

Title: Simultaneous Parameter Calibration, Localization, and Mapping

Authors: Rainer Kümmerle, Giorgio Grisetti, and Wolfram Burgard

Abstract:

The calibration parameters of a mobile robot play a substantial role in navigation tasks. Often these parameters are subject to variations that depend either on changes in the environment or on the load of the robot. In this paper, we propose an approach to simultaneously estimate a map of the environment, the position of the on-board sensors of the robot, and its kinematic parameters. Our method requires no prior knowledge about the environment and relies only on a rough initial guess of the parameters of the platform. The proposed approach estimates the parameters online and it is able to adapt to non-stationary changes of the configuration. We tested our approach in simulated environments and on a wide range of real-world data using different types of robotic platforms.

Advanced Robotics Vol.26, 2012

Link:

http://www.tandfonline.com/doi/full/10.1080/01691864.2012.728694

Reference Link:
Simultaneous Parameter Calibration, Localization, and Mapping for Robust Service Robotics.
ARSO2011.
http://europa.informatik.uni-freiburg.de/files/kuemmerle11arso.pdf
Simultaneous Calibration, Localization, and Mapping.
IROS2011.
http://ais.informatik.uni-freiburg.de/publications/papers/kuemmerle11iros.pdf?origin=publication_detail

Wednesday, February 26, 2014

Lab Meeting February 27, 2014 (Jimmy): System-Level Performance Analysis for Bayesian Cooperative Positioning: From Global to Local

Title: System-Level Performance Analysis for Bayesian Cooperative Positioning: From Global to Local
Authors: Zhang Siwei, Ronald Raulefs, Armin Dammann, and Stephan Sand

In: IEEE International Conference on Indoor Positioning and Indoor Navigation 2013

Abstract
Cooperative positioning (CP) can be used either to calibrate the accumulated error from inertial navigation or as a stand-alone navigation system. Though intensive research has been conducted on CP, there is a need to further investigate the joint impact from the system level on the accuracy. We derive a posterior Cramer-Rao bound (PCRB) considering both the physical layer (PHY) signal structure and the asynchronous latency from the multiple access control layer (MAC). The PCRB shows an immediate relationship between the theoretical accuracy limit and the effective factors, e.g. geometry, node dynamic, latency, signal structure, power, etc., which is useful to assess a cooperative system. However, for a large-scale decentralized cooperation network, calculating the PCRB becomes difficult due to the high state dimension and the absence of global information. We propose an equivalent ranging variance (ERV) scheme which projects the neighbor's positioning uncertainty to the distance measurement inaccuracy. With this, the effect from the interaction among the mobile terminals (MTs), e.g. measurement and communication, can be decoupled. We use the ERV to derive a local PCRB (L-PCRB) which approximates the PCRB locally at each MT with low complexity. We further propose combining the ERV and L-PCRB together to improve the precision of the Bayesian localization algorithms. Simulations with an L-PCRB-aided distributed particle filter (DPF) in two typical cooperative positioning scenarios show a significant improvement compared with the non-cooperative or standard DPF.
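One way to picture "projecting the neighbor's positioning uncertainty to the distance measurement inaccuracy" is a first-order approximation: add the neighbor's position covariance, projected onto the line of sight, to the raw ranging variance. The sketch below follows that reading in 2D. It is an assumption made for illustration and not the paper's derivation; the name `equivalent_ranging_variance` and its parameters are hypothetical.

```python
import math


def equivalent_ranging_variance(own_pos, nbr_pos, nbr_cov, range_var):
    """Inflate the ranging variance by the neighbour's position uncertainty.

    own_pos, nbr_pos: (x, y) positions; nbr_cov: 2x2 covariance (nested lists).
    First-order sketch: range_var + u^T * nbr_cov * u, where u is the unit
    vector from own_pos to nbr_pos. Not the paper's exact ERV formula.
    """
    dx, dy = nbr_pos[0] - own_pos[0], nbr_pos[1] - own_pos[1]
    dist = math.hypot(dx, dy)
    ux, uy = dx / dist, dy / dist  # Line-of-sight unit vector.
    projected = (ux * (nbr_cov[0][0] * ux + nbr_cov[0][1] * uy)
                 + uy * (nbr_cov[1][0] * ux + nbr_cov[1][1] * uy))
    return range_var + projected


cov = [[4.0, 0.0], [0.0, 9.0]]
east = equivalent_ranging_variance((0.0, 0.0), (10.0, 0.0), cov, range_var=1.0)
north = equivalent_ranging_variance((0.0, 0.0), (0.0, 10.0), cov, range_var=1.0)
```

Only the uncertainty along the line of sight matters in this approximation, so the same neighbor covariance yields a different effective variance depending on the direction to the neighbor.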

[Link]

Thursday, February 20, 2014

Lab Meeting, February 20, 2014 (Jim): Learning monocular reactive UAV control in cluttered natural environments

Title: Learning monocular reactive UAV control in cluttered natural environments

Authors:
Stephane Ross, Narek Melik-Barkhudarov, Kumar Shaurya Shankar, Andreas Wendel, Debadeepta Dey, J. Andrew (Drew) Bagnell, and Martial Hebert

IEEE International Conference on Robotics and Automation, March, 2013.

Abstract:
... Unlike large vehicles, MAVs can only carry very light sensors, such as cameras, making autonomous navigation through obstacles much more challenging. In this paper, we describe a system that navigates a small quadrotor helicopter autonomously at low altitude through natural forest environments. Using only a single cheap camera to perceive the environment, we are able to maintain a constant velocity of up to 1.5m/s. Given a small set of human pilot demonstrations, we use recent state-of-the-art imitation learning techniques to train a controller that can avoid trees by adapting the MAV's heading. We demonstrate the performance of our system in a more controlled environment indoors, and in real natural forest environments outdoors.

Link

Tuesday, February 11, 2014

Lab Meeting, February 13, 2014 (Hung-Chih Lu): Zhaoyin Jia, Andrew Gallagher, Ashutosh Saxena "3D-Based Reasoning with Blocks, Support, and Stability." CVPR 2013

Title:

3D-Based Reasoning with Blocks, Support, and Stability

Author:

Zhaoyin Jia, Andrew Gallagher, Ashutosh Saxena.

Abstract:

3D volumetric reasoning is important for truly understanding a scene. Humans are able to both segment each object in an image, and perceive a rich 3D interpretation of the scene, e.g., the space an object occupies, which objects support other objects, and which objects would, if moved, cause other objects to fall. We propose a new approach for parsing RGB-D images using 3D block units for volumetric reasoning. The algorithm fits image segments with 3D blocks, and iteratively evaluates the scene based on block interaction properties. We produce a 3D representation of the scene based on jointly optimizing over segmentations, block fitting, supporting relations, and object stability. Our algorithm incorporates the intuition that a good 3D representation of the scene is the one that fits the data well, and is a stable, self-supporting (i.e., one that does not topple) arrangement of objects. We experiment on several datasets including controlled and real indoor scenarios. Results show that our stability-reasoning framework improves RGB-D segmentation and scene volumetric representation.

From
IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2013

Link

Wednesday, January 22, 2014

Lab Meeting, January 23, 2014 (Yun-Jun Shen): Delaitre, Vincent, et al. "Scene semantics from long-term observation of people." Computer Vision–ECCV 2012

Title:

Scene semantics from long-term observation of people

Author:
Delaitre, Vincent, David F. Fouhey, Ivan Laptev, Josef Sivic, Abhinav Gupta, and Alexei A. Efros.

Abstract:

Our everyday objects support various tasks and can be used by people for different purposes. While object classification is a widely studied topic in computer vision, recognition of object function, i.e., what people can do with an object and how they do it, is rarely addressed. In this paper we construct a functional object description with the aim to recognize objects by the way people interact with them. We describe scene objects (sofas, tables, chairs) by associated human poses and object appearance. Our model is learned discriminatively from automatically estimated body poses in many realistic scenes. In particular, we make use of time-lapse videos from YouTube providing a rich source of common human-object interactions and minimizing the effort of manual object annotation. We show how the models learned from human observations significantly improve object recognition and enable prediction of characteristic human poses in new scenes. Results are shown on a dataset of more than 400,000 frames obtained from 146 time-lapse videos of challenging and realistic indoor scenes.

From:

12th European Conference on Computer Vision

Link

Wednesday, January 15, 2014

Lab Meeting, January 16th, 2014 (Henry Lu): Jaeyong Sung, Colin Ponce, Bart Selman and Ashutosh Saxena. "Unstructured Human Activity Detection from RGBD Images" IEEE International Conference on Robotics and Automation (ICRA), 2012

Title:
Unstructured Human Activity Detection from RGBD Images
Authors:
Jaeyong Sung, Colin Ponce, Bart Selman, and Ashutosh Saxena
Dept. of Comput. Sci., Cornell Univ., Ithaca, NY, USA
Abstract:

Being able to detect and recognize human activities is essential for several applications, including personal assistive robotics. In this paper, we perform detection and recognition of unstructured human activity in unstructured environments. We use an RGBD sensor (Microsoft Kinect) as the input sensor, and compute a set of features based on human pose and motion, as well as based on image and point-cloud information. Our algorithm is based on a hierarchical maximum entropy Markov model (MEMM), which considers a person's activity as composed of a set of sub-activities. We infer the two-layered graph structure using a dynamic programming approach. We test our algorithm on detecting and recognizing twelve different activities performed by four people in different environments, such as a kitchen, a living room, an office, etc., and achieve good performance even when the person was not seen before in the training set.

From:

2012 IEEE International Conference on Robotics and Automation (ICRA)

Tuesday, January 07, 2014

Lab Meeting, January 9th, 2014 (Zhi-qiang): Jiang Wang ; Zicheng Liu ; Ying Wu ; Junsong Yuan. "Mining actionlet ensemble for action recognition with depth cameras." IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2012

Title:
Mining actionlet ensemble for action recognition with depth cameras

Author:
Jiang Wang ; Zicheng Liu ; Ying Wu ; Junsong Yuan

Abstract:
Human action recognition is an important yet challenging task. The recently developed commodity depth sensors open up new possibilities of dealing with this problem but also present some unique challenges. The depth maps captured by the depth cameras are very noisy and the 3D positions of the tracked joints may be completely wrong if serious occlusions occur, which increases the intra-class variations in the actions. In this paper, an actionlet ensemble model is learnt to represent each action and to capture the intra-class variance. In addition, novel features that are suitable for depth data are proposed. They are robust to noise, invariant to translational and temporal misalignments, and capable of characterizing both the human motion and the human-object interactions. The proposed approach is evaluated on two challenging action recognition datasets captured by commodity depth cameras, and another dataset captured by a MoCap system. The experimental evaluations show that the proposed approach achieves superior performance to the state of the art algorithms.
From:
IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2012
Link