Robot Perception and Learning: January 2014

Wednesday, January 22, 2014

Lab Meeting, January 23, 2014 (Yun-Jun Shen): Delaitre, Vincent, et al. "Scene semantics from long-term observation of people." Computer Vision–ECCV 2012

Title:

Scene semantics from long-term observation of people

Author:
Delaitre, Vincent, David F. Fouhey, Ivan Laptev, Josef Sivic, Abhinav Gupta, and Alexei A. Efros.

Abstract:

Our everyday objects support various tasks and can be used by people for different purposes. While object classification is a widely studied topic in computer vision, recognition of object function, i.e., what people can do with an object and how they do it, is rarely addressed. In this paper we construct a functional object description with the aim to recognize objects by the way people interact with them. We describe scene objects (sofas, tables, chairs) by associated human poses and object appearance. Our model is learned discriminatively from automatically estimated body poses in many realistic scenes. In particular, we make use of time-lapse videos from YouTube providing a rich source of common human-object interactions and minimizing the effort of manual object annotation. We show how the models learned from human observations significantly improve object recognition and enable prediction of characteristic human poses in new scenes. Results are shown on a dataset of more than 400,000 frames obtained from 146 time-lapse videos of challenging and realistic indoor scenes.

From:

12th European Conference on Computer Vision


Wednesday, January 15, 2014

Lab Meeting, January 16th, 2014 (Henry Lu): Jaeyong Sung, Colin Ponce, Bart Selman and Ashutosh Saxena. "Unstructured Human Activity Detection from RGBD Images" IEEE International Conference on Robotics and Automation (ICRA), 2012

Title:
Unstructured Human Activity Detection from RGBD Images
Authors:
Jaeyong Sung (Dept. of Comput. Sci., Cornell Univ., Ithaca, NY, USA), Colin Ponce, Bart Selman, Ashutosh Saxena
Abstract:

Being able to detect and recognize human activities is essential for several applications, including personal assistive robotics. In this paper, we perform detection and recognition of unstructured human activity in unstructured environments. We use a RGBD sensor (Microsoft Kinect) as the input sensor, and compute a set of features based on human pose and motion, as well as based on image and point-cloud information. Our algorithm is based on a hierarchical maximum entropy Markov model (MEMM), which considers a person's activity as composed of a set of sub-activities. We infer the two-layered graph structure using a dynamic programming approach. We test our algorithm on detecting and recognizing twelve different activities performed by four people in different environments, such as a kitchen, a living room, an office, etc., and achieve good performance even when the person was not seen before in the training set.

From:

2012 IEEE International Conference on Robotics and Automation (ICRA)
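
Note on the dynamic programming step: the abstract above says inference is done with dynamic programming over a hierarchical MEMM. As a rough illustration only, the sketch below runs Viterbi-style decoding on a flat, single-layer MEMM, which is a simplification and not the two-layer procedure from the paper. The function name `memm_viterbi` and the input layout (log-probabilities already conditioned on each frame's features) are assumptions made for this example.

```python
import math


def memm_viterbi(initial, transitions):
    """Most likely label sequence for a flat MEMM (illustrative sketch).

    initial[k]              : log P(y_0 = k | x_0)
    transitions[t][p][k]    : log P(y_{t+1} = k | y_t = p, x_{t+1})
    Returns (best label path, log-probability of that path).
    """
    n_labels = len(initial)
    best = list(initial)      # best log-score of any path ending in each label
    backptrs = []             # backptrs[t][k] = best previous label for k

    for step in transitions:
        new_best, ptr = [], []
        for cur in range(n_labels):
            # Pick the predecessor that maximizes the path score into `cur`.
            prev = max(range(n_labels), key=lambda p: best[p] + step[p][cur])
            new_best.append(best[prev] + step[prev][cur])
            ptr.append(prev)
        best = new_best
        backptrs.append(ptr)

    # Trace back from the best final label.
    last = max(range(n_labels), key=lambda k: best[k])
    path = [last]
    for ptr in reversed(backptrs):
        path.append(ptr[path[-1]])
    path.reverse()
    return path, best[last]


if __name__ == "__main__":
    log = math.log
    initial = [log(0.6), log(0.4)]
    transitions = [
        [[log(0.7), log(0.3)], [log(0.2), log(0.8)]],
        [[log(0.1), log(0.9)], [log(0.5), log(0.5)]],
    ]
    print(memm_viterbi(initial, transitions))
```

Each step costs time quadratic in the number of labels, so decoding a sequence of T frames with K labels takes on the order of T·K² operations.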

Tuesday, January 07, 2014

Lab Meeting, January 9th, 2014 (Zhi-qiang): Jiang Wang ; Zicheng Liu ; Ying Wu ; Junsong Yuan. "Mining actionlet ensemble for action recognition with depth cameras." IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2012

Title:
Mining actionlet ensemble for action recognition with depth cameras

Author:
Jiang Wang ; Zicheng Liu ; Ying Wu ; Junsong Yuan

Abstract:
Human action recognition is an important yet challenging task. The recently developed commodity depth sensors open up new possibilities of dealing with this problem but also present some unique challenges. The depth maps captured by the depth cameras are very noisy and the 3D positions of the tracked joints may be completely wrong if serious occlusions occur, which increases the intra-class variations in the actions. In this paper, an actionlet ensemble model is learnt to represent each action and to capture the intra-class variance. In addition, novel features that are suitable for depth data are proposed. They are robust to noise, invariant to translational and temporal misalignments, and capable of characterizing both the human motion and the human-object interactions. The proposed approach is evaluated on two challenging action recognition datasets captured by commodity depth cameras, and another dataset captured by a MoCap system. The experimental evaluations show that the proposed approach achieves superior performance to the state of the art algorithms.
From:
IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2012
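
Note on translation invariance: the abstract above asks for skeleton features that are invariant to translation. One simple way to get that property is to describe a pose by the differences between pairs of joint positions, since a shift of the whole body cancels out in every difference. The sketch below illustrates only that idea; it is not a reproduction of the paper's full feature set, and the function name `pairwise_relative_positions` and the joint layout are assumptions made for this example.

```python
from itertools import combinations


def pairwise_relative_positions(joints):
    """Return p_i - p_j for every joint pair i < j (illustrative sketch).

    joints: list of (x, y, z) tuples, one per tracked joint in a single frame.
    """
    return [
        tuple(a - b for a, b in zip(joints[i], joints[j]))
        for i, j in combinations(range(len(joints)), 2)
    ]


if __name__ == "__main__":
    pose = [(0, 0, 0), (1, 2, 3), (2, 0, 1)]
    # The same pose with the whole body moved by (5, -1, 2).
    shifted = [(x + 5, y - 1, z + 2) for x, y, z in pose]
    # Both poses produce identical features.
    print(pairwise_relative_positions(pose) == pairwise_relative_positions(shifted))
```

A skeleton with N joints yields N(N-1)/2 difference vectors per frame, so the feature grows quadratically with the number of tracked joints.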