Robot Perception and Learning

Monday, February 27, 2012

Lab Meeting Feb. 29 (Hank): Creating Household Environment Map for Environment Manipulation Using Color Range Sensors on Environment and Robot

Authors: Yohei Kakiuchi and Ryohei Ueda and Kei Okada and Masayuki Inaba

Abstract: A humanoid robot working in a household environment with people needs to localize and continuously update the locations of obstacles and manipulable objects. Achieving such a system requires a strong perception method to efficiently update the frequently changing environment.

We propose a method for mapping a household environment using multiple stereo and depth cameras located on the humanoid head and the environment. The method relies on colored 3D point cloud data computed from the sensors. We achieve robot localization by matching the point clouds from the robot sensor data directly with the environment sensor data. Object detection is performed using Iterative Closest Point (ICP) with a database of known point cloud models. In order to guarantee accurate object detection results, objects are only detected within the robot sensor data. Furthermore, we utilize the environment sensor data to map out the obstacles as bounding convex hulls.

We show experimental results creating a household environment map with known object labels and estimate the robot position in this map.
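For readers unfamiliar with ICP, the matching step mentioned in the abstract can be sketched roughly as follows. This is only an illustrative 2D version in plain Python, not the authors' implementation: the paper works on colored 3D point clouds, and the function names (`best_rigid_2d`, `transform`, `icp_2d`) and the brute-force nearest-neighbour search are assumptions made for brevity.

```python
import math

def best_rigid_2d(src, dst):
    """Least-squares rotation angle and translation mapping paired src points onto dst."""
    n = len(src)
    cx_s = sum(p[0] for p in src) / n
    cy_s = sum(p[1] for p in src) / n
    cx_d = sum(p[0] for p in dst) / n
    cy_d = sum(p[1] for p in dst) / n
    dot = cross = 0.0
    for (xs, ys), (xd, yd) in zip(src, dst):
        xs, ys, xd, yd = xs - cx_s, ys - cy_s, xd - cx_d, yd - cy_d
        dot += xs * xd + ys * yd
        cross += xs * yd - ys * xd
    # Closed-form optimal rotation for centred 2D point pairs.
    theta = math.atan2(cross, dot)
    c, s = math.cos(theta), math.sin(theta)
    tx = cx_d - (c * cx_s - s * cy_s)
    ty = cy_d - (s * cx_s + c * cy_s)
    return theta, tx, ty

def transform(points, theta, tx, ty):
    """Rotate each point by theta, then translate by (tx, ty)."""
    c, s = math.cos(theta), math.sin(theta)
    return [(c * x - s * y + tx, s * x + c * y + ty) for x, y in points]

def icp_2d(source, target, iterations=20):
    """Iteratively align source to target; returns the aligned copy of source."""
    current = list(source)
    for _ in range(iterations):
        # Correspondence step: brute-force nearest neighbour in the target set.
        matched = [
            min(target, key=lambda q: (q[0] - p[0]) ** 2 + (q[1] - p[1]) ** 2)
            for p in current
        ]
        # Alignment step: best rigid motion for the current correspondences.
        theta, tx, ty = best_rigid_2d(current, matched)
        current = transform(current, theta, tx, ty)
    return current
```

Like any ICP variant, this sketch only converges when the initial misalignment is small relative to the spacing of the points; a real system would use a spatial index for the neighbour search and a 3D (SVD-based) alignment step.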

[link]

Thursday, February 16, 2012

Lab Meeting Feb. 22 (Chih Chung): Motion planning in urban environments (Journal of Field Robotics 2008)

Authors: Dave Ferguson, Thomas M. Howard, and Maxim Likhachev

Abstract
We present the motion planning framework for an autonomous vehicle navigating through urban environments. Such environments present a number of motion planning challenges, including ultrareliability, high-speed operation, complex intervehicle interaction, parking in large unstructured lots, and constrained maneuvers. Our approach combines a model-predictive trajectory generation algorithm for computing dynamically feasible actions with two higher level planners for generating long-range plans in both on-road and unstructured areas of the environment. In the first part of this article, we describe the underlying trajectory generator and the on-road planning component of this system. We then describe the unstructured planning component of this system used for navigating through parking lots and recovering from anomalous on-road scenarios. Throughout, we provide examples and results from “Boss,” an autonomous sport utility vehicle that has driven itself over 3,000 km and competed in, and won, the DARPA Urban Challenge.

[LINK]

Wednesday, December 28, 2011

Lab Meeting Dec. 29, 2011 (David): Semantic fusion of laser and vision in pedestrian detection (PR 2010)

Luciano Oliveira, Urbano Nunes, Paulo Peixoto, Marco Silva, Fernando Moita

Abstract
Fusion of laser and vision in object detection has been accomplished by two main approaches: (1) independent integration of sensor-driven features or sensor-driven classifiers, or (2) a region of interest (ROI) is found by laser segmentation and an image classifier is used to name the projected ROI. Here, we propose a novel fusion approach based on semantic information, and embodied on many levels. Sensor fusion is based on spatial relationship of parts-based classifiers, being performed via a Markov logic network. The proposed system deals with partial segments, it is able to recover depth information even if the laser fails, and the integration is modeled through contextual information—characteristics not found on previous approaches. Experiments in pedestrian detection demonstrate the effectiveness of our method over data sets gathered in urban scenarios.

Paper Link

Local Link

Wednesday, December 21, 2011

Lab Meeting Dec. 22, 2011 (Wang Li): Fast Point Feature Histograms (FPFH) for 3D Registration (ICRA 2009)

Fast Point Feature Histograms (FPFH) for 3D Registration

Radu Bogdan Rusu
Nico Blodow
Michael Beetz

Abstract

In this paper, we modify the mathematical expressions of Point Feature Histograms (PFH), and perform a rigorous analysis on their robustness and complexity for the problem of 3D registration. More concretely, we present optimizations that reduce the computation times drastically by either caching previously computed values or by revising their theoretical formulations. The latter results in a new type of local features, called Fast Point Feature Histograms (FPFH), which retain most of the discriminative power of the PFH. Moreover, we propose an algorithm for the online computation of FPFH features, demonstrate their efficiency for 3D registration, and propose a new sample consensus based method for bringing two datasets into the convergence basin of a local non-linear optimizer: SAC-IA (SAmple Consensus Initial Alignment).

Paper Link

Lab Meeting December 22nd, 2011 (Jeff): Towards Semantic SLAM using a Monocular Camera

Title: Towards Semantic SLAM using a Monocular Camera

Authors: Javier Civera, Dorian Gálvez-López, L. Riazuelo, Juan D. Tardós, and J. M. M. Montiel

Abstract:

Monocular SLAM systems have been mainly focused on producing geometric maps just composed of points or edges; but without any associated meaning or semantic content.
In this paper, we propose a semantic SLAM algorithm that merges in the estimated map traditional meaningless points with known objects. The non-annotated map is built using only the information extracted from a monocular image sequence. The known object models are automatically computed from a sparse set of images gathered by cameras that may be different from the SLAM camera. The models include both visual appearance and tridimensional information. The semantic or annotated part of the map –the objects– are estimated using the information in the image sequence and the precomputed object models.

The proposed algorithm runs an EKF monocular SLAM parallel to an object recognition thread. This latest one informs of the presence of an object in the sequence by searching for SURF correspondences and checking afterwards their geometric compatibility. When an object is recognized it is inserted in the SLAM map, being its position measured and hence refined by the SLAM algorithm in subsequent frames. Experimental results show real-time performance for a handheld camera imaging a desktop environment and for a camera mounted in a robot moving in a room-sized scenario.

Link:
IEEE International Conference on Intelligent Robots and Systems (IROS), 2011
LocalLink
http://webdiis.unizar.es/~jcivera/papers/civera_etal_iros11.pdf

Thursday, December 15, 2011

Lab Meeting Dec. 15, 2011 (Alan): Two-View Motion Segmentation with Model Selection and Outlier Removal by RANSAC-Enhanced Dirichlet ... (IJCV 2010)

Title: Two-View Motion Segmentation with Model Selection and Outlier Removal by RANSAC-Enhanced Dirichlet Process Mixture Models (IJCV 2010)

Authors: Yong-Dian Jian, Chu-Song Chen

Abstract:
We propose a novel motion segmentation algorithm based on mixture of Dirichlet process (MDP) models. In contrast to previous approaches, we consider motion segmentation and its model selection regarding the number of motion models as an inseparable problem. Our algorithm can simultaneously infer the number of motion models, estimate the cluster memberships of correspondences, and identify the outliers. The main idea is to use MDP models to fully exploit the geometric consistencies before making premature decisions about the number of motion models. To handle outliers, we incorporate RANSAC into the inference process of MDP models. In the experiments, we compare the proposed algorithm with naive RANSAC, GPCA and Schindler’s method on both synthetic data and real image data. The experimental results show that we can handle more motions and have satisfactory performance in the presence of various levels of noise and outliers.

Link

Monday, December 05, 2011

Lab Meeting Dec. 8, 2011 (Jim): Execution of a Dual-Object (Pushing) Action with Semantic Event Chains

Title: “Execution of a Dual-Object (Pushing) Action with Semantic Event Chains”
Authors: Aksoy Eren Erdal, Dellen Babette, Tamosiunaite Minija, and Wörgötter Florentin
In IEEE-RAS Int. Conf. on Humanoid Robots, pp. 576-583

Abstract:
Here we present a framework for manipulation execution based on the so-called “Semantic Event Chain”, which is an abstract description of relations between the objects in the scene. It captures the change of those relations during a manipulation and thereby provides the decisive temporal anchor points by which a manipulation is critically defined. Using semantic event chains a model of a manipulation can be learned. We will show that it is possible to add the required control parameters (the spatial anchor points) to this model, which can then be executed by a robot in a fully autonomous way. The process of learning and execution of semantic event chains is explained using a box pushing example.

Link

Thursday, November 24, 2011

Lab Meeting November 24, 2011 (Hank): A Large-Scale Hierarchical Multi-View RGB-D Object Dataset (ICRA 2011)

Authors: K. Lai, L. Bo, X. Ren, and D. Fox.
Title: A Large-Scale Hierarchical Multi-View RGB-D Object Dataset
In: Proc. of International Conference on Robotics and Automation (ICRA), 2011

Abstract:
Over the last decade, the availability of public image repositories and recognition benchmarks has enabled rapid progress in visual object category and instance detection. Today we are witnessing the birth of a new generation of sensing technologies capable of providing high quality synchronized videos of both color and depth, the RGB-D (Kinect-style) camera. With its advanced sensing capabilities and the potential for mass adoption, this technology represents an opportunity to dramatically increase robotic object recognition, manipulation, navigation, and interaction capabilities. In this paper, we introduce a large-scale, hierarchical multi-view object dataset collected using an RGB-D camera. The dataset contains 300 objects organized into 51 categories and has been made publicly available to the research community so as to enable rapid progress based on this promising technology. This paper describes the dataset collection procedure and introduces techniques for RGB-D based object recognition and detection, demonstrating that combining color and depth information substantially improves quality of results.

link

Wednesday, November 23, 2011

Lab Meeting November 24, 2011 (Jimmy): Tracking Mobile Users in Wireless Networks via Semi-Supervised Co-Localization (TPAMI 2011)

Title: Tracking Mobile Users in Wireless Networks via Semi-Supervised Co-Localization
Authors: Jeffrey Junfeng Pan, Sinno Jialin Pan, Jie Yin, Lionel M. Ni, and Qiang Yang
In: TPAMI 2011

Abstract
Recent years have witnessed growing popularity of sensor and sensor-network technologies, supporting important practical applications. One of the fundamental issues is how to accurately locate a user with few labelled data in a wireless sensor network, where a major difficulty arises from the need to label large quantities of user location data, which in turn requires knowledge about the locations of signal transmitters, or access points. To solve this problem, we have developed a novel machine-learning-based approach that combines collaborative filtering with graph-based semi-supervised learning to learn both mobile-users’ locations and the locations of access points. Our framework exploits both labelled and unlabelled data from mobile devices and access points. In our two-phase solution, we first build a manifold-based model from a batch of labelled and unlabelled data in an offline training phase and then use a weighted k-nearest-neighbor method to localize a mobile client in an online localization phase. We extend the two-phase co-localization to an online and incremental model that can deal with labelled and unlabelled data that come sequentially and adapt to environmental changes. Finally, we embed an action model to the framework such that additional kinds of sensor signals can be utilized to further boost the performance of mobile tracking. Compared to other state-of-the-art systems, our framework has been shown to be more accurate while requiring less calibration effort in our experiments performed at three different test-beds.
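The online phase described in the abstract relies on a weighted k-nearest-neighbor estimate. A minimal sketch of that idea is shown below, under loud assumptions: it compares raw signal-strength vectors directly (the paper works in a learned manifold space), uses inverse-distance weights, and the function name `weighted_knn_locate` and the fingerprint layout are hypothetical.

```python
import math

def weighted_knn_locate(fingerprints, query, k=3, eps=1e-9):
    """Estimate an (x, y) position from labelled signal fingerprints.

    fingerprints: list of (signal_vector, (x, y)) pairs with known locations.
    query: signal vector observed by the device to be localized.
    """
    # Distance in signal space between the query and every labelled fingerprint.
    scored = [(math.dist(signal, query), pos) for signal, pos in fingerprints]
    scored.sort(key=lambda item: item[0])
    nearest = scored[:k]
    # Inverse-distance weights (an assumed choice); eps avoids division by zero.
    weights = [1.0 / (d + eps) for d, _ in nearest]
    total = sum(weights)
    x = sum(w * pos[0] for w, (_, pos) in zip(weights, nearest)) / total
    y = sum(w * pos[1] for w, (_, pos) in zip(weights, nearest)) / total
    return x, y
```

For example, with fingerprints recorded at a few known positions, `weighted_knn_locate(fingerprints, observed_signal, k=3)` returns a position pulled toward whichever labelled points have the most similar signal readings.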

[pdf]

Wednesday, November 16, 2011

Lab Meeting November 17, 2011 (Chih-Chung): Motion Planning under Uncertainty for Robotic Tasks with Long Time Horizons (IJRR 2011)

Authors: Hanna Kurniawati, Yanzhu Du, David Hsu and Wee Sun Lee.

Abstract:
Motion planning with imperfect state information is a crucial capability for autonomous robots to operate reliably in uncertain and dynamic environments. Partially observable Markov decision processes (POMDPs) provide a principled general framework for planning under uncertainty. Using probabilistic sampling, point-based POMDP solvers have drastically improved the speed of POMDP planning, enabling us to handle moderately complex robotic tasks. However, robot motion planning tasks with long time horizons remain a severe obstacle for even the fastest point-based POMDP solvers today. This paper proposes Milestone Guided Sampling (MiGS), a new point-based POMDP solver, which exploits state space information to reduce effective planning horizons. MiGS samples a set of points, called milestones, from a robot's state space and constructs a simplified representation of the state space from the sampled milestones. It then uses this representation of the state space to guide sampling in the belief space and tries to capture the essential features of the belief space with a small number of sampled points. Preliminary results are very promising. We tested MiGS in simulation on several difficult POMDPs that model distinct robotic tasks with long time horizons in both 2-D and 3-D environments. These POMDPs are impossible to solve with the fastest point-based solvers today, but MiGS solved them in a few minutes.
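As background for the "belief space" that point-based solvers sample, the standard discrete POMDP belief update can be sketched as below. This is not MiGS itself, only the textbook Bayes update b'(s') ∝ O(o | s', a) Σ_s T(s' | s, a) b(s); the nested-dictionary layout of `T` and `O` is an assumption chosen for readability.

```python
def belief_update(belief, action, observation, T, O):
    """One Bayes-filter step over a discrete state set.

    belief: dict mapping state -> probability.
    T[action][s][s_next]: transition probability (assumed layout).
    O[action][s_next][observation]: observation probability (assumed layout).
    """
    states = list(belief)
    unnormalized = {}
    for s_next in states:
        # Prediction: probability of reaching s_next under the chosen action.
        predicted = sum(T[action][s][s_next] * belief[s] for s in states)
        # Correction: weight by how likely the observation is in s_next.
        unnormalized[s_next] = O[action][s_next][observation] * predicted
    norm = sum(unnormalized.values())
    if norm == 0.0:
        raise ValueError("observation has zero probability under the model")
    return {s: p / norm for s, p in unnormalized.items()}
```

Each call produces one new point in the belief space; long planning horizons are hard precisely because the set of beliefs reachable through repeated updates grows quickly with the number of steps.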

Link

Wednesday, November 02, 2011

Lab Meeting November 03, 2011 (David): Real-Time Multi-Person Tracking with Detector Assisted Structure Propagation (ICCV'11 Workshop)

Authors: Dennis Mitzel and Bastian Leibe

Abstract:
Classical tracking-by-detection approaches require a robust object detector that needs to be executed in each frame. However, the detector is typically the most computationally expensive component, especially if more than one object class needs to be detected. In this paper we investigate how the usage of the object detector can be reduced by using stereo range data for following detected objects over time. To this end we propose a hybrid tracking framework consisting of a stereo based ICP (Iterative Closest Point) tracker and a high-level multi-hypothesis tracker. Initiated by a detector response, the ICP tracker follows individual pedestrians over time using just the raw depth information. Its output is then fed into the high-level tracker that is responsible for solving long-term data association and occlusion handling. In addition, we propose to constrain the detector to run only on some small regions of interest (ROIs) that are extracted from a 3D depth based occupancy map of the scene. The ROIs are tracked over time and only newly appearing ROIs are evaluated by the detector. We present experiments on real stereo sequences recorded from a moving camera setup in urban scenarios and show that our proposed approach achieves state of the art performance.

Link

Wednesday, October 26, 2011

Lab Meeting October 27, 2011 (ShaoChen): A multiple hypothesis people tracker for teams of mobile robots (ICRA 2010)

Title: A multiple hypothesis people tracker for teams of mobile robots (ICRA 2010)

Authors: Tsokas, N.A. and Kyriakopoulos, K.J.

Abstract: This paper tackles the problem of tracking walking people with multiple moving robots equipped with laser rangefinders. We present an adaptation to the classic Multiple Hypothesis Tracking method, which allows for one-to-many associations between targets and measurements in each cycle and is thus capable of operating in a multi-sensor scenario. In the context of two experiments, the successful integration of our tracking algorithm to a dual-robot setup is assessed.

Link

Wednesday, October 12, 2011

Lab Meeting October 13, 2011 (Alan): A Model-Selection Framework for Multibody Structure-and-Motion of Image Sequences (IJCV 2008)

Title: A Model-Selection Framework for Multibody Structure-and-Motion of Image Sequences (IJCV 2008)

Authors: Konrad Schindler, David Suter and Hanzi Wang

Abstract: Given an image sequence of a scene consisting of multiple rigidly moving objects, multi-body structure-and-motion (MSaM) is the task to segment the image feature tracks into the different rigid objects and compute the multiple-view geometry of each object. We present a framework for multibody structure-and-motion based on model selection. In a recover-and-select procedure, a redundant set of hypothetical scene motions is generated. Each subset of this pool of motion candidates is regarded as a possible explanation of the image feature tracks, and the most likely explanation is selected with model selection. The framework is generic and can be used with any parametric camera model, or with a combination of different models. It can deal with sets of correspondences, which change over time, and it is robust to realistic amounts of outliers. The framework is demonstrated for different camera and scene models.

Link

Tuesday, October 11, 2011

Lab Meeting October 13th, 2011 (Jeff): Object Mapping, Recognition, and Localization from Tactile Geometry

Title: Object Mapping, Recognition, and Localization from Tactile Geometry

Authors: Zachary Pezzementi, Caitlin Reyda, and Gregory D. Hager

Abstract:

We present a method for performing object recognition using multiple images acquired from a tactile sensor. The method relies on using the tactile sensor as an imaging device, and builds an object representation based on mosaics of tactile measurements. We then describe an algorithm that is able to recognize an object using a small number of tactile sensor readings. Our approach makes extensive use of sequential state estimation techniques from the mobile robotics literature, whereby we view the object recognition problem as one of estimating a consistent location within a set of object maps. We examine and test approaches based on both traditional particle filtering and histogram filtering. We demonstrate both the mapping and recognition / localization techniques on a set of raised letter shapes using real tactile sensor data.

Link:
IEEE International Conference on Robotics and Automation (ICRA), 2011
LocalLink
http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=5980363

Wednesday, September 28, 2011

Lab Meeting Sep. 29, 2011 (Wang Li): A Coarse-to-fine Approach for Fast Deformable Object Detection (CVPR 2011)

A Coarse-to-fine Approach for Fast Deformable Object Detection

Marco Pedersoli
Andrea Vedaldi
Jordi González

Abstract

We present a method that can dramatically accelerate object detection with part based models. The method is based on the observation that the cost of detection is likely to be dominated by the cost of matching each part to the image, and not by the cost of computing the optimal configuration of the parts as commonly assumed. Therefore accelerating detection requires minimizing the number of part-to-image comparisons. To this end we propose a multiple-resolutions hierarchical part based model and a corresponding coarse-to-fine inference procedure that recursively eliminates from the search space unpromising part placements. We evaluate our method extensively on the PASCAL VOC and INRIA datasets, demonstrating a very high increase in the detection speed with little degradation of the accuracy.

Paper Link

Wednesday, September 21, 2011

Lab Meeting September 22nd, 2011 (Jimmy): Vector Field SLAM

Title: Vector Field SLAM
Authors: Jens-Steffen Gutmann, Gabriel Brisson, Ethan Eade, Philip Fong and Mario Munich
In: ICRA 2010

Abstract
Localization in unknown environments using low-cost sensors remains a challenge. This paper presents a new localization approach that learns the spatial variation of an observed continuous signal. We model the signal as a piecewise linear function and estimate its parameters using a simultaneous localization and mapping (SLAM) approach. We apply our framework to a sensor measuring bearing to active beacons where measurements are systematically distorted due to occlusion and signal reflections of walls and other objects present in the environment. Experimental results from running GraphSLAM and EKF-SLAM on manually collected sensor measurements as well as on data recorded on a vacuum-cleaner robot validate our model.

[pdf]

Sunday, September 18, 2011

Lab Meeting September 22nd, 2011 (Jim): Learning the semantics of object–action relations by observation

Title: Learning the semantics of object–action relations by observation
Authors: Eren Erdal Aksoy, Alexey Abramov, Johannes Dörr, Kejun Ning, Babette Dellen, and Florentin Wörgötter
The International Journal of Robotics Research 2011;30 1229-1249

Abstract:
Recognizing manipulations performed by a human and the transfer and execution of this by a robot is a difficult problem. We address this in the current study by introducing a novel representation of the relations between objects at decisive time points during a manipulation. Thereby, we encode the essential changes in a visual scenery in a condensed way such that a robot can recognize and learn a manipulation without prior object knowledge. To achieve this we continuously track image segments in the video and construct a dynamic graph sequence. Topological transitions of those graphs occur whenever a spatial relation between some segments has changed in a discontinuous way and these moments are stored in a transition matrix called the semantic event chain (SEC). We demonstrate that these time points are highly descriptive for distinguishing between different manipulations. Employing simple sub-string search algorithms, SECs can be compared and type-similar manipulations can be recognized with high confidence. As the approach is generic, statistical learning can be used to find the archetypal SEC of a given manipulation class. ...

http://ijr.sagepub.com/content/30/10/1229.full.pdf+html

Friday, September 09, 2011

Lab Meeting September 9th, 2011 (Steven): Learning Generic Invariances in Object Recognition: Translation and Scale

Title: Learning Generic Invariances in Object Recognition: Translation and Scale

Authors: Joel Z Leibo, Jim Mutch, Lorenzo Rosasco, Shimon Ullman, and Tomaso Poggio

Abstract:

Invariance to various transformations is key to object recognition but existing definitions of invariance are somewhat confusing while discussions of invariance are often confused. In this report, we provide an operational definition of invariance by formally defining perceptual tasks as classification problems. The definition should be appropriate for physiology, psychophysics and computational modeling.

For any specific object, invariance can be trivially “learned” by memorizing a sufficient number of example images of the transformed object. While our formal definition of invariance also covers such cases, this report focuses instead on invariance from very few images and mostly on invariances from one example. Image-plane invariances – such as translation, rotation and scaling – can be computed from a single image for any object. They are called generic since in principle they can be hardwired or learned (during development) for any object.

In this perspective, we characterize the invariance range of a class of feedforward architectures for visual recognition that mimic the hierarchical organization of the ventral stream.

We show that this class of models achieves essentially perfect translation and scaling invariance for novel images. In this architecture a new image is represented in terms of weights of “templates” (e.g. “centers” or “basis functions”) at each level in the hierarchy. Such a representation inherits the invariance of each template, which is implemented through replication of the corresponding “simple” units across positions or scales and their “association” in a “complex” unit. We show simulations on real images that characterize the type and number of templates needed to support the invariant recognition of novel objects. We find that 1) the templates need not be visually similar to the target objects and that 2) a very small number of them is sufficient for good recognition.
These somewhat surprising empirical results have intriguing implications for the learning of invariant recognition during the development of a biological organism, such as a human baby. In particular, we conjecture that invariance to translation and scale may be learned by the association – through temporal contiguity – of a small number of primal templates, that is patches extracted from the images of an object moving on the retina across positions and scales. The number of templates can later be augmented by bootstrapping mechanisms using the correspondence provided by the primal templates – without the need of temporal contiguity.

Link

Thursday, September 08, 2011

Lab Meeting September 9th, 2011 (Chih Chung): Identification and Representation of Homotopy (RSS 2011 Best paper)

Title: Identification and Representation of Homotopy Classes of Trajectories for Search-based Path Planning in 3D

Authors: Subhrajit Bhattacharya, Maxim Likhachev and Vijay Kumar

Abstract: There are many applications in motion planning where it is important to consider and distinguish between different homotopy classes of trajectories. Two trajectories are homotopic if one trajectory can be continuously deformed into another without passing through an obstacle, and a homotopy class is a collection of homotopic trajectories. In this paper we consider the problem of robot exploration and planning in three-dimensional configuration spaces to (a) identify and classify different homotopy classes; and (b) plan trajectories constrained to certain homotopy classes or avoiding specified homotopy classes. In previous work [1] we have solved this problem for two-dimensional, static environments using the Cauchy Integral Theorem in concert with graph search techniques. The robot workspace is mapped to the complex plane and obstacles are poles in this plane. The Residue Theorem allows the use of integration along the path to distinguish between trajectories in different homotopy classes. However, this idea is fundamentally limited to two dimensions. In this work we develop new techniques to solve the same problem, but in three dimensions, using theorems from electromagnetism. The Biot-Savart law lets us design an appropriate vector field, the line integral of which, using the integral form of Ampere’s Law, encodes information about homotopy classes in three dimensions. Skeletons of obstacles in the robot world are extracted and are modeled by current-carrying conductors. We describe the development of a practical graph-search based planning tool with theoretical guarantees by combining integration theory with search techniques, and illustrate it with examples in three-dimensional spaces such as two-dimensional, dynamic environments and three-dimensional static environments.
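The two-dimensional idea recalled in the abstract (obstacles as poles in the complex plane, path integrals as class signatures) can be illustrated with a small sketch. This is a simplified reading, not the authors' code: it uses the plain integrand 1/(z - ζ) with one representative point ζ per obstacle, assumes piecewise-linear paths that share start and goal, and the names `h_signature` and `same_class` are hypothetical.

```python
import cmath

def h_signature(path, obstacles):
    """Integrate dz / (z - zeta) along a polyline for each obstacle point zeta.

    path: list of complex waypoints; obstacles: list of complex points,
    one representative point inside each obstacle (an assumption).
    """
    signature = []
    for zeta in obstacles:
        total = 0j
        for a, b in zip(path, path[1:]):
            # For a straight segment that misses zeta, the integral equals the
            # principal logarithm of the ratio of the endpoint offsets.
            total += cmath.log((b - zeta) / (a - zeta))
        signature.append(total)
    return signature

def same_class(path_a, path_b, obstacles, tol=1e-6):
    """Compare two paths with common endpoints by their integral signatures."""
    sig_a = h_signature(path_a, obstacles)
    sig_b = h_signature(path_b, obstacles)
    return all(abs(x - y) < tol for x, y in zip(sig_a, sig_b))
```

Two paths that pass on opposite sides of an obstacle produce signatures differing by 2πi for that obstacle, which is what the comparison detects. In a graph search, the partial signature could be carried along with each node so that the same location reached through different classes is treated as a different search state.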

link

Thursday, September 01, 2011

Lab Meeting September 2nd, 2011 (David): Multiclass Multimodal Detection and Tracking in Urban Environments

Title: Multiclass Multimodal Detection and Tracking in Urban Environments

Authors: Luciano Spinello, Rudolph Triebel and Roland Siegwart

Abstract:
This paper presents a novel approach to detect and track people and cars based on the combined information retrieved from a camera and a laser range scanner. Laser data points are classified by using boosted Conditional Random Fields (CRF), while the image based detector uses an extension of the Implicit Shape Model (ISM), which learns a codebook of local descriptors from a set of hand-labeled images and uses them to vote for centers of detected objects. Our extensions to ISM include the learning of object parts and template masks to obtain more distinctive votes for the particular object classes. The detections from both sensors are then fused and the objects are tracked using a Kalman Filter with multiple motion models. Experiments conducted in real-world urban scenarios demonstrate the effectiveness of our approach.
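The tracking stage mentioned at the end of the abstract uses a Kalman filter. As a rough illustration only, here is a one-dimensional constant-velocity filter with position measurements; the paper uses multiple motion models on fused detections, so the single model, the noise values `q` and `r`, the initial covariance, and the function names are all assumptions.

```python
def kf_predict(state, cov, dt, q):
    """Constant-velocity prediction for state (position, velocity)."""
    p, v = state
    (a, b), (c, d) = cov
    # F P F^T for F = [[1, dt], [0, 1]], plus q on the diagonal as process noise.
    a_new = a + dt * (b + c) + dt * dt * d + q
    b_new = b + dt * d
    c_new = c + dt * d
    d_new = d + q
    return (p + v * dt, v), ((a_new, b_new), (c_new, d_new))

def kf_update(state, cov, z, r):
    """Measurement update with a position-only measurement z (H = [1, 0])."""
    p, v = state
    (a, b), (c, d) = cov
    s = a + r                  # innovation variance
    k0, k1 = a / s, c / s      # Kalman gain for position and velocity
    y = z - p                  # innovation
    new_state = (p + k0 * y, v + k1 * y)
    new_cov = (((1 - k0) * a, (1 - k0) * b), (c - k1 * a, d - k1 * b))
    return new_state, new_cov

def track(measurements, dt=1.0, q=0.01, r=1.0):
    """Run the filter over a list of position measurements."""
    state, cov = (0.0, 0.0), ((10.0, 0.0), (0.0, 10.0))
    for z in measurements:
        state, cov = kf_predict(state, cov, dt, q)
        state, cov = kf_update(state, cov, z, r)
    return state
```

A multiple-model tracker would run several such filters with different motion assumptions (for instance, constant velocity and stationary) and weight or switch between them according to how well each explains the incoming detections.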

Link:
IJRR copy
localcopy