Robot Perception and Learning: July 2007

Sunday, July 29, 2007

Lab Meeting 30 July (Any): Map-Based Precision Vehicle Localization in Urban Environments

Jesse Levinson, Michael Montemerlo, and Sebastian Thrun

Robotics: Science and Systems III

Abstract:
Many urban navigation applications (e.g., autonomous navigation, driver assistance systems) can benefit greatly from localization with centimeter accuracy. Yet such accuracy cannot be achieved reliably with GPS-based inertial guidance systems, specifically in urban settings.
We propose a technique for high-accuracy localization of moving vehicles that utilizes maps of urban environments. Our approach integrates GPS, IMU, wheel odometry, and LIDAR data acquired by an instrumented vehicle, to generate high-resolution environment maps. Offline relaxation techniques similar to recent SLAM methods are employed to bring the map into alignment at intersections and other regions of self-overlap. By reducing the final map to the flat road surface, imprints of other vehicles are removed. The result is a 2-D surface image of ground reflectivity in the infrared spectrum with 5cm pixel resolution.
To localize a moving vehicle relative to these maps, we present a particle filter method for correlating LIDAR measurements with this map. As we show by experimentation, the resulting relative accuracies exceed those of conventional GPS-IMU-odometry-based methods by more than an order of magnitude. Specifically, we show that our algorithm is effective in urban environments, achieving reliable real-time localization with accuracy in the 10-centimeter range. Experimental results are provided for localization in GPS-denied environments, during bad weather, and in dense traffic.

Paper (PDF): Link
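To make the localization step concrete, here is a minimal 1-D sketch in Python. It is not the authors' implementation. The map is assumed to be a single row of reflectivity values along the road, each scan is a short window of noisy reflectivity readings, and the Gaussian noise model (`sigma`), the particle count and all function names are illustrative assumptions. The paper's filter works on a 2-D map with 5cm cells.

```python
import numpy as np

def log_likelihood(scan, reflectivity_map, pos, sigma):
    """Gaussian log-likelihood of a scan whose first cell lies at map index `pos`."""
    n = len(scan)
    if pos < 0 or pos + n > len(reflectivity_map):
        return -np.inf  # particle has left the mapped road segment
    diff = scan - reflectivity_map[pos:pos + n]
    return -0.5 * float(np.sum(diff ** 2)) / sigma ** 2

def systematic_resample(weights, rng):
    """Return particle indices drawn by systematic (low-variance) resampling."""
    n = len(weights)
    positions = (rng.random() + np.arange(n)) / n
    cumulative = np.cumsum(weights)
    cumulative[-1] = 1.0  # guard against round-off
    return np.searchsorted(cumulative, positions)

def localize(reflectivity_map, scans, odometry, n_particles=500, sigma=0.1, seed=0):
    """Track a vehicle's map index; returns one position estimate per scan."""
    rng = np.random.default_rng(seed)
    last_valid = len(reflectivity_map) - len(scans[0])
    particles = rng.integers(0, last_valid + 1, size=n_particles)  # uniform prior
    estimates = []
    for scan, step in zip(scans, odometry):
        # Motion update: odometry step plus +/-1 cell of noise.
        particles = particles + step + rng.integers(-1, 2, size=n_particles)
        # Measurement update: compare the scan with the map under each particle.
        log_w = np.array([log_likelihood(scan, reflectivity_map, int(p), sigma)
                          for p in particles])
        if np.isneginf(log_w).all():
            weights = np.full(n_particles, 1.0 / n_particles)
        else:
            weights = np.exp(log_w - log_w.max())  # numerically stable normalisation
            weights /= weights.sum()
        estimates.append(float(np.sum(weights * particles)))
        particles = particles[systematic_resample(weights, rng)]
    return estimates
```

With a random 300-cell map, a vehicle advancing two cells per step and a few thousand particles, the estimate should lock onto the true index after the first scan, because a misaligned window of random reflectivities is far less likely than the aligned one.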

Lab Meeting July 30th, 2007 (Yi-liu Chao): Improving Location Accuracy by Combining Visual Tracking and Radio Positioning Techniques

My thesis.

Abstract: Radio positioning systems and visual positioning systems are complementary in several respects. Radio positioning can locate people over a large area, while visual positioning only monitors the sub-areas covered by camera views. However, radio positioning can become completely unstable when people cluster together, whereas visual positioning can still obtain definite observations within such clusters. As a result, combining the two systems leads to a better localization system. In this paper, we propose a framework that increases the accuracy and stability of an indoor radio positioning system by combining it with a supporting visual tracking system. An association method that considers both trajectories and appearances is proposed to associate the visual targets with the radio clients. Once the observations are associated, the location estimates of the radio clients are corrected using those of the associated visual targets. After that, the estimates of the other radio clients are refined by taking these corrections into account. In this way, we can enhance the localization performance of the entire radio positioning system. The proposed framework can be applied to a variety of existing radio positioning techniques. The experimental results show that it preserves accuracy when people cluster, in both stationary and moving situations.
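A minimal sketch of trajectory-plus-appearance association, assuming time-aligned 2-D trajectories for both radio clients and visual targets and a stored appearance feature for each radio client (the abstract does not say how appearance enters, so this is an assumption). It uses greedy lowest-cost matching with a gate rather than whatever optimisation the thesis uses, and it only shows the first correction step (replacing an associated client's estimate with its visual target's position); refining the unassociated clients is not shown. All names, weights and thresholds are hypothetical.

```python
import numpy as np

def association_cost(radio_track, visual_track, radio_feat, visual_feat, w_app=0.5):
    """Trajectory distance plus weighted appearance distance (lower is better)."""
    # Mean Euclidean distance between time-aligned (T x 2) trajectories.
    traj = float(np.mean(np.linalg.norm(np.asarray(radio_track) - np.asarray(visual_track), axis=1)))
    # Distance between appearance features (e.g. colour histograms).
    app = float(np.linalg.norm(np.asarray(radio_feat) - np.asarray(visual_feat)))
    return traj + w_app * app

def associate(radio_tracks, visual_tracks, radio_feats, visual_feats, gate=2.0, w_app=0.5):
    """Greedy gated association; returns {radio_index: visual_index}."""
    cost = np.array([[association_cost(r, v, rf, vf, w_app)
                      for v, vf in zip(visual_tracks, visual_feats)]
                     for r, rf in zip(radio_tracks, radio_feats)], dtype=float)
    pairs = {}
    while cost.size and np.isfinite(cost).any():
        i, j = np.unravel_index(np.argmin(cost), cost.shape)
        if cost[i, j] > gate:
            break  # remaining pairs are too dissimilar to trust
        pairs[int(i)] = int(j)
        cost[i, :] = np.inf  # each radio client and visual target is used once
        cost[:, j] = np.inf
    return pairs

def correct_radio_positions(radio_positions, visual_positions, pairs):
    """Replace associated radio estimates with the visual target positions."""
    corrected = np.array(radio_positions, dtype=float)
    for i, j in pairs.items():
        corrected[i] = visual_positions[j]
    return corrected
```

The gate keeps a radio client unassociated when no visual target is close enough, so clients outside every camera view keep their radio estimate.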

The New York Times: The Real Transformers

By ROBIN MARANTZ HENIG
Published: July 29, 2007
Researchers are programming robots to learn in humanlike ways and show humanlike traits. Could this be the beginning of robot consciousness — and of a better understanding of ourselves?

The link.

A good article summarizing robotics research at MIT. You guys should check this article out. -Bob

Friday, July 27, 2007

Patent Application: Variable liquid mirrors

We may use this smart mirror for omni-directional camera design. -Bob

From
* 12:14 23 July 2007
* NewScientist.com news service
* Justin Mullins

Liquid lenses are set to revolutionise the design of small optical devices such as camera phones. They have no mechanical parts, consisting of just a fluid held inside a chamber in such a way that the fluid's surface forms a lens shape. Applying an electric field to the fluid changes the shape of its surface, thereby altering the focal length of the lens.

Now, consumer electronics company Philips plans to use the same principle to create variable mirrors.

The layer that forms between certain types of fluid can be reflective. By placing these liquids in a chamber and applying an electric field, Philips says it can vary the shape of the layer, and so the shape of the mirror, in just the same way as a liquid lens.

See the full article.
See the full variable liquid mirror patent application.

Monday, July 23, 2007

Lab Meeting July 24th, 2007 (Kuo-Hwei Lin): Crowd detection in video sequences

Author: Pini Reisman, Ofer Mano, Shai Avidan, and Amnon Shashua

From: IEEE Intelligent Vehicles Symposium (IV2004), June 2004, Parma, Italy.

Abstract:
We present a real-time system that detects moving
crowd in a video sequence. Crowd detection differs from
pedestrian detection in that we assume that no individual
pedestrian can be properly segmented in the image. We propose
a scheme that looks at the motion patterns of crowd in the
spatio-temporal domain and give an efficient implementation
that can detect crowd in real-time. In our experiments we
detected crowd at distances of up to 70m.

Lab Meeting July 24th, 2007 (Stanley): A Generalized Framework for Solving Tightly-coupled Multirobot Planning Problems

Author: Nidhi Kalra, Dave Ferguson, and Anthony Stentz

From: 2007 IEEE International Conference on Robotics and Automation, pp. 3359-3364

Abstract: In this paper, we present the generalized version of the Hoplites coordination framework designed to efficiently solve complex, tightly-coupled multirobot planning problems. Our extensions greatly increase the flexibility with which teammates can both plan and coordinate with each other; consequently, we can apply Hoplites to a wider range of domains and plan coordination between robots more efficiently. We apply our framework to the constrained exploration domain and compare Hoplites in simulation to competing distributed and centralized approaches. Our results demonstrate that Hoplites significantly outperforms both approaches in terms of the quality of solutions produced while remaining computationally competitive with much simpler approaches. We further demonstrate features such as scalability and validate our approach with field results from a team of large autonomous vehicles performing constrained exploration in an outdoor environment.

Monday, July 16, 2007

CVPR2007 : Using Stereo Matching for 2-D Face Recognition Across Pose

Title : Using Stereo Matching for 2-D Face Recognition Across Pose

Author :
Carlos D. Castillo @ University of Maryland
David W. Jacobs @ University of Maryland

Abstract :

We propose using stereo matching for 2-D face recognition across pose. We match one 2-D query image to one 2-D gallery image without performing 3-D reconstruction. Then the cost of this matching is used to evaluate the similarity of the two images. We show that this cost is robust to pose variations. To illustrate this idea we built a face recognition system on top of a dynamic programming stereo matching algorithm. The method works well even when the epipolar lines we use do not exactly fit the viewpoints. We have tested our approach on the PIE dataset. In all the experiments, our method demonstrates effective performance compared with other algorithms.
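A toy version of the idea, using a classic scanline dynamic-programming matcher with a constant occlusion penalty: corresponding image rows are treated as epipolar lines (an assumption that fits mainly horizontal pose changes), each row pair is aligned monotonically, and the summed alignment cost serves as the dissimilarity. The cost function, penalty value and function names are illustrative assumptions, not Castillo and Jacobs's exact formulation.

```python
import numpy as np

def scanline_cost(a, b, occlusion=0.2):
    """Minimum monotonic alignment cost of two 1-D intensity rows (DP)."""
    n, m = len(a), len(b)
    D = np.zeros((n + 1, m + 1))
    D[:, 0] = occlusion * np.arange(n + 1)  # leading pixels of `a` occluded
    D[0, :] = occlusion * np.arange(m + 1)  # leading pixels of `b` occluded
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            match = D[i - 1, j - 1] + abs(a[i - 1] - b[j - 1])
            D[i, j] = min(match, D[i - 1, j] + occlusion, D[i, j - 1] + occlusion)
    return D[n, m]

def stereo_match_cost(img1, img2, occlusion=0.2):
    """Sum the scanline costs, treating corresponding rows as epipolar lines."""
    return sum(scanline_cost(r1, r2, occlusion) for r1, r2 in zip(img1, img2))

def recognize(query, gallery, occlusion=0.2):
    """Index of the gallery image with the lowest stereo matching cost."""
    costs = [stereo_match_cost(query, g, occlusion) for g in gallery]
    return int(np.argmin(costs))
```

Because the matcher allows horizontal shifts and occlusions, a gallery image seen from a slightly different viewpoint can still align cheaply, which is the pose robustness the abstract relies on.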

[CVPR2007] Monocular and Stereo Methods for AAM Learning from Video

Title : Monocular and Stereo Methods for AAM Learning from Video

Author :

Jason Saragih @ Research School of Information Sciences and Engineering, Australian National University
Roland Goecke @ National ICT Australia, Canberra Research Laboratory Canberra, Australia

Abstract :

The active appearance model (AAM) is a powerful method for modeling deformable visual objects. One of the major drawbacks of the AAM is that it requires a training set of pseudo-dense correspondences over the whole database. In this work, we investigate the utility of stereo constraints for automatic model building from video. First, we propose a new method for automatic correspondence finding in monocular images which is based on an adaptive template tracking paradigm. We then extend this method to take the scene geometry into account, proposing three approaches, each accounting for the availability of the fundamental matrix and calibration parameters or the lack thereof. The performance of the monocular method was first evaluated on a pre-annotated database of a talking face. We then compared the monocular method against its three stereo extensions using a stereo database.

Sunday, July 15, 2007

CVPR07 : A Nine-point Algorithm for Estimating Para-Catadioptric Fundamental Matrices

Author: Christopher Geyer and Henrik Stewenius

Abstract:

We present a minimal-point algorithm for finding fundamental matrices for catadioptric cameras of the parabolic type. Central catadioptric cameras (an optical combination of a mirror and a lens that yields an imaging device equivalent within hemispheres to perspective cameras) have found wide application in robotics, tele-immersion and providing enhanced situational awareness for remote operation. We use an uncalibrated structure-from-motion framework developed for these cameras to consider the problem of estimating the fundamental matrix for such cameras. We present a solution that can compute the para-catadioptric fundamental matrix with nine point correspondences, the smallest number possible. We compare this algorithm to alternatives and show some results of using the algorithm in conjunction with random sample consensus (RANSAC).
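Why a smaller minimal sample matters with RANSAC: the number of random samples needed to draw at least one all-inlier sample with probability p is N = log(1 - p) / log(1 - w^s), where w is the inlier ratio and s the sample size. The sketch below compares s = 9 with s = 15, the count a purely linear estimate of a 4x4 lifted fundamental matrix (16 entries up to scale) would need; treat that comparison figure as an assumption for illustration.

```python
import math

def ransac_iterations(inlier_ratio, sample_size, confidence=0.99):
    """Number of random minimal samples needed so that, with probability
    `confidence`, at least one sample contains only inliers:
        N = log(1 - p) / log(1 - w**s),  with 0 < w < 1.
    """
    p_all_inliers = inlier_ratio ** sample_size
    return math.ceil(math.log(1.0 - confidence) / math.log(1.0 - p_all_inliers))

# Compare the nine-point minimal sample with a 15-point linear one at 60% inliers.
for s in (9, 15):
    print(s, ransac_iterations(0.6, s))
```

At 60% inliers and 99% confidence this gives roughly 455 samples for s = 9 against roughly 9,800 for s = 15, so the minimal solver cuts the RANSAC budget by more than an order of magnitude.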

CVPR07 : Toward Flexible 3D Modeling using a Catadioptric Camera

Author: Maxime Lhuillier

Abstract:

Fully automatic 3D modeling from a catadioptric image sequence has rarely been addressed until now, although this is a long-standing problem for perspective images. All previous catadioptric approaches have been limited to dense reconstruction for a few view points, and the majority of them require calibration of the camera. This paper presents a method which deals with hundreds of images, and does not require precise calibration knowledge. In this context, the same 3D point of the scene may be visible and reconstructed in a large number of images at very different accuracies. So the main part of this paper concerns the selection of reconstructed points, a problem largely ignored in previous works. Summaries of the structure from motion and dense stereo steps are also given. Experiments include the 3D model reconstruction of indoor and outdoor scenes, and a walkthrough in a city.

[Link]

CVPR07 Viewpoint-Coded Structured Light

Author : Mark Young, Erik Beeson, James Davis, Szymon Rusinkiewicz, and Ravi Ramamoorthi

Abstract :
We introduce a theoretical framework and practical algorithms for replacing time-coded structured light patterns with viewpoint codes, in the form of additional camera locations. Current structured light methods typically use log(N) light patterns, encoded over time, to unambiguously reconstruct N unique depths. We demonstrate that each additional camera location may replace one frame in a temporal binary code. Our theoretical viewpoint coding analysis shows that, by using a high frequency stripe pattern and placing cameras in carefully selected locations, the epipolar projection in each camera can be made to mimic the binary encoding patterns normally projected over time. Results from our practical implementation demonstrate reliable depth reconstruction that makes neither temporal nor spatial continuity assumptions about the scene being captured.

[Link]
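For reference, the temporal code that viewpoint coding replaces can be sketched as a binary-reflected Gray code over N stripes: ceil(log2 N) patterns, where pattern k lights the stripes whose k-th code bit is 1, and the bits observed at a pixel decode back to its stripe index. Gray codes are a common choice for time-coded structured light; whether the paper's baseline uses them is an assumption, and the function names are hypothetical.

```python
import math

def gray_code_patterns(n_stripes):
    """Return ceil(log2 N) binary patterns; pattern k holds bit k (MSB first)
    of the Gray code of every stripe index."""
    n_bits = max(1, math.ceil(math.log2(n_stripes)))
    codes = [i ^ (i >> 1) for i in range(n_stripes)]
    return [[(c >> (n_bits - 1 - k)) & 1 for c in codes] for k in range(n_bits)]

def decode(bits):
    """Recover the stripe index from the bits observed over time (MSB first)."""
    gray = 0
    for b in bits:
        gray = (gray << 1) | b
    index = 0
    while gray:  # Gray-to-binary: XOR of all right shifts
        index ^= gray
        gray >>= 1
    return index
```

Adjacent stripes differ in exactly one pattern, so a single misread bit at a stripe boundary only shifts the decoded index by one; the paper's point is that each of these time-multiplexed patterns can instead be traded for one extra camera.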

Lab Meeting July 16th, 2007 (Atwood): Matching Local Self-Similarities across Images and Videos

Title: Matching Local Self-Similarities across Images and Videos

Author: Eli Shechtman and Michal Irani

Abstract:
We present an approach for measuring similarity between
visual entities (images or videos) based on matching
internal self-similarities. What is correlated across
images (or across video sequences) is the internal layout
of local self-similarities (up to some distortions), even
though the patterns generating those local self-similarities
are quite different in each of the images/videos. These internal
self-similarities are efficiently captured by a compact
local “self-similarity descriptor”, measured densely
throughout the image/video, at multiple scales, while accounting
for local and global geometric distortions. This
gives rise to matching capabilities of complex visual data,
including detection of objects in real cluttered images using
only rough hand-sketches, handling textured objects with
no clear boundaries, and detecting complex actions in cluttered
video data with no prior learning. We compare our
measure to commonly used image-based and video-based
similarity measures, and demonstrate its applicability to object
detection, retrieval, and action detection.

fulltext

CVPR07 Multi-class object tracking algorithm that handles fragmentation and grouping

Biswajit Bose, Xiaogang Wang and Eric Grimson,
"Multi-class object tracking algorithm that handles fragmentation and grouping," to appear in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Minneapolis, USA, June 2007.

Abstract:

We propose a framework for detecting and tracking multiple interacting objects, while explicitly handling the dual problems of fragmentation (an object may be broken into several blobs) and grouping (multiple objects may appear as a single blob). We use foreground blobs obtained by background subtraction from a stationary camera as measurements. The main challenge is to associate blob measurements with objects, given the fragment-object-group ambiguity when the number of objects is variable and unknown, and object-class-specific models are not available. We first track foreground blobs till they merge or split. We then build an inference graph representing merge-split relations between the tracked blobs. Using this graph and a generic object model based on spatial connectedness and coherent motion, we label the tracked blobs as whole objects, fragments of objects or groups of interacting objects. The outputs of our algorithm are entire tracks of objects, which may include corresponding tracks from groups during interactions. Experimental results on multiple video sequences are shown.

Saturday, July 14, 2007

CVPR07 poster: Wide-Area Egomotion Estimation from Known 3D Structure

Olivier Koch and Seth Teller

Robust egomotion recovery for extended camera excursions has long been a challenge for machine vision researchers. Existing algorithms handle spatially limited environments and tend to consume prohibitive computational resources with increasing excursion time and distance.

We describe an egomotion estimation algorithm that takes as input a coarse 3D model of an environment, and an omnidirectional video sequence captured within the environment, and produces as output a reconstruction of the camera’s 6-DOF egomotion expressed in the coordinates of the input model. The principal novelty of our method is a robust matching algorithm that associates 2D edges from the video with 3D line segments from the input model.

Our system handles 3-DOF and 6-DOF camera excursions of hundreds of meters within real, cluttered environments. It uses a novel prior visibility analysis to speed initialization and dramatically accelerate image-to-model matching. We demonstrate the method’s operation, and qualitatively and quantitatively evaluate its performance, on both synthetic and real image sequences.

[Paper] [Poster]

Friday, July 13, 2007

Lab Meeting July 16th, 2007 (Jeff): Context and Feature Sensitive Re-sampling from Discrete Surface Measurements

Title: Context and Feature Sensitive Re-sampling from Discrete Surface Measurements

Authors: David M Cole and Paul M Newman

Abstract:

This paper concerns context- and feature-sensitive re-sampling of workspace surfaces represented by 3D point clouds. We interpret a point cloud as the outcome of repetitive and non-uniform sampling of the surfaces in the workspace. The nature of this sampling may not be ideal for all applications, representations and downstream processing. For example, it might be preferable to have a high point density around sharp edges or near marked changes in texture. Additionally, such preferences might depend on the semantic classification of the surface in question. This paper addresses this issue and provides a framework which, given a raw point cloud as input, produces a new point cloud by re-sampling from the underlying workspace surfaces. Moreover, it does this in a manner which can be biased by local low-level geometric or appearance properties and by higher-level (semantic) classification of the surface. We are in no way prescriptive about what justifies a biasing in the re-sampling scheme; this is left up to the user, who may encapsulate what constitutes “interesting” into one or more “policies” which are used to modulate the default re-sampling behavior.

Link:
RSS 2007 Poster Paper
http://www.roboticsproceedings.org/rss03/p13.pdf
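A rough sketch of policy-biased re-sampling. For simplicity it draws from the existing points, weighted by user-supplied "policies", rather than re-sampling the underlying surfaces as the paper does, so treat it as importance-weighted subsampling and not the authors' method. The example policy scores each point by a PCA surface-variation measure (smallest eigenvalue over the eigenvalue sum of its k nearest neighbours), which is high near sharp edges; the gain, `k` and all function names are assumptions.

```python
import numpy as np

def surface_variation(points, k=10):
    """Curvature proxy per point: smallest PCA eigenvalue / eigenvalue sum
    over its k nearest neighbours (brute force, fine for small clouds)."""
    d2 = np.sum((points[:, None, :] - points[None, :, :]) ** 2, axis=-1)
    neighbours = np.argsort(d2, axis=1)[:, :k]
    scores = np.empty(len(points))
    for i, idx in enumerate(neighbours):
        centred = points[idx] - points[idx].mean(axis=0)
        eig = np.linalg.eigvalsh(centred.T @ centred)  # ascending order
        scores[i] = eig[0] / max(eig.sum(), 1e-12)
    return scores

def policy_resample(points, n_out, policies, base=1.0, seed=0):
    """Draw n_out points with probability proportional to base + sum of policy scores."""
    weights = np.full(len(points), float(base))
    for policy in policies:
        weights = weights + policy(points)
    rng = np.random.default_rng(seed)
    idx = rng.choice(len(points), size=n_out, replace=True, p=weights / weights.sum())
    return points[idx]

# Hypothetical "edge-loving" policy: a scaled surface-variation score.
def edge_policy(points):
    return 50.0 * surface_variation(points)
```

The `base` term plays the role of the default re-sampling behaviour, and each policy only modulates it, so an empty policy list reduces to uniform re-sampling.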

Thursday, July 12, 2007

[CMU Thesis Proposal] Recognizing Object Structures - A Bayesian Framework for Deformable Matching

Title: Recognizing Object Structures - A Bayesian Framework for Deformable Matching

Author : Leon Gu

Abstract:

When we look at images of human faces, cars and people, we do not perceive them as collections of pixels; we perceive their structures. Image understanding requires uncovering the details of object structures. One of the most promising approaches to this task is deformable template matching. However, traditional deformable models have been largely limited to images with sharp contrast, clean backgrounds or restricted test samples. One of the reasons for this is the lack of a principled statistical framework for deformable matching under general imaging conditions.

In this thesis, we propose a Bayesian framework which describes structure deformation, geometric transformation and image evidence in a three-layered generative model. Deformable matching is viewed as a Bayesian inference procedure. In particular, we will show how to solve a few typical matching problems in this framework: for instance, how to control shape smoothness when images are noisy, how to identify outliers caused by background clutter or occlusion, how to make use of multi-modal shape priors, and how to infer missing geometric information such as depth from a single image. One appealing aspect of the proposed work is that all these problems are solved in a consistent way. We demonstrate the applications of this theory in recovering 2D/3D facial structures and human body configurations from still images.

Tuesday, July 10, 2007

Thesis talk: Seeing the World Behind the Image

Date: 10 July 2007
Time: 1:00 p.m.
Place: Newell Simon Hall 1305
Type: Thesis Oral
Who: Derek Hoiem
Topic: Seeing the World Behind the Image: Spatial Layout for 3D Scene Understanding

Abstract:
When humans look at an image, they see not just a pattern of color and
texture, but the world behind the image. In the same way, computer
vision algorithms must go beyond the pixels and reason about the
underlying scene. In this dissertation, we propose methods to recover
the basic spatial layout from a single image and begin to investigate
its use as a foundation for scene understanding.

Our spatial layout is a description of the 3D scene in terms of
surfaces, occlusions, camera viewpoint, and objects. We propose a
geometric class representation, a coarse categorization of surfaces
according to their 3D orientations, and learn appearance-based models of
geometry to identify surfaces in an image. These surface estimates serve
as a basis for recovering the boundaries and occlusion relationships of
prominent objects. We further show that simple reasoning about camera
viewpoint and object size in the image allows accurate inference of the
viewpoint and greatly improves object detection. Finally, we demonstrate
the potential usefulness of our methods in applications to 3D
reconstruction, scene synthesis, and robot navigation.

Thesis Committee Members:
Alexei A. Efros, Co-Chair
Martial Hebert, Co-Chair
Takeo Kanade
Rahul Sukthankar, Intel Research Pittsburgh
William T. Freeman, Massachusetts Institute of Technology

A draft of the thesis document is available at:
link
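One standard form of the viewpoint and object-size reasoning mentioned in the abstract, for a roughly level pinhole camera of focal length f at height y_c above a ground plane: an object at depth z whose bottom and top project to image rows v_b and v_t (measured upward), with horizon row v_0, satisfies v_0 - v_b = f y_c / z and v_t - v_b = f y_o / z, so y_o = y_c (v_t - v_b) / (v_0 - v_b). The thesis's exact model may differ; the sketch just evaluates this relation in both directions, and its function names are hypothetical.

```python
def object_height(cam_height, v_top, v_bottom, v_horizon):
    """World height of an object standing on the ground plane.

    Image rows are measured upward from the bottom of the image; assumes a
    roughly level pinhole camera: y_o = y_c * (v_t - v_b) / (v_0 - v_b).
    """
    return cam_height * (v_top - v_bottom) / (v_horizon - v_bottom)

def camera_height(obj_height, v_top, v_bottom, v_horizon):
    """Same relation solved for the camera height, given an object of known size."""
    return obj_height * (v_horizon - v_bottom) / (v_top - v_bottom)

# Example: camera 1.6 m up, horizon at row 300, object spanning rows 200 to 290.
print(object_height(1.6, 290, 200, 300))  # about 1.44 (metres)
```

Run in reverse, a few detections of objects with typical heights (such as pedestrians) constrain the camera height and horizon, and those in turn rule out detections of implausible size.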

Monday, July 09, 2007

Lab Meeting 9 July (Jim): Active Policy Learning for Robot Planning and Exploration under Uncertainty

Active Policy Learning for Robot Planning and Exploration under Uncertainty

This paper proposes a simulation-based active policy learning algorithm for finite-horizon, partially observed sequential decision processes.

link

Lab Meeting 9 July (Leo): Simultaneous Localisation and Mapping in Dynamic Environments (SLAMIDE) with Reversible Data Association

Simultaneous Localisation and Mapping in Dynamic Environments (SLAMIDE) with Reversible Data Association

authors:
Charles Bibby, Ian Reid

from:
RSS 07

Abstract:
The conventional technique for dealing with dynamic
objects in SLAM is to detect them and then either treat
them as outliers [20][1] or track them separately using traditional
multi-target tracking [18]. We propose a technique that combines
the least-squares formulation of SLAM and sliding window
optimisation together with generalised expectation maximisation,
to incorporate both dynamic and stationary objects directly into
SLAM estimation. The sliding window allows us to postpone the
commitment of model selection and data association decisions
by delaying when they are marginalised permanently into the
estimate. The two main contributions of this paper are thus: (i)
using reversible model selection to include dynamic objects into
SLAM and (ii) incorporating reversible data association. We show
empirically that (i) if dynamic objects are present our method
can include them in a single framework and hence maintain
a consistent estimate and (ii) our estimator remains consistent
when data association is difficult, for instance in the presence of
clutter. We summarise the results of detailed and extensive tests
of our method against various benchmark algorithms, showing
its effectiveness.

Sunday, July 08, 2007

HRI07: A Dancing Robot for Rhythmic Social Interaction

This paper describes a robotic system that uses dance as a form of social interaction to explore the properties and importance of rhythmic movement in general social interaction. The system consists of a small creature-like robot whose movement is controlled by a rhythm-based software system. Environmental rhythms can be extracted from auditory or visual sensory stimuli, and the robot synchronizes its movement to a dominant rhythm. The system was demonstrated, and an exploratory study conducted, with children interacting with the robot in a generalized dance task. Through a behavioral analysis of videotaped interactions, we found that the robot’s synchronization with the background music had an effect on children’s interactive involvement with the robot. Furthermore, we observed a number of expected and unexpected styles and modalities of interactive exploration and play that inform our discussion on the next steps in the design of a socially rhythmic robotic system.
Paper
Project link
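A minimal sketch of extracting a dominant rhythm, assuming an onset-strength envelope (for example from audio energy or visual motion) sampled at a fixed rate: the autocorrelation peak within a plausible tempo range gives the period the robot would synchronise to. The envelope input, the tempo range and the function name are assumptions; the paper's rhythm-based software system is not described here in enough detail to reproduce.

```python
import numpy as np

def dominant_period(onset_envelope, sample_rate, min_period=0.25, max_period=2.0):
    """Dominant rhythm period in seconds, from the autocorrelation peak."""
    x = np.asarray(onset_envelope, dtype=float)
    x = x - x.mean()  # remove the DC offset so lag 0 does not dominate the range
    ac = np.correlate(x, x, mode="full")[len(x) - 1:]  # non-negative lags only
    lo = int(min_period * sample_rate)
    hi = min(int(max_period * sample_rate), len(ac) - 1)
    lag = lo + int(np.argmax(ac[lo:hi + 1]))
    return lag / sample_rate
```

For an envelope with a beat every 0.5 s (120 beats per minute) sampled at 100 Hz, the peak falls at a lag of 50 samples; restricting the search range keeps the estimate away from multiples of the true period.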

Monday, July 02, 2007

A survey of approaches and challenges in 3D and multi-modal 3D + 2D face recognition

Title : A survey of approaches and challenges in 3D and multi-modal 3D + 2D face recognition

Author : Kevin W. Bowyer, Kyong Chang and Patrick Flynn

Abstract :

This survey focuses on recognition performed by matching models of the three-dimensional shape of the face, either alone or in combination with matching corresponding two-dimensional intensity images. Research trends to date are summarized, and challenges confronting the development of more accurate three-dimensional face recognition are identified. These challenges include the need for better sensors, improved recognition algorithms, and more rigorous experimental methodology.

The full text can be found here.

LiveScience: Helpful Robot Alters Family Life

Jodi Forlizzi, assistant professor of human-computer interaction and design, Carnegie Mellon University, studies how the use of Roomba, a robotic vacuum cleaner, affects the lifestyles, relationships and attitudes of families compared with use of ordinary stick vacuums.

“The surprising thing to me was how much the Roomba changed the way that people cleaned,” said Forlizzi. The robot changed who did the cleaning, styles of cleaning and even how people kept their homes. In addition to naming their Roombas, some admitted they talked to the robot as it worked.

Unlike the Roomba, however, the stick vac didn’t change anyone’s routine. Given that both had similar cleaning capabilities, the Roomba’s autonomous, semi-intelligent features likely accounted for its greater impact, Forlizzi said. “The key is to let people talk. In that way, you see what people value.”

The full article