Tuesday, December 30, 2008

Robot PAL PhD Thesis Proposal: Towards Robust Localization in Highly Dynamic Environments

Shao-Wen Yang
Proposal for Doctoral Thesis

Thesis Committee:
Chieh-Chih Wang (Chair)
Li-Chen Fu
Jane Yung-Jen Hsu
Han-Pang Huang
Ta-Te Lin
John J. Leonard, MIT

Date: January 12 2009
Time: 1:00pm
Place: R524

Abstract—Localization in urban environments is a key prerequisite for making a robot truly autonomous, as well as an important issue in collective and cooperative robotics. It is not easily achievable when moving objects are involved or the environment changes. Ego-motion estimation is the problem of determining the pose of a robot relative to its previous location without an absolute frame of reference. Mobile robot localization is the problem of determining the pose of a robot relative to a given map of the environment. The performance of ego-motion estimation depends entirely on the consistency between sensor information at successive time steps, whereas the performance of global localization depends heavily on the consistency between the sensor information and the a priori environment knowledge. These inconsistencies can make a robot unable to robustly localize itself in real environments. Explicitly taking the inconsistencies into account serves as the basis for mobile robot localization.

In this thesis, we explore the problem of mobile robot localization in highly dynamic environments. We propose a multiple-model approach to solve the problems of ego-motion estimation and moving object detection jointly in a random sample consensus (RANSAC) paradigm. We show that accurate identification of the static environment helps classification of moving objects, whereas discrimination of moving objects also yields better ego-motion estimation, particularly in environments containing a significant percentage of moving objects.
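The joint treatment of ego-motion and moving objects can be illustrated with a toy sketch (my own minimal illustration, not the thesis code; the 2-point rigid fit, threshold, and iteration count are all invented): RANSAC fits a 2D rigid transform between two scans with known point correspondences, and points that disagree with the winning motion model are flagged as moving-object candidates.

```python
import math
import random

def fit_rigid_2d(src, dst):
    """Least-squares 2D rigid transform (theta, tx, ty) mapping src -> dst."""
    n = len(src)
    cxs = sum(p[0] for p in src) / n; cys = sum(p[1] for p in src) / n
    cxd = sum(p[0] for p in dst) / n; cyd = sum(p[1] for p in dst) / n
    sxx = syy = sxy = syx = 0.0
    for (xs, ys), (xd, yd) in zip(src, dst):
        xs -= cxs; ys -= cys; xd -= cxd; yd -= cyd
        sxx += xs * xd; syy += ys * yd
        sxy += xs * yd; syx += ys * xd
    theta = math.atan2(sxy - syx, sxx + syy)
    c, s = math.cos(theta), math.sin(theta)
    tx = cxd - (c * cxs - s * cys)
    ty = cyd - (s * cxs + c * cys)
    return theta, tx, ty

def ransac_ego_motion(prev_pts, curr_pts, iters=200, tol=0.1, seed=0):
    """Return the rigid ego-motion and the indices judged to be moving objects."""
    rng = random.Random(seed)
    idx = range(len(prev_pts))
    best_inliers = []
    for _ in range(iters):
        # Two correspondences are enough to fix a 2D rigid motion hypothesis.
        sample = rng.sample(list(idx), 2)
        th, tx, ty = fit_rigid_2d([prev_pts[i] for i in sample],
                                  [curr_pts[i] for i in sample])
        c, s = math.cos(th), math.sin(th)
        inliers = [i for i in idx
                   if math.hypot(c * prev_pts[i][0] - s * prev_pts[i][1] + tx - curr_pts[i][0],
                                 s * prev_pts[i][0] + c * prev_pts[i][1] + ty - curr_pts[i][1]) < tol]
        if len(inliers) > len(best_inliers):
            best_inliers = inliers
    # Refit on the consensus set; everything outside it is a moving-object candidate.
    model = fit_rigid_2d([prev_pts[i] for i in best_inliers],
                         [curr_pts[i] for i in best_inliers])
    movers = [i for i in idx if i not in best_inliers]
    return model, movers
```

With exact correspondences this recovers the motion and separates the independently moving point; real scans would of course need data association and noise-aware thresholds.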

It is believed that a solution to the moving object detection problem can provide a bridge between the simultaneous localization and mapping (SLAM) and the detection and tracking of moving objects (DATMO) problems. Even with the ego-motion estimation framework in place, reliable moving object detection remains difficult because data association can be problematic due to the merging and splitting of objects and temporal occlusion. We propose the use of discriminative models to reason about the joint association between measurements. Scaling such a system to solve the global localization problem will increase the reliability with which mobile robots perform autonomous tasks in crowded urban scenes. We propose a multiple-model approach based on the probabilistic mobile robot localization framework and formulate an extension to the global localization problem. In addition, detecting objects of small size moving at low speed, such as pedestrians, is difficult, but of particular interest in mobile robotics. We propose the use of prior knowledge from the mobile robot localization framework to deal with the problem of pedestrian detection, and formalize a localization-by-detection and detection-by-localization framework. The proposed approach will be demonstrated through experimental testing with real data.

Full text: PDF

Monday, December 29, 2008

Lab Meeting January 5, 2009 (Shao-Chen): Blended Local Planning for Generating Safe and Feasible Paths

Title: Blended Local Planning for Generating Safe and Feasible Paths (IROS 2008)
Authors: Ling Xu, Anthony Stentz

Abstract—Many planning approaches adhere to the two-tiered architecture consisting of a long-range, low-fidelity global planner and a short-range, high-fidelity local planner. While this architecture works well in general, it fails in highly constrained environments where the available paths are limited. These situations amplify mismatches between the global and local plans due to the smaller set of feasible actions. We present an approach that dynamically blends local plans online to match the field of global paths. Our blended local planner generates paths from control commands to ensure the safety of the robot as well as to achieve the goal. Blending also results in more complete plans than an equivalent unblended planner when navigating cluttered environments. These properties enable the blended local planner to utilize a smaller control set while achieving more efficient planning time. We demonstrate the advantages of blending in simulation using a kinematic car model navigating through maps containing tunnels, cul-de-sacs, and random obstacles.


Tuesday, December 23, 2008

Lab Meeting December 29, 2008 (fish60): DWA and/or GND

I will try to report on what I have read recently.

Dynamic window based approach to mobile robot motion control in the presence of moving obstacles
This paper presents a motion control method for mobile robots in partially unknown environments populated with moving obstacles. The proposed method is based on the integration of the focused D* search algorithm and the dynamic window local obstacle avoidance algorithm, with some adaptations that provide efficient avoidance of moving obstacles.
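As background for the reading, the dynamic-window idea can be sketched as follows (a minimal illustration under my own assumptions, not the paper's implementation; all weights, limits, and the velocity grid are invented): sample velocity commands reachable within one control cycle given acceleration limits, roll each out as a constant-curvature trajectory, discard trajectories that pass too close to an obstacle, and score the rest by heading to the goal, clearance, and speed.

```python
import math

def dwa_choose(v, w, goal, obstacles, dt=0.1, horizon=1.0,
               v_max=1.0, w_max=1.5, a_v=0.5, a_w=1.0, robot_r=0.3):
    """One step of a minimal dynamic-window search.

    v, w:       current linear/angular velocity
    goal:       (x, y) in the robot frame
    obstacles:  list of (x, y) points in the robot frame
    Returns the (v, w) command with the best score, or (0.0, 0.0) if every
    sampled trajectory collides.
    """
    best, best_score = (0.0, 0.0), -float('inf')
    # The dynamic window: velocities reachable within one control cycle.
    for dv in range(-2, 3):
        for dw in range(-2, 3):
            cv = min(max(v + dv * a_v * dt, 0.0), v_max)
            cw = min(max(w + dw * a_w * dt, -w_max), w_max)
            # Roll out a constant-velocity arc and track the closest approach.
            x = y = th = 0.0
            clearance = float('inf')
            t = 0.0
            while t < horizon:
                x += cv * math.cos(th) * dt
                y += cv * math.sin(th) * dt
                th += cw * dt
                clearance = min(clearance,
                                min((math.hypot(x - ox, y - oy)
                                     for ox, oy in obstacles),
                                    default=float('inf')))
                t += dt
            if clearance < robot_r:
                continue  # trajectory collides: inadmissible
            heading = -abs(math.atan2(goal[1] - y, goal[0] - x) - th)
            score = 2.0 * heading + 0.2 * min(clearance, 2.0) + 0.1 * cv
            if score > best_score:
                best, best_score = (cv, cw), score
    return best
```

The paper's contribution is precisely what this sketch lacks: predicting where the *moving* obstacles will be along each rollout, and coupling the window with a focused D* global path.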

Proceedings of IEEE International Conference on Robotics and Automation - ICRA 2007, Roma, Italy, 10-14 April 2007, pp. 1986-1991, 2007.


Global Nearness Diagram Navigation (GND)

The GND generates motion commands to drive a robot safely between locations, whilst avoiding collisions. This system has all the advantages of the reactive nearness diagram (ND) scheme, while having the ability to reason and plan globally (achieving global convergence for the navigation problem).

In Proceedings of the IEEE International Conference on Robotics and Automation (ICRA), 2001. Seoul, Korea.


Lab Meeting December 29, 2008 (Alan): Toward a Unified Bayesian Approach to Hybrid Metric--Topological SLAM (IEEE Transactions on Robotics)

Title: Toward a Unified Bayesian Approach to Hybrid Metric--Topological SLAM (IEEE Transactions on Robotics)
Authors: Blanco, J.-L.; Fernandez-Madrigal, J.-A.; Gonzalez, J.

Abstract—This paper introduces a new approach to simultaneous localization and mapping (SLAM) that pursues robustness and accuracy in large-scale environments. Like most successful works on SLAM, we use Bayesian filtering to provide a probabilistic estimation that can cope with uncertainty in the measurements, the robot pose, and the map. Our approach is based on the reconstruction of the robot path in a hybrid discrete-continuous state space, which naturally combines metric and topological maps. There are two fundamental characteristics that set this paper apart from previous ones: 1) the use of a unified Bayesian inference approach both for the metrical and the topological parts of the problem and 2) the analytical formulation of belief distributions over hybrid maps, which allows us to maintain the spatial uncertainty in large spaces more accurately and efficiently than in previous works. We also describe a practical implementation that aims for real-time operation. Our ideas have been validated by promising experimental results in large environments (up to 30,000 m² and a 2-km robot path) with multiple nested loops, which could hardly be managed appropriately by other approaches.

[Local copy]

Monday, December 22, 2008

Lab Meeting December 22nd, 2008 (slyfox): σSLAM: Stereo Vision SLAM Using the Rao-Blackwellised Particle Filter and a Novel Mixture Proposal Distribution

Title: σSLAM: Stereo Vision SLAM Using the Rao-Blackwellised Particle Filter and a Novel Mixture Proposal Distribution

Author: Pantelis Elinas, Robert Sim, James J. Little

We consider the problem of Simultaneous Localization and Mapping (SLAM) using the Rao-Blackwellised Particle Filter (RBPF) for the class of indoor mobile robots equipped only with stereo vision. Our goal is to construct dense metric maps of natural 3D point landmarks for large cyclic environments in the absence of accurate landmark position measurements and motion estimates. Our work differs from other approaches because landmark estimates are derived from stereo vision and motion estimates are based on sparse optical flow. We distinguish between landmarks using the Scale Invariant Feature Transform (SIFT). This is in contrast to current popular approaches that rely on reliable motion models derived from odometric hardware and accurate landmark measurements obtained with laser sensors. Since our approach depends on a particle filter whose main component is the proposal distribution, we develop and evaluate a novel mixture proposal distribution that allows us to robustly close large loops. We validate our approach experimentally for long camera trajectories, processing thousands of images at reasonable frame rates.
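The mixture proposal is the key ingredient here, and its mechanics can be shown in a one-dimensional toy (my own sketch, not σSLAM; the Gaussian forms and the mixing weight phi are assumptions): each particle is either propagated through the motion model or drawn near an independent global pose estimate (standing in for the vision-based estimate), and the importance weight divides the measurement likelihood by the mixture proposal density so the filter still targets the posterior.

```python
import math
import random

def gauss_pdf(x, mu, sigma):
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))

def mixture_proposal_step(particles, odom, odom_sigma, global_est, global_sigma,
                          z, meas_sigma, phi=0.2, rng=random):
    """One update of a 1D particle filter with a mixture proposal.

    With probability 1-phi a particle is propagated through the motion model;
    with probability phi it is drawn near an independent global pose estimate.
    This lets a filter whose particles have drifted far from the truth (e.g.
    at a large loop closure) recover, which a pure motion-model proposal cannot.
    """
    new_particles, weights = [], []
    for x in particles:
        if rng.random() < phi:
            x_new = rng.gauss(global_est, global_sigma)
        else:
            x_new = rng.gauss(x + odom, odom_sigma)
        # Importance weight: measurement likelihood / mixture proposal density.
        proposal = (phi * gauss_pdf(x_new, global_est, global_sigma)
                    + (1 - phi) * gauss_pdf(x_new, x + odom, odom_sigma))
        weights.append(gauss_pdf(x_new, z, meas_sigma) / max(proposal, 1e-12))
        new_particles.append(x_new)
    # Systematic resampling.
    total = sum(weights)
    step = total / len(new_particles)
    u = rng.uniform(0, step)
    out, c, i = [], weights[0], 0
    for _ in new_particles:
        while c < u and i < len(weights) - 1:
            i += 1
            c += weights[i]
        out.append(new_particles[i])
        u += step
    return out
```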


Tuesday, December 16, 2008

CMU talk: Enhancing Photographs using Content-Specific Image Priors

VASC Seminar
December 15, 2008

Enhancing Photographs using Content-Specific Image Priors
Neel Joshi
Microsoft Research

The digital imaging revolution has made the camera practically ubiquitous; however, image quality has not improved with increased camera availability, and image artifacts such as blur, noise, and poor color-balance are still quite prevalent. As a result, there is a strong need for simple, automatic, and accurate methods for image correction. Correcting these artifacts, however, is challenging, as problems such as deblurring, denoising, and color-correction are ill-posed, where the number of unknown values outweighs the number of observations. As a result, it is necessary to add additional prior information as constraints.

In this talk, I will present three aspects of my dissertation on performing image enhancement using content-specific image models and priors, i.e. models tuned to a particular image. First, I will discuss my work in methods that learn from a photographer's image collection, where I use identity-specific priors to perform corrections for images containing faces. These methods introduce an intuitive paradigm for image enhancement, where users fix images by simply providing examples of good photos from their personal photo album. Second, I will discuss a fast blur estimation method which uses a model that all edges in a sharp image are step-edges. Lastly, I will discuss a framework for image deblurring and denoising that uses local color statistics to produce sharp, low-noise results.

Neel Joshi is a Postdoctoral Researcher at Microsoft Research. He recently completed his Ph.D. in Computer Science at UC San Diego where he was advised by Dr. David Kriegman. His research interests include computer vision and graphics, specifically computational photography and video, data-driven graphics, and appearance measurement and modeling. Previously, he earned his Sc.B. in Computer Science from Brown University and his M.S. in Computer Science from Stanford University. He has also held internships at Mitsubishi Electric Research Labs (MERL), Adobe Systems, and Microsoft Research.

Monday, December 15, 2008

Lab Meeting December 22nd, 2008 (swem): Real-time 3D Object Pose Estimation and Tracking for Natural Landmark Based Visual Servo

Title: Real-time 3D Object Pose Estimation and Tracking for Natural Landmark Based Visual Servo
Author: Changhyun Choi, Seung-Min Baek and Sukhan Lee, Fellow Member, IEEE


A real-time solution for estimating and tracking the 3D pose of a rigid object is presented for image-based visual servo with natural landmarks. The many state-of-the-art technologies that are available for recognizing the 3D pose of an object in a natural setting are not suitable for real-time servo due to their time lags. This paper demonstrates that a real-time solution for 3D pose estimation becomes feasible by combining a fast tracker such as KLT [7] [8] with a method of determining the 3D coordinates of tracking points on an object at the time of SIFT-based tracking point initiation, assuming that a 3D geometric model with a SIFT description of the object is known a priori. By keeping track of tracking points with KLT, removing tracking point outliers automatically, and reinitiating the tracking points using SIFT once tracking deteriorates, the 3D pose of an object can be estimated and tracked in real time. This method can be applied to both mono and stereo camera based 3D pose estimation and tracking. The former guarantees higher frame rates, with about 1 ms of local pose estimation, while the latter assures more precise pose results, but with about 16 ms of local pose estimation. The experimental investigations have shown the effectiveness of the proposed approach with real-time performance.
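The automatic outlier-removal step can be illustrated with a small sketch (my own, not the paper's code; the camera intrinsics and the pixel threshold are invented): reproject each 3D model point under the current pose estimate and keep only the KLT tracks whose observed pixel stays within a reprojection-error threshold, leaving the rest to be re-initialized from SIFT.

```python
def project(p, R, t, f=500.0, cx=320.0, cy=240.0):
    """Pinhole projection of 3D point p under rotation R (3x3) and translation t."""
    X = [sum(R[i][j] * p[j] for j in range(3)) + t[i] for i in range(3)]
    return (f * X[0] / X[2] + cx, f * X[1] / X[2] + cy)

def prune_tracks(model_pts, tracked_px, R, t, max_err=3.0):
    """Keep only the tracks whose observed pixel agrees with the reprojection
    of its 3D model point under the current pose estimate; the rest are
    treated as outliers to be re-initialized (e.g. from SIFT matches)."""
    keep = []
    for i, (p3d, px) in enumerate(zip(model_pts, tracked_px)):
        u, v = project(p3d, R, t)
        if ((u - px[0]) ** 2 + (v - px[1]) ** 2) ** 0.5 <= max_err:
            keep.append(i)
    return keep
```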


Monday, December 08, 2008

CMU talk: Differentially Constrained Motion Re-Planning

CMU FRC Seminar

Differentially Constrained Motion Re-Planning

Mihail Pivtoraiko
Graduate Student, Robotics Institute, CMU

Thursday, December 11th

This talk presents an approach to differentially constrained robot motion planning and efficient re-planning. Satisfaction of differential constraints is guaranteed by the state lattice, a search space which consists of feasible motions. Any systematic re-planning algorithm, e.g. D*, can be utilized to search the state lattice to find a motion plan that satisfies the differential constraints, and to repair it efficiently in the event of a change in the environment. Further efficiency is obtained by varying the fidelity of representation of the planning problem. High fidelity is utilized where it matters most, while it is lowered in the areas that do not affect the quality of the plan significantly. The talk presents a method to modify the fidelity between re-plans, thereby enabling dynamic flexibility of the search space, while maintaining its compatibility with re-planning algorithms. The approach is especially suited for mobile robotics applications in unknown challenging environments. We successfully applied the motion planner to robot navigation in this setting.

Speaker Bio: Mihail Pivtoraiko, is a graduate student at the Robotics Institute. He received his Master's degree at the Robotics Institute in 2005 and worked in the Robotics Section at the NASA/Caltech Jet Propulsion Laboratory (JPL) before returning to RI. Mihail's interests include improving the performance and reliability of mobile robots through research in artificial intelligence and robot control. Over the past five years, he focused on off-road robot motion planning and navigation, and has participated in DARPA projects (PerceptOR, LAGR), as well as research projects at JPL .

CMU talk: Hamming Embedding and Weak Geometric consistency for large-scale image and video search

CMU VASC Seminar
Monday, December 8, 2008

Hamming Embedding and Weak Geometric consistency for large-scale image and video search
Herve Jegou

We address the problem of large-scale image search, for which many recent methods use a bag-of-features image representation. We show the sub-optimality of such a representation for matching descriptors and derive a more precise representation based on 1) Hamming embedding (HE) and 2) weak geometric consistency constraints (WGC). HE provides binary signatures that refine the matching based on visual words. WGC filters matching descriptors that are not consistent in terms of angle and scale. Experiments performed on a dataset of one million images show a significant improvement due to our approach. This is confirmed by the TRECVID 2008 video copyright detection task, where we obtained the best results in terms of accuracy for all types of transformations.

This is joint work with M. Douze and C. Schmid.
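The Hamming-embedding filter can be sketched in a few lines (an illustration under my own assumptions, not the authors' code; the 64-bit signatures and the 24-bit threshold are placeholders): two descriptors vote for a match only if they fall in the same visual word and their binary signatures are within a Hamming-distance threshold.

```python
def hamming(a, b):
    """Hamming distance between two equal-length binary signatures (ints)."""
    return bin(a ^ b).count("1")

def he_matches(query, database, max_dist=24):
    """Filter candidate matches with Hamming Embedding.

    query and database map descriptor id -> (visual_word, signature), where
    the signature is a 64-bit int refining the descriptor's position inside
    its Voronoi cell. A pair votes only if it shares the visual word AND its
    signatures lie within max_dist bits, which discards most of the false
    matches that plain bag-of-features voting would count.
    """
    matches = []
    for qid, (qword, qsig) in query.items():
        for did, (dword, dsig) in database.items():
            if qword == dword and hamming(qsig, dsig) <= max_dist:
                matches.append((qid, did))
    return matches
```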

Herve Jegou holds an M.S. degree and a PhD in Computer Science from the University of Rennes. He is a former student of the Ecole Normale Superieure de Cachan. After a post-doctoral position in the INRIA TEXMEX project, he has been a full-time researcher in the LEAR project-team at INRIA Rhone-Alpes, France, since 2006. His research interests concern large-scale image retrieval and approximate nearest neighbor search.

CMU Thesis: Effective Motion Tracking Using Known and Learned Actuation Models

Effective Motion Tracking Using Known and Learned Actuation Models

Yang Gu
Computer Science Department
Carnegie Mellon University

Robots need to track objects. We consider tasks in which robots actuate on a target that is visually tracked. Object tracking efficiency depends entirely on the accuracy of the motion model and of the sensory information. The motion model of the target becomes particularly complex in the presence of multiple agents acting on a mobile target. We assume that the tracked object is actuated by a team of agents composed of robots and possibly humans. Robots know their own actions, and team members collaborate according to coordination plans and communicated information. The thesis shows that using a previously known or learned action model of the single robot or of team members improves the efficiency of tracking.

First, we introduce and implement a novel team-driven motion tracking approach. Team-driven motion tracking is a tracking paradigm defined as a set of principles for the inclusion of hierarchical prior knowledge in the construction of a motion model. We illustrate a possible set of behavior levels within the Segway soccer domain that correspond to the abstract motion modeling decomposition.

Second, we introduce a principled approach to incorporate models of the robot-object interaction into the tracking algorithm to effectively improve the performance of the tracker. We present the integration of a single robot behavioral model in terms of skills and tactics with multiple actions into our dynamic Bayesian probabilistic tracking algorithm.

Third, we extend to multiple motion tracking models corresponding to known multi-robot coordination plans or derived from multi-robot communication. We evaluate our resulting informed tracking approach empirically in simulation and in a setup Segway soccer task. The input from the multiple single-robot and multi-robot behavioral sources allows a robot to visually track mobile targets with dynamic trajectories much more effectively.

Fourth, we present a parameter learning algorithm to learn actuation models. We describe the parametric system model and the parameters we need to learn in the actuation model. As in the KLD-sampling algorithm applied to tracking, we adapt the number of modeling particles and learn the unknown parameters. We successfully decrease the computation time of learning and the state estimation process by using significantly fewer particles on average. We show the effectiveness of learning using simulated experiments. The tracker that uses the learned actuation model achieves improved tracking performance.
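The particle-count adaptation mentioned in the fourth contribution is usually implemented with the standard bound from Fox's KLD-sampling work; a sketch of that bound (parameter defaults are my own choices) computes how many particles keep the KL divergence between the sample-based and true posterior below epsilon with a given confidence, as a function of the number k of histogram bins the particles currently occupy.

```python
import math

def kld_particle_bound(k, epsilon=0.05, z_quantile=2.326):
    """Particle count needed so the KL divergence between the sample-based
    estimate and the true posterior stays below epsilon with the confidence
    implied by z_quantile (2.326 ~ upper 0.99 quantile of the standard normal).
    k is the number of histogram bins currently occupied by particles."""
    if k <= 1:
        return 1
    a = 2.0 / (9.0 * (k - 1))
    return int(math.ceil((k - 1) / (2.0 * epsilon)
                         * (1.0 - a + math.sqrt(a) * z_quantile) ** 3))
```

The effect is the one the abstract describes: a concentrated belief occupies few bins and needs few particles, while a spread-out belief demands more, so computation adapts to the difficulty of the estimation problem.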

These contributions demonstrate that it is possible to effectively improve an agent’s object tracking ability using tactics, plays, communication and learned action models in the presence of multiple agents acting on a mobile object. The introduced tracking algorithms are proven effective in a number of simulated experiments and setup Segway robot soccer tasks. The team-driven motion tracking framework is demonstrated empirically across a wide range of settings of increasing complexity.

Thursday, December 04, 2008

CFP: IJCAI 2009 Learning by Demonstration Challenge

IJCAI 2009 Robot Learning by Demonstration Challenge
July 13-16, 2009
Pasadena, CA, USA


The IJCAI 2009 Robot Learning by Demonstration (LbD) Challenge, held in conjunction with the International Joint Conference on Artificial Intelligence, welcomes contributions that demonstrate physically embodied robots learning a task or skill from a human teacher. This year, we aim to bring together several research/commercial groups to demonstrate complete platforms performing relevant LbD tasks. Our long-term aim is to define increasingly challenging experiments for future LbD events and to foster greater scientific understanding of the area.

CONTRIBUTIONS can include live hardware demonstrations and/or short video clips, showcasing Learning by Demonstration abilities. Those interested in contributing should submit a 1-2 page proposal, by March 1 2009, containing the following information:

- the names and affiliation of the exhibitors;
- a summary of the objectives and methods of the underlying research;
- description of the LbD demonstration;
- citations to any relevant or supporting papers;
- if you are proposing a live hardware demonstration, a list and short description of the hardware you will be using at the Challenge.

SUBMISSION can be done online at:
Notifications of acceptance will be sent out by March 20, 2009.

TRAVEL SUPPORT may be possible for selected participants and their hardware, depending on available funds and level of demand.

MISSION: the IJCAI 2009 Challenge will serve as the foundation for more focused and commonly pursued challenges for AAAI 2010 and beyond. Please visit the Challenge website for more details: http://www.cc.gatech.edu/~athomaz/IJCAI-LbD-Exhibit/

The IJCAI 2009 Robotics site can be consulted for more information about the overall robotics events: http://robotics.cs.brown.edu/ijcai09/


Andrea Thomaz <athomaz@cc.gatech.edu>
Chad Jenkins <cjenkins@cs.brown.edu>
Monica Anderson <anderson@cs.ua.edu>

[Call for Papers] Autonomous Robots Journal Special Issue: Characterizing Mobile Robot Localization and Mapping

Autonomous Robots Journal Special Issue:
Characterizing Mobile Robot Localization and Mapping
Editors: Raj Madhavan, Chris Scrapper, and Alexander Kleiner

Stable navigation solutions are critical for mobile robots intended to operate in dynamic and unstructured environments. In the context of this special issue, a stable navigation solution is taken to mean the ability of a robotic system "to sense and create internal representations of its environment and estimate pose (where pose consists of position and orientation) with respect to a fixed coordinate frame". Such competency, usually termed localization and mapping, will enable mobile robots to identify obstacles and hazards present in the environment, and to maintain an estimate of where they are and where they have been. A myriad of approaches have been proposed and implemented, some with greater success than others. Since the capabilities and limitations of these approaches vary significantly depending on the requirements of the end user, the operational domain, and onboard sensor suite limitations, it is essential for developers of robotic systems to understand the performance characteristics of the methodologies employed to produce a stable navigation solution.

Currently, there is no way to quantitatively measure the performance of a robot or a team of robots against user-defined requirements. Additionally, there exists no consensus on what objective evaluation procedures need to be followed to deduce the performance of various robots operating in a variety of domains. The lack of reproducible and repeatable test methods has precluded researchers working towards a common goal from exchanging and communicating results, inter-comparing robot performance, and leveraging previous work that could otherwise avoid duplication and expedite technology transfer from the "drawing board" to the field. For instance, the evaluation of robotic maps is currently based on qualitative analysis (i.e. visual inspection). This approach does not allow for a better understanding of what errors specific systems are prone to and which systems meet the needs of the end user. It has become common practice in the literature to compare newly developed mapping algorithms with former methods by presenting images of generated maps. This procedure turns out to be suboptimal, particularly when applied to large-scale maps. The absence of standardized methods for evaluating emerging robotic technologies has caused segmentation in the research and development communities. This lack of cohesion hinders the attainment of robust mobile robot navigation, in turn slowing progress in many domains, such as manufacturing, service, health care, and security. By providing the research community with access to standardized tools, reference data sets, and an open-source library of navigation solutions, we can enable researchers and consumers of mobile robot technologies to evaluate the costs and benefits associated with various navigation solutions.

The primary focus of this special issue is to bring together what is so far an amorphous research community to define standardized methods for the quantitative evaluation of robot localization algorithms and/or robot-generated maps. The performance characteristics of several approaches towards developing a stable navigation solution will be documented by detailing the capabilities and limitations of each approach and by the inter-comparison of experimental results, as well as the underlying mechanisms used to formulate these solutions. Through this effort, we seek to start a process that will compile the results of these evaluations into a reference guide documenting lessons learned and the performance characteristics of various navigation solutions. This will enable end users to select the "best" possible method that meets their needs, and will also lead to the development of adaptive systems that are more technically capable and, at the same time, safe, thus permitting collaborative operation of humans and machines.

Topics of interest include (but are not limited to):
* Characterizing navigation in complex unstructured domains & requirements imposed by dynamic nature of operating domains
* Evaluation frameworks and adaptive approaches to developing stable navigation solutions
* Probabilistic methodologies with particular attention to uncertainty in assessing robot-generated maps
* Visualization tools for assessing localization and mapping
* Methods for ground truth generation from public map sources
* Multi-robot localization and mapping
* Testing in various domains of interest ranging from manufacturing floors to urban search and rescue
* Applications with demonstrated success or lessons learnt from failures

The above topics are by no means exhaustive but are only meant to be a representative list. We particularly encourage submissions related to mobile robot field deployments, challenges encountered, and lessons learnt during such implementations. Theoretical investigations into assessing performance of robot localization and mapping algorithms are also welcome. Please contact the guest editors if you are not sure if a particular topic fits the special issue.

* Paper submission deadline: February 1, 2009
* Notification to authors: May 1, 2009
* Camera ready papers: August 1, 2009

See the journal website at http://www.springer.com/10514
Manuscripts should be submitted to: http://AURO.edmgr.com
This online system offers easy and straightforward log-in and submission procedures, and supports a wide range of submission file formats.

Tuesday, December 02, 2008

Call for Contributions - IJCAI 2009 Mobile Manipulation Challenge

IJCAI 2009 Mobile Manipulation Challenge
July 13-16, 2009
Pasadena, CA, USA


The IJCAI 2009 Mobile Manipulation Challenge, held in conjunction with the International Joint Conference on Artificial Intelligence, welcomes contributions that demonstrate physically embodied robots performing a mobile manipulation task. This year, we aim to bring together several research/commercial groups to demonstrate complete platforms performing relevant mobile manipulation tasks. Our long-term aim is to define increasingly challenging experiments for future mobile manipulation events and to foster greater scientific understanding of the area.

AREAS OF INTEREST include (but are not limited to):

- point-and-click fetching: human users select various objects (possibly using a laser pointer) for a mobile robot to fetch; we invite participants to bring objects for collective use by all contributors;

- assembling structures: robot manipulators that can build larger structures by connecting smaller primitive parts;

- searching for hidden objects: search tasks that involve manipulation of occluding objects to find a hidden goal object.

CONTRIBUTIONS can include live hardware demonstrations and/or short video clips, showcasing manipulation abilities as described above. Those interested in contributing should submit a 1-2 page proposal, by March 1 2009, containing the following information:

- the names and affiliation of the exhibitors;
- a summary of the objectives and methods of the underlying research;
- description of the manipulation demonstration;
- citations to any relevant or supporting papers;
- if you are proposing a live hardware demonstration, a list and short description of the hardware you will be using at the Challenge.

SUBMISSION can be done via email at the address:

Notifications of acceptance will be sent out by March 20, 2009.

TRAVEL SUPPORT may be possible for selected participants and their hardware, depending on available funds and level of demand.

MISSION: the IJCAI 2009 Challenge will serve as the foundation for more focused and commonly pursued challenges for AAAI 2010 and beyond. Please visit the Challenge website for more details:

The IJCAI 2009 Robotics site can be consulted for more information about the overall robotics events:


Matei Ciocarlie <cmatei@cs.columbia.edu>
Radu Bogdan Rusu <rusu@cs.tum.edu>
Chad Jenkins <cjenkins@cs.brown.edu>
Monica Anderson <anderson@cs.ua.edu>

CMU talk: A Hierarchical Image Analysis for Extracting Parking Lot Structure from Aerial Image.

A Hierarchical Image Analysis for Extracting Parking Lot Structure from Aerial Image.

Young-Woo Seo
Ph.D Student
Robotics Institute
Carnegie Mellon University

Thursday, December 4th

Road network information simplifies autonomous driving by providing strong priors on driving environments for planning and perception. It tells a robotic vehicle where it can drive and provides contextual cues that inform driving behavior. For example, this information lets the robotic vehicle know about upcoming intersections (e.g. that an intersection is a four-way stop and that the robot must conform to precedence rules) and other fixed rules of the road (e.g. speed limits). Currently, road network information about driving environments is manually generated using a combination of GPS surveys and aerial imagery. These techniques for converting digital imagery into road network information are labor intensive, reducing the benefit provided by digital maps. To fully exploit the benefits of digital imagery, these processes should be automated. As a step toward this goal, we present a machine learning algorithm that extracts the structure of a parking lot from a given aerial image. We approach this problem hierarchically, from low-level image analysis through high-level structure inference. We test three different methods and their combinations. In our experimental results, our Markov Random Field implementation outperforms the other methods in terms of false negative and false positive rates.

Monday, December 01, 2008

Lab Meeting December 8th, 2008 (Jeff): Topological mapping, localization and navigation using image collections

Title: Topological mapping, localization and navigation using image collections

Authors: Friedrich Fraundorfer, Christopher Engels, and David Nister


In this paper we present a highly scalable vision-based localization and mapping method using image collections. A topological world representation is created online during robot exploration by adding images to a database and maintaining a link graph. An efficient image matching scheme allows real-time mapping and global localization. The compact image representation allows us to create image collections containing millions of images, which enables mapping of very large environments. A path planning method using graph search is proposed, and local geometric information is used to navigate in the topological map. Experiments show the good performance of the image matching for global localization and demonstrate path planning and navigation.
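The path-planning step over the link graph reduces to graph search; a minimal sketch (my own, not the authors' implementation) runs breadth-first search over the image link graph to get the shortest image sequence between two places.

```python
from collections import deque

def plan_topological_path(links, start, goal):
    """Breadth-first search over an image link graph.

    links maps an image id to the ids it matched against (edges added as the
    robot explores). Returns the shortest node sequence from start to goal,
    or None if the two images lie in disconnected map components.
    """
    parent = {start: None}
    frontier = deque([start])
    while frontier:
        node = frontier.popleft()
        if node == goal:
            # Walk the parent chain back to the start and reverse it.
            path = []
            while node is not None:
                path.append(node)
                node = parent[node]
            return path[::-1]
        for nxt in links.get(node, ()):
            if nxt not in parent:
                parent[nxt] = node
                frontier.append(nxt)
    return None
```

In the paper's setting each node carries an image, and the local geometric information between consecutive images along the returned sequence is what the robot uses to actually navigate.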


Lab Meeting December 8th (Casey): The Painful Face - Pain Expression Recognition Using Active Appearance Models

Title: The Painful Face - Pain Expression Recognition Using Active Appearance Models (ICMI'07)
Authors: Ahmed Bilal Ashraf, Simon Lucey, Jeffrey F. Cohn, Tsuhan Chen, Zara Ambadar (CMU), Ken Prkachin, Patty Solomon, Barry-John Theobald

Pain is typically assessed by patient self-report. Self-reported pain, however, is difficult to interpret and may be impaired or not even possible, as in young children or the severely ill. Behavioral scientists have identified reliable and valid facial indicators of pain, but until now they required manual measurement by highly skilled observers. We developed an approach that automatically recognizes acute pain. Adult patients with rotator cuff injury were video-recorded while a physiotherapist manipulated their affected and unaffected shoulder. Skilled observers rated pain expression from the video on a 5-point Likert-type scale. From these ratings, sequences were categorized as no-pain (rating of 0), pain (rating of 3, 4, or 5), and indeterminate (rating of 1 or 2). We explored machine learning approaches for pain vs. no-pain classification. Active Appearance Models (AAM) were used to decouple shape and appearance parameters from the digitized face images. Support vector machines (SVM) were used with several representations from the AAM. Using a leave-one-out procedure, we achieved an equal error rate of 19% (hit rate = 81%) using canonical appearance and shape features. These findings suggest the feasibility of automatic pain detection from video.

Saturday, November 29, 2008

CMU talk: 3-D Point Cloud Classification with Max-Margin Markov Networks

Speaker: Daniel Munoz (RI@CMU)
Venue: NSH 1507
Date: Monday, December 1, 2008

Title: 3-D Point Cloud Classification with Max-Margin Markov Networks

Point clouds extracted from laser range finders are hard to classify due to variable and noisy returns caused by pose, occlusions, surface reflectance, and sensor type. Conditional Random Fields (CRFs) are a popular framework for contextual classification that produces improved and "smooth" classifications over local classifiers. In this talk, I will present some recent extensions to the max-margin CRF model of Taskar et al. (2004) that are used in this application.

Friday, November 28, 2008

Lab Meeting December 1st (Andi): Probabilistic Scheme for Laser Based Motion Detection

Authors: Roman Katz, Juan Nieto and Eduardo Nebot

Abstract: This paper presents a motion detection scheme using laser scanners mounted on a mobile vehicle. We propose a stable yet simple motion detection scheme that can be used with, and improved by, tracking and classification procedures. The salient contribution of the developed architecture is twofold. First, it proposes a spatio-temporal correspondence procedure based on a scan registration algorithm. Second, the detection is cast as a probability decision problem that accounts for sensor noise and achieves robust classification. Probabilistic occlusion checking is finally performed to improve robustness. Experimental results show the performance of the proposed architecture under different settings in urban environments.
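
To illustrate the flavor of casting detection as a probability decision, here is a toy per-point test: after registering consecutive scans, the residual distance of each laser return to the previous (aligned) scan is turned into a posterior probability of motion. The Gaussian static model, the uniform moving model, and all parameter values are assumptions for illustration, not the authors' formulation.

```python
import math

def motion_probability(dist, sigma=0.05, p_prior=0.5):
    """Posterior probability that a laser return belongs to a moving
    object, given its residual distance (meters) to the aligned previous
    scan. sigma is an assumed sensor-noise scale; p_prior is an assumed
    prior probability of motion."""
    # Likelihood of the residual if the point is static (zero-mean noise).
    l_static = math.exp(-0.5 * (dist / sigma) ** 2)
    # Flat likelihood if the point moved (arbitrary assumed value).
    l_moving = 0.1
    return (l_moving * p_prior /
            (l_moving * p_prior + l_static * (1 - p_prior)))

print(round(motion_probability(0.01), 2))  # 0.09: small residual, likely static
print(round(motion_probability(0.5), 2))   # 1.0: large residual, likely moving
```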

full paper

Tuesday, November 25, 2008

Lab Meeting December 1st, 2008 (Jimmy): Negative Information and Line Observations for Monte Carlo Localization

Title: Negative Information and Line Observations for Monte Carlo Localization

Authors: Todd Hester and Peter Stone

Localization is a very important problem in robotics and is critical to many tasks performed on a mobile robot. In order to localize well in environments with few landmarks, a robot must make full use of all the information provided to it. This paper moves towards this goal by studying the effects of incorporating line observations and negative information into the localization algorithm. We extend the general Monte Carlo localization algorithm to utilize observations of lines such as carpet edges. We also make use of the information available when the robot expects to see a landmark but does not, by incorporating negative information into the algorithm. We compare our implementations of these ideas to previous similar approaches and demonstrate the effectiveness of these improvements through localization experiments performed both on a Sony AIBO ERS-7 robot and in simulation.
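
The negative-information idea can be sketched in a few lines of particle weighting: a particle that predicts a landmark should be visible, when none was observed, is penalized by a miss probability. This is a hedged toy version; the sensor range, the miss probability `p_miss`, and the omission of the positive-observation term of a full MCL update are all simplifying assumptions.

```python
import math

def weight_particle(particle, landmarks, observed_ids,
                    max_range=3.0, p_miss=0.2):
    """Toy negative-information weight: if a landmark should be within
    sensor range for this particle pose but was not observed, multiply
    the weight by an assumed false-negative probability p_miss."""
    x, y = particle
    w = 1.0
    for lid, (lx, ly) in landmarks.items():
        expected_visible = math.hypot(lx - x, ly - y) <= max_range
        if expected_visible and lid not in observed_ids:
            w *= p_miss  # expected to see the landmark, but did not
    return w

landmarks = {"goal_post": (0.0, 0.0)}
near = (1.0, 0.0)   # within range: the particle expects to see the post
far = (10.0, 0.0)   # out of range: no expectation, so no penalty
print(weight_particle(near, landmarks, observed_ids=set()))  # 0.2
print(weight_particle(far, landmarks, observed_ids=set()))   # 1.0
```

After weighting, the usual resampling step concentrates particles on poses consistent with both what was seen and what was not.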


Monday, November 24, 2008

CMU talk: Machine Learning Problems in Computational Biology

Speaker: Eric Xing (Assistant Professor, ML@CMU)
Date: Monday, November 24, 2008

Some Challenging Machine Learning Problems in Computational Biology:
Time-Varying Networks Inference and Sparse Structured Input-Output Learning

Recent advances in high-throughput technologies such as microarrays and genome-wide sequencing have led to an avalanche of new biological data that are dynamic, noisy, heterogeneous, and high-dimensional. They have raised unprecedented challenges in machine learning and high-dimensional statistical analysis, and their close relevance to human health and social welfare has often created unique demands on performance metrics different from those of standard data mining or pattern recognition problems. In this talk, I will discuss two such problems. First, I will present a new statistical formalism for modeling network evolution over time, and several new algorithms based on temporal extensions of sparse graphical logistic regression, for parsimoniously reverse-engineering the latent time-varying networks. I will show some promising results on recovering the latent sequence of temporally rewiring gene networks over more than 4000 genes during the life cycle of Drosophila melanogaster from a microarray time course, at a time resolution limited only by the sample frequency. Second, I will present a family of sparse structured regression models in the context of uncovering true associations between linked genetic variations (inputs) in the genome and networks of human traits (outputs) in the phenome. If time allows, I will also present another class of new models known as maximum entropy discrimination Markov networks, which address the same problem in the maximum margin paradigm, but use an entropic regularizer that leads to a distribution of structured prediction functions that are simultaneously primal and dual sparse (i.e., with few support vectors and of low effective feature dimension).

Joint work with Amr Ahmed, Seyoung Kim, Mladen Kolar, Le Song and Jun Zhu.

Thursday, November 20, 2008

CMU talk: The Capacity and Fidelity of Visual Long Term Memory

VASC Seminar
Monday, November 24, 2008

The Capacity and Fidelity of Visual Long Term Memory

Aude Oliva
Associate Professor of Cognitive Science
Department of Brain and Cognitive Sciences
Massachusetts Institute of Technology


The human visual system has been extensively trained to deal with objects and natural images, giving it the opportunity to develop robust strategies to quickly encode and recognize categories and exemplars. Although it is known that human memory capacity for images is massive, the fidelity with which human memory can represent such a large number of images is an outstanding question. We conducted three large-scale memory experiments to determine the details remembered per image for objects and natural scenes, by varying the amount of detail required to succeed in subsequent memory tests. Our results show that contrary to the commonly accepted view that long-term memory representations contain only the gist of what was seen, long-term memory can store thousands of items with a large amount of detail per item. Further analyses reveal that memory for an item depends on the extent to which it is conceptually distinct from other items in the memory set, and not necessarily on the featural distinctiveness along shape or color dimensions. These findings suggest a “conceptual hook” is necessary for maintaining a large number of high-fidelity representations in visual long-term memory. Altogether, the results present a great challenge to models of object and natural scene recognition, which must be able to account for such a large and detailed storage capacity. Work in collaboration with: Timothy Brady, Talia Konkle and George Alvarez.

Aude Oliva is Associate Professor of Cognitive Science in the Department of Brain and Cognitive Sciences at the Massachusetts Institute of Technology. After a French baccalaureate in Physics and Mathematics and a B.Sc. in Psychology, she received two M.Sc. degrees, in Experimental Psychology and in Cognitive Science and Image Processing, and was awarded a Ph.D. in Cognitive Science in 1995 from the Institut National Polytechnique de Grenoble, France. After postdoctoral research positions in the UK, Japan, France and the US, she joined the MIT faculty in 2004. In 2006, she received a National Science Foundation CAREER award in Computational Neuroscience to pursue research in human and machine scene understanding.

Her research program is in the field of Computational Visual Cognition, a framework that strives to identify the substrates of complex visual and recognition tasks (using behavioral, eye tracking and imaging methods) and to develop models inspired by human cognition. Her current research focus lies in studying human abilities at natural image recognition and memory, including scene, object and space perception as well as the role of attentional mechanisms and learning in visual search tasks.

Wednesday, November 19, 2008

CMU RI Thesis Proposal: Probabilistic Reasoning with Permutations

Probabilistic Reasoning with Permutations: A Fourier-Theoretic Approach

Robotics Institute
Carnegie Mellon University

Permutations are ubiquitous in many real-world problems, such as voting, ranking, and data association. Representing uncertainty over permutations is challenging, since there are n! possibilities, and common factorized probability distribution representations, such as graphical models, are inefficient due to the mutual exclusivity constraints that are typically associated with permutations. 

This thesis explores a new approach for probabilistic reasoning with permutations based on the idea of approximating distributions using their low-frequency Fourier components. We use a generalized Fourier transform defined for functions on permutations, but unlike the widely used Fourier analysis on the circle or the real line, Fourier transforms of functions on permutations take the form of ordered collections of matrices. As we show, maintaining the appropriate set of low-frequency Fourier terms corresponds to maintaining matrices of simple marginal probabilities which summarize the underlying distribution. We show how to derive the Fourier coefficients of a variety of probabilistic models which arise in practice and that many useful models are either well-approximated or exactly represented by low-frequency (and in many cases, sparse) Fourier coefficient matrices. 

In addition to showing that Fourier representations are both compact and intuitive, we show how to cast common probabilistic inference operations in the Fourier domain, including marginalization, conditioning on evidence, and factoring based on probabilistic independence. The algorithms presented in this thesis are fully general and work gracefully in bandlimited settings where only a partial subset of Fourier coefficients is made available. 

From the theoretical side, we tackle several problems in understanding the consequences of the bandlimiting approximation. We present results in this thesis which illuminate the nature of error propagation in the Fourier domain and propose methods for mitigating their effects. 

Finally we demonstrate the effectiveness of our approach on several real datasets and show that our methods, in addition to being well-founded theoretically, are also scalable and provide superior results in practice.
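
The low-frequency Fourier terms discussed above correspond to simple marginal probabilities. For a small distribution over permutations, these first-order marginals can be computed by brute force (feasible only for tiny n, which is exactly why the Fourier machinery matters; the dictionary representation of the distribution is an illustrative choice):

```python
import numpy as np

def first_order_marginals(dist):
    """First-order summary of a distribution over permutations of n items:
    M[i, j] = P(item j is mapped to position i). Maintaining these
    marginals is what keeping the appropriate low-frequency Fourier
    terms corresponds to."""
    n = len(next(iter(dist)))
    M = np.zeros((n, n))
    for perm, p in dist.items():
        for pos, item in enumerate(perm):
            M[pos, item] += p
    return M

# Uncertainty over 3 tracks: two equally likely data associations.
dist = {(0, 1, 2): 0.5, (1, 0, 2): 0.5}
print(first_order_marginals(dist))
```

Note that the resulting matrix is doubly stochastic: each row and column sums to one, reflecting the mutual exclusivity constraints of permutations.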

Lab Meeting November 24, 2008 (ZhenYu): Reconstructing a 3D Line from a Single Catadioptric Image

Title: Reconstructing a 3D Line from a Single Catadioptric Image (3DPVT'06)

Authors: Lanman, Douglas; Wachs, Megan; Taubin, Gabriel; Cukierman, Fernando

This paper demonstrates that, for axial non-central optical systems, the equation of a 3D line can be estimated using only four points extracted from a single image of the line. This result, which is a direct consequence of the lack of vantage point, follows from a classic result in enumerative geometry: there are exactly two lines in 3-space which intersect four given lines in general position. We present a simple algorithm to reconstruct the equation of a 3D line from four image points. This algorithm is based on computing the Singular Value Decomposition (SVD) of the matrix of Plücker coordinates of the four corresponding rays. We evaluate the conditions for which the reconstruction fails, such as when the four rays are nearly coplanar. Preliminary experimental results using a spherical catadioptric camera are presented. We conclude by discussing the limitations imposed by poor calibration and numerical errors on the proposed reconstruction algorithm.
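
The core computation can be sketched directly: stack the incidence constraints of the four rays into a 4x6 matrix of Plücker coordinates, take its null space via SVD, and intersect the resulting pencil with the Klein quadric to obtain the two candidate lines. This is a generic reconstruction of the classical result, not the authors' code; calibration and degeneracy handling are omitted.

```python
import numpy as np

def plucker(p, q):
    """Plücker coordinates (d, m) of the 3D line through points p and q."""
    p, q = np.asarray(p, float), np.asarray(q, float)
    return np.concatenate([q - p, np.cross(p, q)])

def lines_meeting_four(rays):
    """Return the (generically two) lines intersecting four given lines,
    each ray given as a 6-vector (d, m) in Plücker coordinates."""
    # Incidence of X = (d, m) with ray (di, mi): di.m + mi.d = 0, linear in X.
    A = np.array([np.concatenate([r[3:], r[:3]]) for r in rays])
    # Generic rank is 4, so the null space is 2-dimensional.
    _, _, Vt = np.linalg.svd(A)
    n1, n2 = Vt[-2], Vt[-1]
    # X = a*n1 + b*n2 must also lie on the Klein quadric d.m = 0,
    # a homogeneous quadratic qa*a^2 + qb*a*b + qc*b^2 = 0.
    qa = n1[:3] @ n1[3:]
    qb = n1[:3] @ n2[3:] + n2[:3] @ n1[3:]
    qc = n2[:3] @ n2[3:]
    if abs(qa) > 1e-12:
        s = np.sqrt(max(qb * qb - 4 * qa * qc, 0.0))
        pairs = [((-qb + s) / (2 * qa), 1.0), ((-qb - s) / (2 * qa), 1.0)]
    else:
        pairs = [(1.0, 0.0), (-qc / qb, 1.0)]  # degenerate case b = 0
    return [a * n1 + b * n2 for a, b in pairs]
```

As a sanity check, four rays each joining a point on the x-axis to a point on the line {x=0, z=1} have exactly those two lines as their common transversals, and the function recovers both directions.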


Lab Meeting November 24, 2008 (Chung-Han): SwisTrack - A Flexible Open Source Tracking Software for Multi-Agent Systems

Title: SwisTrack - A Flexible Open Source Tracking Software for Multi-Agent Systems

Authors: Thomas Lochmatter, Pierre Roduit, Chris Cianci, Nikolaus Correll, Jacques Jacot and Alcherio Martinoli

Vision-based tracking is used in nearly all robotic laboratories for monitoring and extracting of agent positions, orientations, and trajectories. However, there is currently no accepted standard software solution available, so many research groups resort to developing and using their own custom software. In this paper, we present Version 4 of SwisTrack, an open source project for simultaneous tracking of multiple agents. While its broad range of pre-implemented algorithmic components allows it to be used in a variety of experimental applications, its novelty stands in its highly modular architecture. Advanced users can therefore also implement additional customized modules which extend the functionality of the existing components within the provided interface. This paper introduces SwisTrack and shows experiments with both marked and marker-less agents.


Tuesday, November 18, 2008

CMU talk: Visual Localisation in Dynamic Non-uniform Lighting

Visual Localisation in Dynamic Non-uniform Lighting

Dr. Stephen Nuske
Postdoctoral Researcher
Field Robotics Center
Carnegie Mellon University

Thursday, November 20th

Abstract: For vision to succeed as a perceptual mechanism in general field robotic applications, vision systems must overcome the challenges presented by lighting conditions. Many current approaches rely on decoupling the effects of lighting from the process, which is not possible in many situations -- not surprising, considering an image is fundamentally an array of light measurements. This talk will describe two visual localisation systems built for two different field robot applications, each designed to address the lighting challenges of its respective environment.

The first visual localisation system discussed is for industrial ground vehicles operating outdoors. The system employs an invariant map combined with a robust localisation algorithm and an intelligent exposure control algorithm which together permit reliable localisation in a wide range of outdoor lighting conditions.

The second system discussed is for submarines navigating underwater structures, where the only light source is a spotlight mounted onboard the vehicle. The proposed system explicitly models the light source within the localisation framework which serves to predict the changing appearance of the structure. Experiments reveal that this system that understands the effects of the lighting can solve this difficult visual localisation scenario which conventional approaches struggle to solve.

The results of the two systems are encouraging, given the extremely challenging dynamic non-uniform lighting in each environment, and both systems will continue to be developed with industry partners.

Speaker Bio: Stephen's research is in vision systems for mobile robots, focusing on the creation of practical systems that can deal with the problems arising from dynamic non-uniform lighting conditions. Stephen began his undergraduate studies at the University of Queensland, Australia, in Software Engineering. His undergraduate thesis was on the vision system for the university's robot soccer team that placed second at the RoboCup in Portugal. During his undergraduate years he gained work experience at BSD Robotics, a company that develops equipment for automated medical laboratories. After receiving his undergraduate degree, Stephen began a PhD based at the Autonomous Systems Laboratory at CSIRO in Australia. During his PhD he spent three months at INRIA in Grenoble, the French national institute for computer science. Stephen is now starting a position here at CMU in the Field Robotics Center under Sanjiv Singh.

Lab Meeting November 24, 2008(Tiffany): Structure from Behavior in Autonomous Agents

Structure from Behavior in Autonomous Agents (IROS 2008)

Georg Martius, Katja Fiedler and J. Michael Herrmann

We describe a learning algorithm that generates behaviors by self-organization of sensorimotor loops in an autonomous robot. The behavior of the robot is analyzed by a multi-expert architecture, where a number of controllers compete for the data from the physical robot. Each expert stabilizes the representation of the acquired sensorimotor mapping in dependence of the achieved prediction error and forms eventually a behavioral primitive. The experts provide a discrete representation of the behavioral manifold of the robot and are suited to form building blocks for complex behaviors.


Saturday, November 15, 2008

CMU talk: Learning Language from its Perceptual Context

Joint Intelligence/LTI Seminar
November 21, 2008

Learning Language from its Perceptual Context
Raymond J. Mooney, University of Texas at Austin

Current systems that learn to process natural language require laboriously constructed human-annotated training data. Ideally, a computer would be able to acquire language like a child by being exposed to linguistic input in the context of a relevant but ambiguous perceptual environment. As a step in this direction, we present a system that learns to sportscast simulated robot soccer games by example. The training data consists of textual human commentaries on RoboCup simulation games. A set of possible alternative meanings for each comment is automatically constructed from game event traces. Our previously developed systems for learning to parse and generate natural language (KRISP and WASP) were augmented to learn from this data and then commentate novel games. The system is evaluated based on its ability to parse sentences into correct meanings and generate accurate descriptions of game events. Human evaluation was also conducted on the overall quality of the generated sportscasts and compared to human-generated commentaries.

Raymond J. Mooney is a Professor in the Department of Computer Sciences at the University of Texas at Austin. He received his Ph.D. in 1988 from the University of Illinois at Urbana/Champaign. He is an author of over 150 published research papers, primarily in the areas of machine learning and natural language processing. He is the current President of the International Machine Learning Society, was program co-chair for the 2006 AAAI Conference on Artificial Intelligence, general chair of the 2005 Human Language Technology Conference and Conference on Empirical Methods in Natural Language Processing, and co-chair of the 1990 International Conference on Machine Learning. He is a Fellow of the American Association for Artificial Intelligence and recipient of best paper awards from the National Conference on Artificial Intelligence, the SIGKDD International Conference on Knowledge Discovery and Data Mining, the International Conference on Machine Learning, and the Annual Meeting of the Association for Computational Linguistics. His recent research has focused on learning for natural-language processing, text mining for bioinformatics, statistical relational learning, and transfer learning.

Friday, November 14, 2008

CMU talk: Rain in Vision and Graphics

Special VASC Seminar
Tuesday, November 18, 2008

Rain in Vision and Graphics
Kshitiz Garg

Rain produces sharp intensity fluctuations in images and videos which severely degrade the performance of outdoor vision systems. Considering that bad weather is common (a city like New York has bad weather 23% of the time), it is important to remove the visual effects of rain to make outdoor vision robust. In contrast, in graphics, rain effects are desirable. They are often used in movies to convey scene emotions and in other graphics applications, such as games, to enhance realism. In this talk, I will present rain from the perspectives of vision and graphics. I will show how physics-based modeling of the visual appearance of rain leads to efficient algorithms both for handling its effects in vision and for its realistic rendering in graphics. I will also briefly discuss some of the recent projects I have done on recognition and tracking at intuVision.

Kshitiz Garg is a research scientist and software developer at intuVision. His research interests are in the areas of computer vision, pattern recognition and computer graphics. He has a Masters in Physics and a Ph.D. in Computer Science from Columbia University, NY. He specializes in physics-based modeling and algorithm development. During his graduate work he developed physics-based models for the intensity fluctuations produced by rain in images. He is also interested in computer graphics and has developed efficient algorithms for realistic rendering of rain. Since joining the intuVision team, he has worked on algorithms to improve object tracking and recognition, especially in the presence of background motion, illumination changes and shadows. He is the research lead for development of intuVision's object classification, face detection, and soft biometry algorithms.

Thursday, November 13, 2008

CMU talk: Techniques for Learning 3D Maps

Title: Techniques for Learning 3D Maps

Dr. Wolfram Burgard
Dept. of Computer Science
University of Freiburg

Monday, November 17th

Abstract: Learning maps is a fundamental aspect in mobile robotics, as maps support various tasks including path planning and localization. Whereas the problem of learning maps has been extensively studied for indoor settings, novel field robotics projects have substantially increased the interest in effective representations of outdoor environments. In this talk, we will present our recent results in learning highly accurate multi-level surface maps, which are an extension of elevation maps towards multiple levels. We will describe how multi-level surface maps can be utilized for motion planning and localization. We present an application, in which Junior, the DARPA Grand Challenge entry robot of Stanford University, autonomously drives through a large parking garage and carries out an autonomous parking maneuver. Finally, we will briefly describe our approaches to learning surface maps using variants of Gaussian Processes.

Speaker Bio: Wolfram Burgard is an associate professor for computer science at the University of Freiburg, where he heads the Laboratory for Autonomous Intelligent Systems. He received his Ph.D. degree in Computer Science from the University of Bonn in 1991. His areas of interest lie in artificial intelligence and mobile robots. Over the past years his research mainly focused on the development of robust and adaptive techniques for state estimation and control of autonomous mobile robots. He and his group developed several innovative probabilistic techniques for robot navigation and control. They cover different aspects such as localization, map-building, path-planning, and exploration.

Tuesday, November 11, 2008

CMU RI Thesis Proposal: Geolocation from Range: Robustness, Efficiency and Scalability

Robotics Institute
Carnegie Mellon University

In this thesis I explore the topic of geolocation from range. A robust method for localization and SLAM (Simultaneous Localization and Mapping) is proposed. This method uses a polar parameterization of the state to achieve accurate estimates of the nonlinear and multi-modal distributions in range-only systems. Several experimental evaluations on real robots reveal the reliability of this method. 

Scaling such a system to a large network of nodes increases the computational load on the system due to the larger state vector. To alleviate this problem, we propose the use of a distributed estimation algorithm based on the belief propagation framework. This method distributes the estimation task such that each node only estimates its local network, greatly reducing the computation performed by any individual node. However, the method does not provide any guarantees on the convergence of its solution in general graphs; convergence is only guaranteed for non-cyclic graphs (i.e., trees). Thus, I propose to formulate an extension to this approach that provides guarantees on its convergence and an improved approximation of the true graph inference problem.

Scaling in the traditional sense involves extensions to deal with growth in the size of the operating environment. In large, feature-less environments, maintaining a globally consistent estimate of a group of mobile agents is difficult. In this thesis, I propose the use of a multi-robot coordination strategy to achieve the tight coordination necessary to obtain an accurate global estimate. The proposed approach will be demonstrated using both simulation and experimental testing with real robots.

Monday, November 10, 2008

Lab Meeting November 10, 2008 (Yu-chun): “Try something else!” — When users change their discursive behavior in human-robot interaction

ICRA 2008

Manja Lohse, Katharina J. Rohlfing, Britta Wrede, and Gerhard Sagerer

This paper investigates the influence of feedback provided by an autonomous robot (BIRON) on users' discursive behavior. A user study is described during which users show objects to the robot. The results of the experiment indicate that the robot's verbal feedback utterances cause the humans to adapt their own way of speaking. The changes in users' verbal behavior are due to their beliefs about the robot's knowledge and abilities; in this paper they are identified and grouped. Moreover, the data implies variations in user behavior regarding gestures. Unlike speech, the robot was not able to give feedback with gestures. Due to this lack of feedback, users did not seem to have a consistent mental representation of the robot's ability to recognize gestures. As a result, changes between different gestures are interpreted as unconscious variations accompanying speech.

Sunday, November 09, 2008

Lab Meeting November 10, 2008 (Alan): An image-to-map loop closing method for monocular SLAM (IROS 2008)

Title: An image-to-map loop closing method for monocular SLAM
Authors: Brian Williams, Mark Cummins, José Neira, Paul Newman, Ian Reid and Juan Tardós

Abstract: In this paper we present a loop closure method for a handheld single-camera SLAM system based on our previous work on relocalisation. By finding correspondences between the current image and the map, our system is able to reliably detect loop closures. We compare our algorithm to existing techniques for loop closure in single-camera SLAM based on both image-to-image and map-to-map correspondences and discuss both the reliability and suitability of each algorithm in the context of monocular SLAM.


Saturday, November 08, 2008

Lab Meeting November 10, 2008 (Any): Efficiently Learning High-dimensional Observation Models for Monte-Carlo Localization using Gaussian Mixtures

Title: Efficiently Learning High-dimensional Observation Models for Monte-Carlo Localization using Gaussian Mixtures
Authors: Patrick Pfaff, Cyrill Stachniss, Christian Plagemann, and Wolfram Burgard
Abstract: Whereas probabilistic approaches are a powerful tool for mobile robot localization, they heavily rely on the proper definition of the so-called observation model which defines the likelihood of an observation given the position and orientation of the robot and the map of the environment. Most of the sensor models for range sensors proposed in the past either consider the individual beam measurements independently or apply uni-modal models to represent the likelihood function. In this paper, we present an approach that learns place-dependent sensor models for entire range scans using Gaussian mixture models. To deal with the high dimensionality of the measurement space, we utilize principal component analysis for dimensionality reduction. In practical experiments carried out with data obtained from a real robot, we demonstrate that our model substantially outperforms existing and popular sensor models.
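
The model structure can be sketched with plain numpy: project a scan onto the top principal components of the training scans, then evaluate its likelihood under a place-dependent Gaussian mixture. The diagonal covariances and caller-supplied mixture parameters are simplifying assumptions; the paper learns these per place from data.

```python
import numpy as np

def pca_basis(scans, k=2):
    """Mean scan and top-k principal components of training scans (rows)."""
    mean = scans.mean(axis=0)
    _, _, Vt = np.linalg.svd(scans - mean, full_matrices=False)
    return mean, Vt[:k]

def mixture_likelihood(z, mean, components, weights, mus, sigmas):
    """Likelihood of a full range scan z under a Gaussian mixture defined
    in the low-dimensional PCA space (diagonal covariances for brevity)."""
    y = components @ (z - mean)  # project the scan to k dimensions
    lik = 0.0
    for w, mu, sig in zip(weights, mus, sigmas):
        d = (y - mu) / sig
        norm = np.prod(sig) * (2 * np.pi) ** (len(y) / 2)
        lik += w * np.exp(-0.5 * d @ d) / norm
    return lik
```

Within Monte Carlo localization, this likelihood would weight each particle according to how well the observed scan matches the mixture learned for the particle's place.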

Friday, November 07, 2008

A $1 Recognizer for User Interface Prototypes

The $1 recognizer requires under 100 lines of simple code and achieves 97% recognition rates with only one template defined per gesture; with 3+ templates defined, accuracy exceeds 99%. Gestures are treated as fully rotation, scale, and position invariant.
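
A condensed sketch of the recognizer's pipeline (resample, rotate by the indicative angle, scale, translate, then nearest-template matching). This version simplifies the published algorithm: it uses uniform scaling and skips the golden-section search over candidate rotations, so its accuracy will be lower than the figures quoted above.

```python
import math

def path_length(pts):
    return sum(math.dist(a, b) for a, b in zip(pts, pts[1:]))

def resample(pts, n=64):
    """Step 1: resample the stroke to n equally spaced points."""
    interval = path_length(pts) / (n - 1)
    pts = list(pts)
    out, d, i = [pts[0]], 0.0, 1
    while i < len(pts) and len(out) < n:
        seg = math.dist(pts[i - 1], pts[i])
        if seg > 0 and d + seg >= interval:
            t = (interval - d) / seg
            q = (pts[i - 1][0] + t * (pts[i][0] - pts[i - 1][0]),
                 pts[i - 1][1] + t * (pts[i][1] - pts[i - 1][1]))
            out.append(q)
            pts.insert(i, q)  # continue measuring from the new point
            d = 0.0
        else:
            d += seg
        i += 1
    while len(out) < n:  # guard against floating-point shortfall
        out.append(pts[-1])
    return out

def normalize(pts, n=64):
    """Steps 2-4: rotate so the start-to-centroid angle is zero, scale
    uniformly to a unit box, translate the centroid to the origin."""
    pts = resample(pts, n)
    cx = sum(p[0] for p in pts) / n
    cy = sum(p[1] for p in pts) / n
    theta = math.atan2(cy - pts[0][1], cx - pts[0][0])
    c, s = math.cos(-theta), math.sin(-theta)
    pts = [((x - cx) * c - (y - cy) * s, (x - cx) * s + (y - cy) * c)
           for x, y in pts]
    xs, ys = [p[0] for p in pts], [p[1] for p in pts]
    span = max(max(xs) - min(xs), max(ys) - min(ys)) or 1.0
    return [(x / span, y / span) for x, y in pts]

def recognize(stroke, templates):
    """Return the template whose normalized points are closest on average."""
    cand = normalize(stroke)
    return min(templates, key=lambda name: sum(
        math.dist(a, b) for a, b in zip(cand, normalize(templates[name]))))
```

Because normalization removes position, scale, and the start-to-centroid rotation, a vertical stroke still matches a horizontal "line" template.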

CMU VASC Seminar: What does the sky tell us about the camera?

What does the sky tell us about the camera?
Jean-Francois Lalonde
Robotics Institute, Carnegie Mellon

VASC Seminar
Monday, November 10

Abstract: As the main observed illuminant outdoors, the sky is a rich source of information about the scene. However, it is yet to be fully explored in computer vision because its appearance in an image depends on the sun position, weather conditions, photometric and geometric parameters of the camera, and the location of capture. In this talk, I will present an analysis of two sources of information available within the visible portion of the sky region: the sun position, and the sky appearance. By fitting a model of the predicted sun position to an image sequence, we show how to extract camera parameters such as the focal length, and the zenith and azimuth angles. Similarly, we show how we can extract the same parameters by fitting a physically-based sky model to the sky appearance. In short, the sun and the sky serve as geometric calibration targets, which can be used to annotate a large database of image sequences. We use our methods to calibrate 22 real, low-quality webcam sequences scattered throughout the continental US, and show deviations below 4% for focal length, and 3 degrees for the zenith and azimuth angles. Once the camera parameters are recovered, we use them to define a camera-invariant sky appearance model, which we exploit in two applications: 1) segmentation of the sky and cloud layers, and 2) data-driven sky matching across different image sequences based on a novel similarity measure defined on sky parameters. This measure, combined with a rich appearance database, allows us to model a wide range of sky conditions.

Bio: Jean-Francois Lalonde received his B.E. in Computer Engineering from Laval University, Canada in 2004. He received his M.S. in Robotics from Carnegie Mellon University in 2006 under Martial Hebert, and he has been a Robotics Ph.D. student advised by Alexei A. Efros in that institution since. His research interests are in computer vision and computer graphics, focusing on image understanding and synthesis leveraging large amounts of data.

Wednesday, November 05, 2008

CMU talk: Computing with Language and Context over Time

Speaker: Gregory Aist, Arizona State University

Title: Computing with Language and Context over Time

What: Joint LTI/RI Seminar
When: Friday November 7, 2008, 2:00pm - 3:00pm
Where: 1305 NSH

How do language and context interact in learning and performance by humans and machines? To explore this broad area of inquiry, I have studied interactions between natural language and a wide range of different contexts: visual context, social and team context, written context and world knowledge, procedure and task context, dialogue and temporal context, and instructional context. Specific research questions have included how machines can process spoken language continuously and integrate speech and visual context during understanding; how computers can help pilots and astronauts learn and perform tasks; and how to automatically generate, present, and evaluate the effects of vocabulary help for children. One key challenge in addressing all of these questions is to model and compute representations of language and context that unfold over time as the interaction progresses. This talk will illustrate the need for such interactive time-sensitive processes, describe computational approaches to understanding language and context as dialogue and interactions unfold across time, and evaluate the effectiveness of such approaches.

Short bio:
Gregory Aist is currently at Arizona State University as an Assistant Research Professor in the School of Computing and Informatics and the Applied Linguistics Program. His research interests are in natural language processing and computer-assisted learning. His research addresses fundamental issues in language and learning, tackles computational challenges of automatic processing of human language and computer support for human learning, and is applied to provide users with learning experiences and new capabilities in authentic settings for educational domains such as traditional literacy (reading and writing) and new literacies (virtual worlds), and physical domains such as aerospace and human-robot interaction. During summers 2007 and 2008 he was an Air Force Summer Faculty Fellow. Previously he has held research and visiting positions at the University of Rochester, RIACS/NASA Ames Research Center, and the MIT Media Lab. He received a Ph.D. in Language and Information Technology from Carnegie Mellon University in 2000, where he was an NSF Graduate Fellow.

Sunday, November 02, 2008

Lab Meeting November 3rd, 2008 (swem): Learning Patch Correspondences for Improved Viewpoint Invariant Face Recognition

Title: Learning Patch Correspondences for Improved Viewpoint Invariant Face Recognition

Author: Ahmed Bilal Ashraf, Simon Lucey, Tsuhan Chen

Variation due to viewpoint is one of the key challenges that stand in the way of a complete solution to the face recognition problem. It is easy to note that local regions of the face change differently in appearance as the viewpoint varies. Recently, patch-based approaches, such as those of Kanade and Yamada, have taken advantage of this effect, resulting in improved viewpoint invariant face recognition. In this paper we propose a data-driven extension to their approach, in which we not only model how a face patch varies in appearance, but also how it deforms spatially as the viewpoint varies. We propose a novel alignment strategy, which we refer to as “stack flow”, that discovers viewpoint-induced spatial deformities undergone by a face at the patch level. One can then view the spatial deformation of a patch as the correspondence of that patch between two viewpoints. We present improved identification and verification results to demonstrate the utility of our technique.


Lab Meeting November 3rd, 2008 (Shao-Chen): Blind spatial subtraction array with independent component analysis for hands-free speech recognition

Blind spatial subtraction array with independent component analysis for hands-free speech recognition

Yu Takahashi, Tomoya Takatani, Hiroshi Saruwatari and Kiyohiro Shikano

In this paper, we propose a new blind spatial subtraction array (BSSA) which contains an accurate noise estimator based on independent component analysis (ICA) to realize noise-robust hands-free speech recognition. First, a preliminary experiment suggests that conventional ICA is better suited to noise estimation than to direct speech estimation in real environments, where the target speech can be approximated as a point source but real noises often cannot. Secondly, based on these findings, we propose a new noise reduction method implemented by subtracting the power spectrum of the ICA-estimated noise from the power spectrum of the noise-contaminated observations. This architecture provides noise-estimation-error-robust speech enhancement that is well suited to speech recognition. Finally, the effectiveness of the proposed BSSA is shown in a speech recognition experiment.
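The subtraction step described above can be sketched as follows; the flooring constant and function names are illustrative, not from the paper:

```python
import numpy as np

def power_spectral_subtraction(obs_power, est_noise_power, beta=0.01):
    """Subtract an estimated noise power spectrum from the observed
    power spectrum, flooring the result to a small fraction of the
    observation so the enhanced spectrum stays non-negative."""
    enhanced = obs_power - est_noise_power
    floor = beta * obs_power
    return np.maximum(enhanced, floor)

# Example: one noisy frame with a (hypothetical) ICA noise estimate
obs = np.array([1.0, 0.5, 0.2, 0.05])
noise = np.array([0.3, 0.3, 0.3, 0.3])
clean = power_spectral_subtraction(obs, noise)
```

The flooring is what gives the method some robustness to noise-estimation errors: bins where the estimate overshoots are clipped rather than going negative.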


Sunday, October 26, 2008

Lab Meeting October 27th, 2008 (slyfox): Bearings-Only Tracking of Manoeuvring Targets Using Particle Filters


We investigate the problem of bearings-only tracking of manoeuvring targets using particle filters (PFs). Three different PFs are proposed for this problem, which is formulated as a multiple-model tracking problem in a jump Markov system (JMS) framework. The proposed filters are (i) the multiple model PF (MMPF), (ii) the auxiliary MMPF (AUX-MMPF), and (iii) the jump Markov system PF (JMS-PF). The performance of these filters is compared with that of standard interacting multiple model (IMM)-based trackers such as the IMM-EKF and IMM-UKF for three separate cases: (i) the single-sensor case, (ii) the multisensor case, and (iii) tracking with hard constraints. A conservative CRLB applicable to this problem is also derived and compared with the RMS error performance of the filters. The results confirm the superiority of the PFs for this difficult nonlinear tracking problem.

EURASIP Journal on Applied Signal Processing, 2004
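The multiple-model particle filters in the paper are considerably more involved, but the core bearings-only measurement update of a simple bootstrap PF might look like the sketch below; the noise level and the multinomial resampling scheme are placeholder assumptions:

```python
import numpy as np

def pf_bearing_update(particles, sensor_pos, bearing_meas,
                      sigma_bearing=0.05, rng=None):
    """One measurement update of a bootstrap particle filter for
    bearings-only tracking.  particles is an (N, 2) array of candidate
    target positions; each particle is weighted by the Gaussian
    likelihood of the measured bearing given its predicted bearing."""
    if rng is None:
        rng = np.random.default_rng(0)
    dx = particles[:, 0] - sensor_pos[0]
    dy = particles[:, 1] - sensor_pos[1]
    predicted = np.arctan2(dy, dx)
    # wrap the bearing error to [-pi, pi] before evaluating the likelihood
    err = (bearing_meas - predicted + np.pi) % (2 * np.pi) - np.pi
    weights = np.exp(-0.5 * (err / sigma_bearing) ** 2)
    weights /= weights.sum()
    # multinomial resampling proportional to the weights
    idx = rng.choice(len(particles), size=len(particles), p=weights)
    return particles[idx]
```

The nonlinearity of `arctan2` in the measurement model is exactly why PFs tend to outperform EKF/UKF-based trackers on this problem.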

Friday, October 24, 2008

Lab Meeting October 27th, 2008 (fish60): Smooth Nearness-Diagram Navigation


This paper presents a new method for reactive collision avoidance for mobile robots in complex and cluttered environments. Our technique is to adapt the "divide and conquer" approach of the Nearness-Diagram+ Navigation (ND+) method to generate a single motion law which applies for all navigational situations.
The resulting local path planner considers all the visible obstacles surrounding the robot, not just the closest two. With these changes our new navigation method generates smoother motion while avoiding obstacles. Results from comparisons with ND+ are presented, as are experiments using Erratic mobile robots.

2008 IROS Paper

Tuesday, October 21, 2008

Lab Meeting October 27th, 2008 (Jeff): Incremental vision-based topological SLAM

Title: Incremental vision-based topological SLAM

Authors: Adrien Angeli, Stephane Doncieux, Jean-Arcady Meyer, and David Filliat


In robotics, appearance-based topological map building consists of inferring the topology of the environment explored by a robot from its sensor measurements. In this paper, we propose a vision-based framework that considers this data association problem from a loop-closure detection perspective in order to correctly assign each measurement to its location. Our approach relies on the visual bag of words paradigm to represent the images and on a discrete Bayes filter to compute the probability of loop-closure. We demonstrate the efficiency of our solution by incremental and real-time consistent map building in an indoor environment and under strong perceptual aliasing conditions using a single monocular wide-angle camera.

IROS2008 Paper
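As an illustrative sketch (not the authors' implementation), a single discrete Bayes filter step for loop-closure detection might combine a diffusion-based prediction with an image-similarity likelihood:

```python
import numpy as np

def loop_closure_update(prior, likelihood, transition_spread=1):
    """One step of a discrete Bayes filter over past locations.
    prior: probability of each past location being the current one.
    likelihood: similarity score of the current image against each
    past location (illustrative; the paper uses a bag-of-words
    tf-idf voting scheme).  A simple box-filter diffusion models the
    robot moving between neighbouring locations."""
    # prediction: diffuse the belief onto neighbouring locations
    predicted = np.convolve(prior, np.ones(2 * transition_spread + 1),
                            mode="same")
    predicted /= predicted.sum()
    # correction: weight by the appearance likelihood and renormalize
    posterior = predicted * likelihood
    return posterior / posterior.sum()
```

The diffusion step is what gives the filter temporal coherence, helping it reject perceptual aliasing: a single similar-looking image far from the current belief mass gets little posterior probability.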

Monday, October 20, 2008

Lab Meeting October 20, 2008 (Jimmy): Learning in Dynamic Environments with Ensemble Selection for Autonomous Outdoor Robot Navigation

Title: Learning in Dynamic Environments with Ensemble Selection for Autonomous Outdoor Robot Navigation (IROS2008)

Authors: Michael J. Procopio, Jane Mulligan, and Greg Grudic

Autonomous robot navigation in unstructured outdoor environments is a challenging area of active research. The navigation task requires identifying safe, traversable paths which allow the robot to progress toward a goal while avoiding obstacles. Machine learning techniques—in particular, classifier ensembles—are well adapted to this task, accomplishing near-to-far learning by augmenting near-field stereo readings in order to identify safe terrain and obstacles in the far field. Composition of the ensemble and subsequent combination of model outputs in this dynamic problem domain remain open questions. Recently, Ensemble Selection has been proposed as a mechanism for selecting and combining models from an existing model library and shown to perform well in static domains. We propose the adaptation of this technique to the time-evolving data associated with the outdoor robot navigation domain. Important research questions as to the composition of the model library, as well as how to combine selected models’ outputs, are addressed in a two-factor experimental evaluation. We evaluate the performance of our technique on six fully labeled datasets, and show that our technique outperforms memoryless baseline techniques that do not leverage past experience.
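The Ensemble Selection procedure the paper adapts (originally due to Caruana et al.) can be sketched as a greedy loop; the accuracy metric and the probability-threshold decision rule here are simplifying assumptions:

```python
import numpy as np

def greedy_ensemble_selection(library_preds, y_val, n_steps=10):
    """Greedy Ensemble Selection: repeatedly add, with replacement,
    the library model whose inclusion most improves the ensemble's
    validation accuracy.  library_preds is an (n_models, n_samples)
    array of predicted probabilities for the positive class; y_val
    holds the 0/1 validation labels."""
    chosen = []
    ensemble_sum = np.zeros(len(y_val), dtype=float)
    for _ in range(n_steps):
        best_acc, best_m = -1.0, None
        for m in range(library_preds.shape[0]):
            avg = (ensemble_sum + library_preds[m]) / (len(chosen) + 1)
            acc = np.mean((avg > 0.5) == y_val)
            if acc > best_acc:
                best_acc, best_m = acc, m
        chosen.append(best_m)
        ensemble_sum += library_preds[best_m]
    return chosen
```

Selection with replacement lets a strong model dominate the average, which is part of what makes the method robust when the library contains many weak models.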


Sunday, October 19, 2008

Lab Meeting October 20, 2008(Casey): 3D Head tracking and pose-robust 2D Texture Map-Based Face Recognition using a Simple Ellipsoid Model

Title: 3D Head tracking and pose-robust 2D Texture Map-based Face Recognition using a Simple Ellipsoid Model (IROS2008)

Authors: Kwang Ho An and Myung Jin Chung

A human face provides a variety of different communicative functions such as identification, the perception of emotional expression, and lip-reading. For these reasons, many applications in robotics require tracking and recognizing a human face. A novel face recognition system should be able to deal with various changes in face images, such as pose, illumination, and expression, among which pose variation is the most difficult one to deal with. Therefore, face registration (alignment) is the key to robust face recognition. If we can register face images into frontal views, the recognition task becomes much easier. To align a face image into a canonical frontal view, we need to know the pose information of a human head. Therefore, in this paper, we propose a novel method for modeling a human head as a simple 3D ellipsoid, and we also present 3D head tracking and pose estimation methods using the proposed ellipsoidal model. After recovering the full motion of the head, we can register face images with pose variations into stabilized view images which are suitable for frontal face recognition. By doing so, simple and efficient frontal face recognition can be easily carried out in the stabilized texture map space instead of the original input image space. To evaluate the feasibility of the proposed approach using a simple ellipsoid model, 3D head tracking experiments are carried out on 45 image sequences with ground truth from Boston University, and several face recognition experiments are conducted on our laboratory database and the Yale Face Database B by using subspace-based face recognition methods such as PCA, PCA+LDA, and DCV.

Saturday, October 18, 2008

MIT CSAIL talk: Modeling Appearance via the Object Class Invariant

Modeling Appearance via the Object Class Invariant
Speaker: Matthew Toews, Harvard Medical School

Date: Friday, October 17 2008
Time: 2:00PM to 3:00PM
Host: Polina Golland, CSAIL

As humans, we are able to identify, localize, describe and classify a wide range of object classes, such as faces, cars or the human brain, by their appearance in images. Designing a general computational model of appearance with similar capabilities remains a long standing goal in the research community. A major challenge is effectively coping with the many sources of variability operative in determining image appearance: illumination, noise, unrelated clutter, occlusion, sensor geometry, natural intra-class variation and abnormal variation due to pathology to name a few. Explicitly modeling sources of variability can be computationally expensive, can lead to domain-specific solutions and may ultimately be unnecessary for the computational tasks at hand.

In this talk, I will show how appearance can instead be modeled in a manner invariant to nuisance variations, or sources of variability unrelated to the tasks at hand. This is done by relating spatially localized image features (e.g. SIFT) to an object class invariant (OCI), a reference frame which remains geometrically consistent with the underlying object class despite nuisance variations. The resulting OCI model is a probabilistic collage of local image patterns that can be automatically learned from sets of images and robustly fit to new images, with little or no manual supervision. Due to its general nature, the OCI model can be used to address a variety of difficult, open problems in the contexts of computer vision and medical image analysis. I will show how the model can be used both as a viewpoint-invariant model of 3D object classes in photographic imagery and as a robust anatomical atlas of the brain in magnetic resonance imagery.

Thursday, October 16, 2008

IROS 2008 Keynote speech: Understanding Human Faces

At IROS 2008, Takeo Kanade delivered a great speech summarizing what he has been working on in terms of understanding human faces. Below is the abstract:

A human face conveys important information: identity, emotion, and intention of the person. Technologies to process and understand human faces have many applications, ranging from biometrics to medical diagnosis, and from surveillance to human-robot interaction. This talk will give an overview of the recent progress that the CMU Face Group has made, in particular, robust face alignment, facial Action Unit (AU) recognition for emotion analysis, and facial video cloning for understanding human dyadic communication.

The video I took is available at http://robotics.csie.ntu.edu.tw/~bobwang/iros2008/. Unfortunately, I recorded this one-hour talk at a low resolution, so the slides are hard to see; the audio, however, is clear. Take a look at (or just listen to) this excellent talk!


Tuesday, October 14, 2008

CMU RI Thesis Proposal: Pretty Observable Markov Decision Processes: Exploiting Approximate Structure for Efficient Planning under Uncertainty

Title: Pretty Observable Markov Decision Processes: Exploiting Approximate Structure for Efficient Planning under Uncertainty

Nicholas Armstrong-Crews
Robotics Institute
Carnegie Mellon University

NSH 1507

10:00 AM
20 Oct 2008

Planning under uncertainty is a challenging task. POMDP models have become a popular method for describing such domains. Unfortunately, solving a POMDP to find the optimal policy is computationally intractable in general. Recent advances in solving POMDPs include finding near-optimal policies and exploiting structured representations of the problems. We believe that by using these two tools together synergistically, we can tame the complexity of many POMDPs. In this thesis, we propose to further advance these approaches by analyzing new types of structure and new approximation techniques, as well as methods at the intersection of the two.

Some of the research we have done to lay the groundwork for this thesis falls into these categories, with promising results. We introduced the Oracular POMDP framework, which takes advantage of an MDP-like structure by allowing direct observation of the state as a (costly) action by the agent; otherwise the agent receives no information from the environment, and between invocations of this “oracle” action the agent is again afflicted by uncertainty. We have given an anytime algorithm for solving Oracular POMDPs which we have proven is efficient (polynomial-time) in all but the number of actions. At any iteration of the anytime algorithm, we have a (provably) near-optimal policy, which we achieve efficiently by exploiting the structured observation function.

Another vein of our past work addresses solving general POMDPs by approximating them as finite-state MDPs. It is a well-known result that POMDPs are equivalent to continuous MDPs whose state space is the belief simplex (the space of probability distributions over possible hidden states). We sample a finite number of these beliefs to create a finite-state MDP that approximates the original POMDP. We then solve this MDP for an optimal policy, improve our sample of belief states with this policy so that it better approximates the POMDP, and continue in this fashion.
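The belief-sampling approximation described above can be sketched as follows: the finite-state MDP is obtained by mapping each Bayes-updated belief back to its nearest sampled belief point, then running ordinary value iteration. All names here are illustrative, not from the proposal:

```python
import numpy as np

def belief_mdp_value_iteration(beliefs, T, Z, R, gamma=0.9, iters=50):
    """Value iteration over a finite set of sampled belief points.
    beliefs: (n_b, S) sampled beliefs; T[a]: (S, S) transition matrix;
    Z[a]: (S, O) observation probabilities; R: (S, A) rewards."""
    n_b, n_s = beliefs.shape
    n_a = R.shape[1]
    n_o = Z[0].shape[1]
    V = np.zeros(n_b)
    for _ in range(iters):
        Q = np.zeros((n_b, n_a))
        for i, b in enumerate(beliefs):
            for a in range(n_a):
                q = b @ R[:, a]                      # expected reward
                pred = b @ T[a]                      # predicted state dist.
                for o in range(n_o):
                    p_o = pred @ Z[a][:, o]          # prob. of observation o
                    if p_o < 1e-12:
                        continue
                    b_new = pred * Z[a][:, o] / p_o  # Bayes belief update
                    # snap the updated belief to its nearest sample point
                    j = np.argmin(np.linalg.norm(beliefs - b_new, axis=1))
                    q += gamma * p_o * V[j]
                Q[i, a] = q
        V = Q.max(axis=1)
    return V
```

Refining the belief sample with the resulting policy, as the proposal describes, amounts to adding the beliefs that policy actually visits and re-running this iteration.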

These prior works exhibit an important common methodology: anytime algorithms that give near-optimal policies at every iteration, and in the limit converge to the optimal policy. This property is paramount for tackling problems with approximate structure. We can focus early iterations on the structured portion of the problem, which we can solve quickly; and later iterations can handle the complex, unstructured portion of the problem. In this way, we can quickly reach a near-optimal solution, while guaranteeing convergence to an optimal solution in the limit. Our method of evaluating an algorithm's performance on a given problem, then, is the entire curve of policy quality versus algorithm runtime.

Although the AI literature is rich with attempts to exploit different types of structure, in this thesis we focus on a small subset. Our prior work includes Oracular POMDPs, an extremely structured observation function; and the finite-state MDP approximation to POMDPs, which takes advantage of a structured belief-state space that is learned as the algorithm progresses.

For the remainder of the thesis work, we propose to generalize the concept of Oracular POMDPs to include nearly perfect information from oracles, with nearly no information provided from the environment otherwise; we will also extend the oracle concept to factored state problems, where an oracle can reveal one state variable reliably but not the others. We will investigate automated techniques for learning structure from a given unstructured representation. Finally, we wish to examine in greater detail what can be proven about the optimality-runtime tradeoff of these approximately structured POMDPs.

To evaluate our methods, we will apply them to several types of problems. First, we will introduce new synthetic domains that exhibit the structure we wish to exploit. Second, we will use our structure learning methods on existing domains in the literature. Finally, we will attempt to apply the methods to a real-world robot problem, in order to address doubts (in the minds of the community and of the author) about the usefulness of POMDP methods on real robots.

Full text

Sunday, October 12, 2008

Lab Meeting October 13, 2008 (Tiffany): Graph Laplacian Based Transfer Learning in Reinforcement Learning

Yi-Ting Tsao, Ke-Ting Xiao, Von-Wun Soo

The aim of transfer learning is to accelerate learning in related domains. In reinforcement learning, many different features, such as a value function and a policy, can be transferred from a source domain to a related target domain. Much research has focused on transfer using hand-coded translation functions designed by experts a priori; however, this is not only very costly but also problem-dependent. We propose to apply the graph Laplacian, based on spectral graph theory, to decompose the value functions of both a source domain and a target domain into sums of basis functions. Transfer learning can then be carried out by transferring the weights on the basis functions of a source domain to a target domain. We investigate two types of domain transfer, scaling and topological. The results demonstrate that the transferred policy is a better prior policy that reduces the learning time.
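As a rough sketch of the spectral idea (not the authors' exact method), the eigenvectors of the state-space graph's Laplacian serve as basis functions, and the value function's weights on that basis are what get transferred:

```python
import numpy as np

def laplacian_basis(adjacency, k):
    """First k eigenvectors of the combinatorial graph Laplacian
    L = D - W, used as basis functions over the state space."""
    L = np.diag(adjacency.sum(axis=1)) - adjacency
    _, vecs = np.linalg.eigh(L)   # eigenvectors sorted by eigenvalue
    return vecs[:, :k]

def transfer_value_function(V_source, basis_source, basis_target):
    """Project a source value function onto the source basis, then
    re-synthesize it with the target basis, transferring the weights."""
    weights = basis_source.T @ V_source
    return basis_target @ weights
```

When source and target graphs differ only in scale or topology, corresponding low-order eigenvectors capture similar large-scale structure, which is what makes the transferred weights a useful prior.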


Saturday, October 11, 2008

Lab Meeting October 13 (Andi) Extrinsic Laser Scanner / Camera calibration

I will summarize three papers with different approaches for Laser/Camera calibration.

[1] An Algorithm for Extrinsic Parameters Calibration of a Camera and a Laser Range Finder Using Line Features
[2] An efficient extrinsic calibration of a multiple laser scanners and cameras’ sensor system on a mobile platform
[3] Extrinsic calibration of a camera and laser range finder (improves camera calibration)

[1] This paper presents an effective algorithm for calibrating the extrinsic parameters between a camera and a laser range finder whose trace is invisible. On the basis of an analysis of three possible features, we propose to design a right-angled triangular checkerboard and to employ the invisible intersection points of the laser range finder’s slice plane with the edges of the checkerboard to set up the constraint equations. The extrinsic parameters are then calibrated by minimizing the algebraic errors between the measured intersection points and their corresponding projections on the image plane of the camera....
[2] ...In this research, we present a practical method for extrinsic calibration of multiple laser scanners and video cameras that are mounted on a vehicle platform. Referring to a fiducial coordinate system on the vehicle platform, a constraint between the data of a laser scanner and of a video camera is established. It is solved in an iterative way to find the best transformation from each laser scanner and video camera to the fiducial coordinate system. In addition, all laser scanner and video camera pairs that share common feature points are calibrated in a sequential way....
[3] We describe theoretical and experimental results for the extrinsic calibration of a sensor platform consisting of a camera and a 2D laser range finder. The calibration is based on observing a planar checkerboard pattern and solving for constraints between the “views” of the planar checkerboard calibration pattern from the camera and the laser range finder. We give a direct solution that minimizes an algebraic error from this constraint, and subsequent nonlinear refinement minimizes a re-projection error....

Thursday, October 09, 2008

IEEE News: Smart Phones May Detect Sleep Disorders

Technology for screening and diagnosing sleep disorders, and for waking users at the best times in their sleep cycles, has been developed by researchers at two Finnish universities, Tampere University of Technology and the University of Helsinki, who say the first application of the new technology, a smart alarm clock for mobile phones called HappyWakeUp, is now available. The researchers first noticed that a common microphone is very sensitive to the sounds produced by movements in the bed during the night, and that this can be adapted to detect restless sleep and the leg movements associated with restless leg syndrome, and to screen for snoring and sleep apnea. The technology makes it possible to perform repeated all-night recordings and to diagnose sleep disorders in countries and areas with no previous sleep recording facilities, according to the researchers, who say the new technology is extremely cost-efficient compared to existing special medical recording devices. Read more

Research Scientist Position in Robotics at MIT CSAIL

Research Scientist
Computer Science and Artificial Intelligence Laboratory
Massachusetts Institute of Technology

RESEARCH SCIENTIST, Computer Science and Artificial Intelligence Laboratory (CSAIL), to perform research in the development of perception, planning, control, and human interface software and algorithms for autonomous robots; manage research in autonomous vehicles, including development and testing of techniques for vision, lidar, and radar data processing for mapping, localization, and autonomous path control; and development and field deployment of novel robotic systems for land, air, and sea environments.

REQUIREMENTS: a Ph.D. in robotics or computer vision; and five or more years' experience in perception algorithm and human-computer interface and robotic system programming for autonomous vehicles. Seek motivated, enthusiastic roboticist who demonstrates exceptional programming skills and the ability to perform independent research and manage complex research projects. Must be able to help mentor graduate students and postdocs. Position requires expert knowledge of Bayesian state estimation and computer vision algorithms such as Kalman filters, particle filters, and SIFT feature detection; and general experience in robot system integration, C/C++ network programming in Linux and Windows, CVS, SVN, openGL, perl, and HTML. Must have experience with configuration and management of Linux computer systems using Ubuntu/Debian distributions; deployment and operation of mobile ad-hoc wireless networks; and code development for public-domain robot control software packages such as CARMEN and LCM. Should also have experience creating real-time interfaces to vision, laser, and radar sensors using serial, USB, CANbus, and tcp/ip connections; and in the configuration and operation of SICK laser range scanners.

Applicants may apply online at http://hrweb.mit.edu/staffing/
(Search for position mit-00005935)

John Leonard (jleonard@mit.edu)