Monday, March 30, 2009
3-D Stereo Vision SLAM for Autonomous Mapping and Navigation (live demonstration)
Project Manager and Senior Technical Staff Member
San Diego, CA
Thursday, April 2nd, 2009
Vision Robotics has developed a 3-D stereo vision SLAM, visual odometry, object identification, and tracking system for autonomous robots. The current applications include an autonomous canister-style vacuum cleaner and a sensor system for military UGVs. The presentation will describe the 3-D stereo vision SLAM algorithms, including the stereo vision processing, particle-based localization, and 3-D map generation. It will conclude with a demonstration of the 3-D stereo vision SLAM sensor system on a prototype Autonomous Mapping Vehicle (AMV) being developed for SPAWAR Systems Center San Diego. The current prototype uses a Segway RMP-50 as a base platform, has four stereo camera pairs, and maps at over 100 square feet per minute with speeds over 2 mph. An external link via 802.11n to a laptop provides start/stop/pause control and real-time display of the map and pose data. The AMV will autonomously move throughout an area, and when it has finished exploring the complete area, it will return to the starting location.
Pearse received the B.S. in electrical engineering from Massachusetts Institute of Technology in 1989 and the M.S. in electrical engineering from the University of California, San Diego in 1994. Pearse Ffrench is a project manager and senior technical staff member at Vision Robotics. He has been developing 3-D stereo vision SLAM and tracking algorithms at Vision Robotics for the past eight years. Previously, he was the founder and CTO of Digital Transport Systems, a communications test equipment company which was acquired by Wavetek Wandel Golterman in 1998. From 1989 to 1994, he was the lead engineer for Advanced Processing Laboratories on several Navy projects involving underwater acoustics and signal processing.
Sunday, March 29, 2009
Authors: Adam Coates, Pieter Abbeel, Andrew Y. Ng
International Conference on Machine Learning (ICML), Best Application Paper Award
We consider the problem of learning to follow a desired trajectory when given a small number of demonstrations from a sub-optimal expert. We present an algorithm that (i) extracts the (initially unknown) desired trajectory from the sub-optimal expert’s demonstrations and (ii) learns a local model suitable for control along the learned trajectory.
We apply our algorithm to the problem of autonomous helicopter flight. In all cases, the autonomous helicopter’s performance exceeds that of our expert helicopter pilot’s demonstrations. Even stronger, our results significantly extend the state-of-the-art in autonomous helicopter aerobatics. In particular, our results include the first autonomous tic-tocs, loops and hurricane, vastly superior performance on previously performed aerobatic maneuvers (such as in-place flips and rolls), and a complete airshow, which requires autonomous transitions between these and various other maneuvers.
Speaker: Jonathan Chang (CS@Princeton)
Date: Monday, March 30, 2009
Uncovering, understanding, and predicting links
Linked data are ubiquitous. Friends are connected to friends in social networks, genes interact with genes in biological networks, and papers cite other papers in citation networks. Uncovering, understanding, and making predictions using such links is of critical importance in unlocking and applying this data.
In this talk I will present probabilistic models for these tasks. Our approaches leverage the machinery of topic models, such as latent Dirichlet allocation (LDA), which have been successfully employed for a variety of applications. However, topic models typically make naive assumptions about the independence of documents. By breaking these naive assumptions, we can develop novel models that give insights into the latent structure of networks of documents. By using efficient variational inference, I will show that these models can make accurate predictions and extract new link information from free text.
Saturday, March 28, 2009
Monday, March 30, 2009
Observing and Interpreting Everyday Human Activities
Jan Bandouch, Ph.D. Student
Moritz Tenorth, Ph.D. Student
TU Munich, Germany
We will present and discuss an integrated approach for the automated perception, interpretation and analysis of human activities of daily life, with an emphasis on everyday manipulation tasks. During the first half of this talk, we will describe a markerless motion capture system that we use to acquire joint-angle representations of human full-body motions at high accuracy. The system is capable of tracking arbitrary, previously unobserved motions using a sophisticated hierarchical sampling strategy for recursive Bayesian estimation that combines partitioning with annealing strategies to enable efficient search in the presence of many local maxima. A simple yet effective appearance model is used to implicitly deal with occlusions and to reduce the influence of objects and dynamic parts of the environment. We will then show in the second half of our talk how to infer probabilistic, hybrid (continuous/discrete), low-dimensional, and hierarchical models of the observed activity. These models can be used to answer queries about the observed activity such as the following ones: Which activity was performed? Which hand trajectory did the human use for taking a plate out of the overhead cupboard? What was different compared to the normal execution of this activity? The integration of these activity models into a knowledge-based framework allows for the association of the observed activity with encyclopedic, commonsense, and naive physics knowledge and for querying the system in an abstract, symbolic way.
Jan Bandouch is a PhD student in the Intelligent Autonomous Systems Group of Prof. Michael Beetz at the TU Munich, Germany. His current research interest is in Computer Vision and Robotics, where he is working on techniques for markerless human motion capture in typical human living environments. The intended areas of application are activity and intention recognition in smart environments, motion analysis and ergonomic studies. He obtained his Diploma (equivalent to M.Sc.) in Computer Science from the TU Munich in 2005. Homepage: http://www9.cs.tum.edu/people/bandouch
Moritz Tenorth is a PhD student in the Intelligent Autonomous Systems Group of Prof. Michael Beetz at the TU Munich, Germany. His research interests include grounded knowledge representations which integrate information from web sources, observed sensor data and data mining techniques, and their applications to knowledge-based action interpretation and robot control. He studied Electrical Engineering in Aachen and Paris and obtained his Diploma degree (equivalent to M.Eng.) in 2007 from the RWTH Aachen. Homepage: http://www9.cs.tum.edu/people/tenorth
Title: Human Pose and Motion, Challenges and Physics-based Models
Leonid Sigal, University of Toronto
Date: Tuesday 3/31
Recovery and analysis of human pose and motion from video is the key enabling technology for a broad spectrum of applications, in and outside of computer science, including applications in HCI, biometrics, biomechanics and computer graphics. Despite years of research, the general problem of tracking a person in an unconstrained environment, particularly from monocular observations, remains challenging. In this talk I will describe the basic building blocks and challenges of articulated human pose estimation and tracking, as well as my contributions to the various aspects of this problem and the field in recent years. I will particularly focus on the new and unique class of models that incorporate physics-based predictions and simulation into the inference process. Physics plays an important and intricate role in characterizing, describing and predicting human motion. The key benefit of using physics-based models or priors for tracking is the improved realism in the recovered motions, as well as enhanced ability to deal with weak image observations and diverse environmental interactions. Newtonian physics, in these models, approximates the rigid-body dynamics of the body in the environment through the application and integration of forces. Since the motion of the body is intimately tied with the environment, the use of such models also allows one to start reasoning about the geometry and physical properties of the environment as a whole (e.g. orientation and compliance of ground). This work is part of joint projects with colleagues at Brown University and University of Toronto.
Leonid Sigal is a postdoctoral fellow in the Department of Computer Science at University of Toronto. He received his Ph.D. in computer science from Brown University (2007); his M.S. from Brown University (2003); his M.A. from Boston University (1999); and his B.Sc. degrees in Computer Science and Mathematics from Boston University (1999). From 1999 to 2001, he worked as a senior vision engineer at Cognex Corporation, where he developed industrial vision applications for pattern analysis and verification. In 2002, he spent a semester as a research intern at Siemens Corporate Research (SCR) working on autonomous obstacle detection and avoidance for vehicle navigation. During the summers of 2005 and 2006, he worked as a research intern at Intel Applications Research Lab (ARL) on human pose estimation and tracking. His work received the Best Paper Award at the Articulated Motion and Deformable Objects Conference in 2006 (with Prof. Michael J. Black). Dr. Sigal's research interests lie mainly in the areas of computer vision and machine learning, but also in the bordering fields of computer graphics, psychology and humanoid robotics. He is particularly interested in statistical models for problems of visual inference, including human motion recovery and analysis, graphical models, probabilistic and hierarchical inference.
Title: Empowering switching linear dynamic systems with higher-order temporal structure
Sangmin Oh, Georgia Institute of Technology
Date: Friday 3/27
Automated analysis of temporal data is a task of utmost importance for intelligent machines. For example, ubiquitous computing systems need to understand the intention of humans from the stream of sensory information, and health-care monitoring systems can assist patients and doctors by providing automatically annotated daily health reports. Moreover, a huge amount of multimedia data such as videos awaits analysis and indexing for search purposes, while scientific data such as recordings of animal behavior and evolving brain signals are being collected in the hope of delivering new scientific discoveries about life.
In this talk, we will describe a class of newly developed time-series models in Bayesian network formulation. In particular, we will focus on the extensions of switching linear dynamic systems (SLDSs) with higher-order temporal structure and inference methods thereof. SLDSs have been used to model continuous multivariate temporal data under the assumption that the characteristics of complex temporal sequences can be captured by Markov switching between a set of simpler primitives which are linear dynamic systems (LDSs). In particular, we will focus on the extensions of SLDSs which are developed to address problems such as continuous labeling, robust labeling for data with systematic global variations, and hierarchical labeling.
First, we will present a data-driven MCMC inference method for the SLDS model. The distinctive characteristic of this approach is that it turns heuristic labeling methods into data-driven proposal distributions for MCMC, which results in a principled approximate inference method. In other words, it is a methodology to turn a novice into an expert. We show that the resulting MCMC method for SLDSs solves an inference problem that previously could not be addressed efficiently by Gibbs sampling.
Second, parametric SLDSs (P-SLDSs) explicitly model the global parameters which induce systematic temporal and spatial variations of data. The additional structure of P-SLDSs allows us to perform global parameter quantification, a task that standard SLDSs could not address, in addition to providing more accurate labeling.
Third, segmental SLDSs (S-SLDSs) provide the ability to capture descriptive duration models within LDS regimes. The encoded duration models are more descriptive than the exponential duration models induced within the standard SLDSs and allow us to avoid the severe problem of over-segmentations and demonstrate superior labeling accuracy.
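The "exponential duration models" mentioned above follow from first-order Markov switching: if a regime has self-transition probability a, the chance of staying exactly d steps is a^(d-1)(1-a), a geometric (discrete exponential) law whose most likely duration is always a single step. The short sketch below only illustrates that standard fact; the function names and the value 0.9 are illustrative assumptions, not taken from the talk.

```python
from typing import List


def geometric_duration_pmf(self_transition: float, max_duration: int) -> List[float]:
    """P(duration = d) for d = 1..max_duration under first-order Markov switching."""
    a = self_transition
    return [(a ** (d - 1)) * (1.0 - a) for d in range(1, max_duration + 1)]


def expected_duration(self_transition: float) -> float:
    """Mean regime length implied by the self-transition probability."""
    return 1.0 / (1.0 - self_transition)


# Hypothetical regime with a self-transition probability of 0.9.
pmf = geometric_duration_pmf(0.9, 50)
mean_length = expected_duration(0.9)
```

Because the probability mass decreases monotonically with duration, very short segments are always the most probable, which is one way to see why explicit duration models can help against over-segmentation.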
Finally, we introduce hierarchical SLDSs (H-SLDSs), a generalization of standard SLDSs with hierarchic Markov chains. H-SLDSs are able to encode temporal data which exhibits hierarchic structure where the underlying low-level temporal patterns repeatedly appear in different higher level contexts. Accordingly, H-SLDSs can be used to analyze temporal data at multiple temporal granularities, and provide the additional ability to learn a more complex H-SLDS model easily by combining underlying H-SLDSs.
The developed SLDS extensions have been applied to two real-world problems. The first problem is to automatically analyze the honey bee dance dataset where the goal is to correctly segment the dance sequences into different regimes and parse the messages about the location of food sources embedded in the data. We show that a combination of the P-SLDS and S-SLDS models has demonstrated improved labeling accuracy and message parsing results. The second problem is to analyze the wearable exercise data where we aim to provide an automatically generated exercise record at multiple temporal resolutions.
Sangmin is a PhD candidate in the College of Computing at Georgia Institute of Technology. He received his BS in computer science, cum laude, from Seoul National University in 2003. During his PhD thesis work, Sangmin has focused on developing time-series models to address problems such as continuous labeling, robust labeling for data with systematic global variations, and hierarchical labeling, and has published this work at major conferences and journals in computer vision and AI. Additionally, he has worked on problems in robotics, signal processing, and graphics, co-authoring several academic publications. He was a recipient of the Samsung Lee Kun Hee fellowship from '03 to '07. His research interests include computer vision, machine learning, robotics, computer graphics, data mining, time-series modeling and computational linguistics.
Friday, March 20, 2009
Thursday, March 19, 2009
Material Recognition By Humans and Machines
2:00pm, Friday, March 20th
We can easily tell if a spoon is made of stainless steel or plastic, if a shirt is clean or if food is fresh. These judgments of material appearance are ubiquitous. We use our material perception abilities to decide where to step on an icy sidewalk, which items to pick in the fresh produce aisle, and if a rash requires a trip to the doctor. In spite of the importance of these judgments, little is known about material recognition in the fields of human vision or computer vision.
We have studied human material judgments on real world photographs by asking observers questions like "Is that object made of paper or plastic?" or "Are those flowers fake or real?". We find that observers can recognize materials very well, even when images are presented very fast (40 milliseconds per image). This performance was robust to low-level image degradations like removal of color, blurring and inversion of contrast polarity, suggesting that low-level information is not crucial for observers.
What do these results imply for machine vision systems? We evaluated the performance of simple classifiers based on low-level image features (e.g. jet-like features, SIFT) at the same material categorization task that humans did. We find that low-level features are not sufficient for categorizing materials on our data set, suggesting a parallel with the results from human experiments. We conclude that there is rich territory to be explored both by computer vision and human vision researchers for this problem.
Lavanya Sharan is a final year graduate student working with Ted Adelson at MIT. Her research interests lie at the intersection of human vision and computer vision, especially in the domain of material recognition. She is interested in understanding how humans can recognize the materials that objects are made of and how to make computer vision systems that can do the same. Lavanya received her M.S. degree in Computer Science from MIT in 2005 and her undergraduate training in Electrical Engineering from IIT Delhi in 2003.
A Brown University team has developed a robot that can recognize human gestures in multiple environments, despite changes in lighting and depth. The team recorded a video as a proof of concept, demonstrating a robot equipped with continuous tracks, sensors, and a CSEM SwissRanger imaging camera. The robot recognizes its human controller by creating a silhouette that it focuses on as a human cutout, ignoring other environmental input. It follows humans about three feet behind, and moves in reverse when its controller walks towards it.
Wag The Robot? Robot Responds To Human Gestures
ScienceDaily (2009-03-12) -- Researchers have demonstrated how a robot can follow human gestures in a variety of environments -- indoors and outside -- without adjusting for lighting. The achievement is an important step forward in the quest to build fully autonomous robots as partners for human endeavors.
Wednesday, March 18, 2009
Thursday, March 19, 2009
Computational Study Of Nonverbal Social Communication
USC Institute for Creative Technologies
The goal of this emerging research field is to recognize, model and predict human nonverbal behavior in the context of interaction with virtual humans, robots and other human participants. At the core of this research field is the need for new computational models of human interaction emphasizing the multi-modal, multi-participant and multi-behavior aspects of human behavior. This multi-disciplinary research topic overlaps the fields of multi-modal interaction, social psychology, computer vision, machine learning and artificial intelligence, and has many applications in areas as diverse as medicine, robotics and education.
During my talk, I will focus on three novel approaches to achieve efficient and robust nonverbal behavior modeling and recognition: (1) a new visual tracking framework (GAVAM) with automatic initialization and bounded drift which acquires online the view-based appearance of the object, (2) the use of latent-state models in discriminative sequence classification (Latent-Dynamic CRF) to capture the influence of unobservable factors on nonverbal behavior and (3) the integration of contextual information (specifically dialogue context) to improve nonverbal prediction and recognition.
Dr. Louis-Philippe Morency is currently a research professor at the USC Institute for Creative Technologies, where he leads the Nonverbal Behaviors Understanding project (ICT-NVREC). He received his Ph.D. from MIT Computer Science and Artificial Intelligence Laboratory in 2006. His main research interest is the computational study of nonverbal social communication, a multi-disciplinary research topic that overlaps the fields of multi-modal interaction, computer vision, machine learning, social psychology and artificial intelligence. He developed "Watson", a real-time library for nonverbal behavior recognition which became the de facto standard for adding perception to embodied agent interfaces. He has received many awards for his work on nonverbal behavior computation, including three best-paper awards in 2008 (at various IEEE and ACM conferences). He was recently selected by IEEE Intelligent Systems as one of the "Ten to Watch" for the future of AI research.
Tuesday, March 17, 2009
Authors: Peter Biber, Tom Duckett
The International Journal of Robotics Research, Vol. 28, No. 1, 20-33 (2009)
This paper presents a system for long-term SLAM (simultaneous localization and mapping) by mobile service robots and its experimental evaluation in a real dynamic environment. To deal with the stability-plasticity dilemma (the trade-off between adaptation to new patterns and preservation of old patterns), the environment is represented by multiple timescales simultaneously (five in our experiments). A sample-based representation is proposed, where older memories fade at different rates depending on the timescale and robust statistics are used to interpret the samples. The dynamics of this representation are analyzed in a five-week experiment, measuring the relative influence of short- and long-term memories over time and further demonstrating the robustness of the approach.
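The idea of memories fading at different rates can be sketched with a toy example. The code below is a simplified illustration, not the paper's implementation: each timescale keeps a fixed-size set of range samples for a single map cell, replaces a fraction of them (a hypothetical `update_rate`) whenever a new measurement arrives, and reads the cell out with the median as a simple robust statistic. The class name, the rates, and the range values are all assumptions made for illustration.

```python
import random
import statistics
from typing import List


class TimescaleMap:
    """Sample set for one map cell at one timescale (illustrative only)."""

    def __init__(self, samples: List[float], update_rate: float) -> None:
        self.samples = list(samples)
        self.update_rate = update_rate  # fraction of samples replaced per update

    def update(self, measurement: float, rng: random.Random) -> None:
        # Overwrite a random subset of old samples with the new measurement.
        n_replace = round(self.update_rate * len(self.samples))
        for index in rng.sample(range(len(self.samples)), n_replace):
            self.samples[index] = measurement

    def estimate(self) -> float:
        # The median ignores a minority of outdated or outlying samples.
        return statistics.median(self.samples)


rng = random.Random(0)
# A cell that used to read 1.0 m now reads 5.0 m (for example, a door opened).
short_term = TimescaleMap([1.0] * 10, update_rate=0.8)
long_term = TimescaleMap([1.0] * 10, update_rate=0.2)
for timescale in (short_term, long_term):
    timescale.update(5.0, rng)
```

After one update the short-term map already reports the new reading while the long-term map still reports the old one, which is the stability-plasticity trade-off in miniature.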
David Silver, PhD Candidate
Carnegie Mellon University
Thursday, March 19th, 2009
Rough terrain autonomous navigation continues to pose a challenge to the robotics community. Robust navigation by a mobile robot depends not only on the individual performance of perception and planning systems, but on how well these systems are coupled. When traversing rough terrain, this coupling (in the form of a cost function) has a large impact on robot performance, necessitating a robust design. This talk presents the application of imitation learning to this task. Using expert examples of proper navigation behavior, mappings from both online and offline perceptual data to planning costs are learned. Experimental results are presented from the Crusher autonomous navigation platform, demonstrating a benefit to autonomous performance as well as a decrease in programmer interaction.
Friday, March 13, 2009
Robotics and Automation, 2007 IEEE International Conference on
10-14 April 2007
Recently it has been shown that an inverse depth parametrization can improve the performance of real-time monocular EKF SLAM, permitting undelayed initialization of features at all depths. However, the inverse depth parametrization requires the storage of 6 parameters in the state vector for each map point. This implies a noticeable computing overhead when compared with the standard 3 parameter XYZ Euclidean encoding of a 3D point, since the computational complexity of the EKF scales poorly with state vector size.
In this work we propose to restrict the inverse depth parametrization only to cases where the standard Euclidean encoding implies a departure from linearity in the measurement equations. Every new map feature is still initialized using the 6 parameter inverse depth method. However, as the estimation evolves, if according to a linearity index the alternative XYZ coding can be considered linear, we show that feature parametrization can be transformed from inverse depth to XYZ for increased computational efficiency with little reduction in accuracy.
We present a theoretical development of the necessary linearity indices, along with simulations to analyze the influence of the conversion threshold. Experiments performed with a 30 frames per second real-time system are reported. An analysis of the increase in the map size that can be successfully managed
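To make the 6-parameter versus 3-parameter encodings concrete, here is a minimal sketch of the conversion from an inverse depth feature to an XYZ point. It assumes the commonly used form of the parametrization: the camera position (x, y, z) at first observation, an azimuth theta and elevation phi defining the viewing ray, and the inverse depth rho along that ray. The axis convention of the ray direction and the function name are assumptions, and the linearity index that decides when to convert is deliberately left out.

```python
import math
from typing import Tuple


def inverse_depth_to_xyz(x: float, y: float, z: float,
                         theta: float, phi: float, rho: float) -> Tuple[float, float, float]:
    """Convert a 6-parameter inverse depth feature to a 3-parameter XYZ point."""
    # Unit ray direction from azimuth (theta) and elevation (phi); assumed convention.
    ray_x = math.cos(phi) * math.sin(theta)
    ray_y = -math.sin(phi)
    ray_z = math.cos(phi) * math.cos(theta)
    depth = 1.0 / rho  # rho is the inverse of the distance along the ray
    return (x + depth * ray_x, y + depth * ray_y, z + depth * ray_z)


# A feature seen straight ahead of a camera at the origin, 2 units away.
point = inverse_depth_to_xyz(0.0, 0.0, 0.0, 0.0, 0.0, 0.5)
```

After conversion, the feature occupies three state-vector entries instead of six, which is where the computational saving described in the abstract comes from.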
CMU talk: Beyond Nouns and Verbs: Learning Visually Grounded Stories of Images and Videos using Language and Vision Abhinav Gupta
Monday, March 16, 2009
Beyond Nouns and Verbs: Learning Visually Grounded Stories of Images and Videos using Language and Vision
University of Maryland, College Park
In this talk, I will present our recent work on exploring synergy between language and vision for learning visually grounded contextual structures. Our work departs from the traditional view to visual and contextual learning where individual detectors and relationships are learned separately. Our work focuses on simultaneous learning of visual appearance and contextual models from richly annotated, weakly labeled datasets. In the first part of the talk, I will show how rich annotations can be utilized to constrain the learning of visually grounded models of nouns, prepositions and comparative adjectives from weakly labeled data. I will also show how visually grounded models of prepositions and comparative adjectives can be utilized as contextual models for scene analysis.
In the second part, I will present storyline models for interpretation of videos. Storyline models go beyond pair-wise contextual models and represent higher order constraints by allowing only a small, finite number of possible action sequences (stories). Visual inference using storyline models involves inferring the "plot" of the video (sequence of actions) and recognizing individual activities in the plot. I will also present an iterative approach to learn visually grounded storyline models from video and linguistic information provided in captions.
Abhinav Gupta is a doctoral candidate in the Department of Computer Science at University of Maryland, College Park. He received his MS in Computer Science from University of Maryland in 2007 and his BTech in Computer Science and Engineering from Indian Institute of Technology, Kanpur, in 2004. His research focuses on visually grounded semantic models, and how language and vision can be exploited to learn such models. His other research interests include combining multiple cues, probabilistic graphical models, human body tracking and camera networks. He is also a recipient of the University of Maryland Dean's Fellowship for excellence in research.
A surge of recent research in machine learning and statistics has developed new techniques for finding patterns of words in document collections using hierarchical probabilistic models. These models are called "topic models" because the discovered word patterns often reflect the underlying topics that permeate the documents. Topic models also naturally apply to data such as images and biological sequences.
In this talk I will review the basics of topic modeling, and discuss some recent extensions: supervised topic modeling and relational topic modeling. Supervised topic models allow us to use topics in a setting where we seek both exploratory and predictive power. Relational topic models---which are built on supervised topic models---consider documents interconnected in a graph. These models can be used to summarize a network of documents, predict links between them, and predict words within them.
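As a reminder of what the basic topic model assumes, the sketch below samples one document from the standard latent Dirichlet allocation generative process: draw per-document topic proportions from a Dirichlet, then for each word draw a topic and then a word from that topic. It does not cover the supervised or relational extensions discussed in the talk, and the vocabulary, topic distributions, and hyperparameters are made-up illustrative values.

```python
import random
from typing import List, Sequence


def sample_dirichlet(alpha: Sequence[float], rng: random.Random) -> List[float]:
    """Draw from a Dirichlet by normalizing independent Gamma draws."""
    draws = [rng.gammavariate(a, 1.0) for a in alpha]
    total = sum(draws)
    return [d / total for d in draws]


def generate_document(topics: Sequence[Sequence[float]], vocabulary: Sequence[str],
                      alpha: Sequence[float], length: int,
                      rng: random.Random) -> List[str]:
    """Sample one document from the basic LDA generative process."""
    theta = sample_dirichlet(alpha, rng)  # per-document topic proportions
    words = []
    for _ in range(length):
        topic = rng.choices(range(len(topics)), weights=theta, k=1)[0]
        words.append(rng.choices(vocabulary, weights=topics[topic], k=1)[0])
    return words


# Hypothetical two-topic model over a four-word vocabulary.
vocabulary = ["gene", "protein", "robot", "sensor"]
topics = [[0.5, 0.5, 0.0, 0.0],   # a "biology" topic
          [0.0, 0.0, 0.5, 0.5]]   # a "robotics" topic
rng = random.Random(0)
document = generate_document(topics, vocabulary, [0.5, 0.5], 8, rng)
proportions = sample_dirichlet([0.5, 0.5], rng)
```

Inference runs this process in reverse, recovering the topics and proportions from observed words.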
Joint work with Jonathan Chang and Jon McAuliffe.
David Blei is an assistant professor in the Computer Science department at Princeton University. He received his Ph.D. in 2004 from U.C. Berkeley and was a postdoctoral researcher in the Department of Machine Learning at Carnegie Mellon University. His research interests include graphical models, approximate posterior inference, and nonparametric Bayesian statistics. He focuses on applications to information retrieval and natural language processing.
Thursday, March 12, 2009
Lab Meeting March 16th (fish60): Path planning in image space for autonomous robot navigation in unstructured environments
Volume 26, Issue 2 (February 2009)
Special Issue on LAGR Program, Part II
In this paper we present an image space technique for path planning in unknown unstructured outdoor environments. Our method differs from previous techniques in that we perform path search directly in image space - the native sensor space of the imaging sensor. Our image space planning techniques can potentially be used with many different kinds of sensor data, and we experimentally evaluate the use of stereo disparity and color information. We present an extension to the basic image space planning system called the cylindrical planner that simulates a 2π field of view with a cylindrically shaped occupancy grid. We believe that image space planning is well suited for use in the local subsystem of a hierarchical planner and implement a hybrid hierarchical planner that utilizes the cylindrical planner as a local planning subsystem and a two-dimensional Cartesian planner as the global planning subsystem.
Sunday, March 08, 2009
Authors: Simon Baker, Daniel Scharstein, JP Lewis, Stefan Roth, Michael Black, Richard Szeliski
Published in: ICCV 2007
The quantitative evaluation of optical flow algorithms by Barron et al. led to significant advances in the performance of optical flow methods. The challenges for optical flow today go beyond the datasets and evaluation methods proposed in that paper and center on problems associated with nonrigid motion, real sensor noise, complex natural scenes, and motion discontinuities. Our goal is to establish a new set of benchmarks and evaluation methods for the next generation of optical flow algorithms. To that end, we contribute four types of data to test different aspects of optical flow algorithms: sequences with nonrigid motion where the ground-truth flow is determined by tracking hidden fluorescent texture; realistic synthetic sequences; high frame-rate video used to study interpolation error; and modified stereo sequences of static scenes. In addition to the average angular error used in Barron et al., we compute the absolute flow endpoint error, measures for frame interpolation error, improved statistics, and flow accuracy at motion boundaries and in textureless regions. We evaluate the performance of several well-known methods on this data to establish the current state of the art. Our database is freely available on the web together with scripts for scoring and publication of the results at http://vision.middlebury.edu/flow/
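The two per-pixel error measures named in the abstract can be written down compactly. The sketch below uses their usual definitions: the angular error is the angle between the space-time vectors (u, v, 1) of the estimated and ground-truth flow, and the endpoint error is the Euclidean distance between the two flow vectors. The function names are illustrative and are not taken from the benchmark's scoring scripts.

```python
import math
from typing import Callable, Sequence, Tuple

Flow = Tuple[float, float]  # (u, v) displacement at one pixel


def angular_error(estimate: Flow, truth: Flow) -> float:
    """Angle in radians between (u, v, 1) vectors of estimate and ground truth."""
    u, v = estimate
    u_gt, v_gt = truth
    numerator = u * u_gt + v * v_gt + 1.0
    denominator = math.sqrt(u * u + v * v + 1.0) * math.sqrt(u_gt * u_gt + v_gt * v_gt + 1.0)
    # Clamp to guard against rounding slightly outside acos's domain.
    return math.acos(max(-1.0, min(1.0, numerator / denominator)))


def endpoint_error(estimate: Flow, truth: Flow) -> float:
    """Euclidean distance between estimated and ground-truth flow vectors."""
    return math.hypot(estimate[0] - truth[0], estimate[1] - truth[1])


def average_error(metric: Callable[[Flow, Flow], float],
                  estimates: Sequence[Flow], truths: Sequence[Flow]) -> float:
    """Mean of a per-pixel metric over a set of pixels."""
    return sum(metric(e, t) for e, t in zip(estimates, truths)) / len(estimates)


mean_endpoint = average_error(endpoint_error, [(3.0, 4.0), (0.0, 0.0)], [(0.0, 0.0), (0.0, 0.0)])
```

The added 1 in the angular error keeps the measure defined for zero flow, but it also makes errors in large flows count for less than the same angular deviation in small flows, which is one reason to report endpoint error alongside it.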
Saturday, March 07, 2009
Authors: Patrick Pfaff, Rudolph Triebel and Wolfram Burgard
Monte Carlo Localization in Outdoor Terrains Using Multilevel Surface Maps
Authors: Rainer Kuemmerle, Patrick Pfaff, Rudolph Triebel and Wolfram Burgard
Abstract: We propose a novel combination of techniques for robustly estimating the position of a mobile robot in outdoor environments using range data. Our approach applies a particle filter to estimate the full six-dimensional state of the robot and utilizes multilevel surface maps, which, in contrast to standard elevation maps, allow the robot to represent vertical structures and multiple levels in the environment. We describe probabilistic motion and sensor models to calculate the proposal distribution and to evaluate the likelihood of observations. We furthermore describe an active localization approach that actively selects the sensor orientation of the two-dimensional laser range scanner to improve the localization results. To efficiently calculate the appropriate orientation, we apply a clustering operation on the particles and evaluate potential orientations on the basis of these clusters.
Experimental results obtained with a mobile robot in large-scale outdoor environments indicate that our approach yields robust and accurate position estimates. The experiments also demonstrate that multilevel surface maps lead to a significantly better localization performance than standard elevation maps. They additionally show that further accuracy is obtained from the active sensing approach.
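The particle filter at the heart of Monte Carlo localization follows a predict, weight, resample cycle. The sketch below is a deliberately reduced one-dimensional version, not the six-dimensional, multilevel-surface-map system of the paper: a robot on a line measures its range to a single landmark. The landmark position, noise levels, particle count, and function name are all illustrative assumptions.

```python
import math
import random
from typing import List


def mcl_step(particles: List[float], control: float, measured_range: float,
             landmark: float, rng: random.Random,
             motion_noise: float = 0.1, sensor_noise: float = 0.5) -> List[float]:
    """One predict / weight / resample cycle of a 1-D particle filter."""
    # Predict: apply the commanded motion with Gaussian noise (the proposal).
    moved = [p + control + rng.gauss(0.0, motion_noise) for p in particles]
    # Weight: Gaussian likelihood of the measured range given each particle.
    weights = [math.exp(-0.5 * ((measured_range - abs(landmark - p)) / sensor_noise) ** 2)
               for p in moved]
    if sum(weights) == 0.0:
        weights = [1.0] * len(moved)  # fall back to uniform if all weights underflow
    # Resample: draw a new particle set in proportion to the weights.
    return rng.choices(moved, weights=weights, k=len(moved))


rng = random.Random(0)
landmark = 10.0
true_position = 2.0
particles = [rng.uniform(0.0, 10.0) for _ in range(500)]
for _ in range(5):
    true_position += 1.0  # the robot moves one unit per step
    particles = mcl_step(particles, 1.0, abs(landmark - true_position), landmark, rng)
estimate = sum(particles) / len(particles)
```

Starting from a uniform prior over the corridor, the particle cloud should concentrate around the true position after a few range measurements; the active sensing idea in the abstract would additionally choose where to point the sensor to make that concentration faster.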