Wednesday, November 25, 2009

CMU talk: Unsupervised Detection of Regions of Interest Using Iterative Link Analysis

CMU VASC Seminar
Monday, November 30, 2009

Unsupervised Detection of Regions of Interest Using Iterative Link Analysis
Gunhee Kim
Ph.D. Student, Computer Science Department

Abstract:

This work is a joint project with Antonio Torralba during my visit to MIT and will be presented as a poster at the upcoming NIPS 2009 Conference.

This talk will discuss a fast and scalable alternating optimization technique to detect regions of interest (ROIs) in cluttered Web images without labels. The proposed approach discovers highly probable regions of object instances by iteratively repeating two steps: (1) choose the exemplar set (i.e., a small number of highly ranked reference ROIs) across the dataset, and (2) refine the ROIs of each image with respect to the exemplar set. These two subproblems are formulated as ranking in two different similarity networks of ROI hypotheses by link analysis. Experiments with the PASCAL 06 dataset show that our unsupervised localization performance beats one state-of-the-art technique and is comparable to supervised methods. We also test the scalability of our approach with five objects in a Flickr dataset of more than 200K images.
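To make the alternation concrete, here is a minimal sketch of its structure, with a PageRank-style ranking standing in for the paper's link analysis; `propose` (ROI hypotheses per image) and `similarity` (ROI-to-ROI score) are hypothetical stand-ins for the paper's actual components, not its implementation:

```python
import numpy as np

def pagerank(W, d=0.85, iters=50):
    """Rank nodes of a similarity graph W by link analysis (PageRank-style)."""
    W = W / np.maximum(W.sum(axis=0, keepdims=True), 1e-12)  # column-stochastic
    r = np.full(W.shape[0], 1.0 / W.shape[0])
    for _ in range(iters):
        r = (1 - d) / W.shape[0] + d * (W @ r)
    return r

def detect_rois(images, propose, similarity, n_exemplars=20, n_rounds=5):
    """Alternate between exemplar selection and per-image ROI refinement."""
    rois = [propose(img)[0] for img in images]          # one initial ROI per image
    for _ in range(n_rounds):
        # (1) rank all current ROIs in a dataset-wide similarity network;
        #     the top-ranked ROIs become the exemplar set
        W = np.array([[similarity(a, b) for b in rois] for a in rois])
        exemplars = [rois[i] for i in np.argsort(-pagerank(W))[:n_exemplars]]
        # (2) per image, keep the hypothesis best supported by the exemplars
        for i, img in enumerate(images):
            hyps = propose(img)
            scores = [sum(similarity(h, e) for e in exemplars) for h in hyps]
            rois[i] = hyps[int(np.argmax(scores))]
    return rois
```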

Bio: Gunhee Kim is a Ph.D. student in CMU's Computer Science Department advised by Takeo Kanade. He received his master's degree under the supervision of Martial Hebert in 2008 from the Robotics Institute at CMU. His research interests are computer vision, machine learning, data mining, and biomedical imaging.

Monday, November 23, 2009

Lab Meeting November 25, 2009 (Alan): Navigating, Recognizing and Describing Urban Spaces With Vision and Lasers (IJRR 2009)

Title: Navigating, Recognizing and Describing Urban Spaces With Vision and Lasers (IJRR 2009)

Authors: Paul Newman, Gabe Sibley, Mike Smith, Mark Cummins, Alastair Harrison, Chris Mei, Ingmar Posner, Robbie Shade, Derik Schroeter, Liz Murphy, Winston Churchill, Dave Cole, Ian Reid

Abstract:
In this paper we describe a body of work aimed at extending the reach of mobile navigation and mapping. We describe how running topological and metric mapping and pose estimation processes concurrently, using vision and laser ranging, has produced a full six degree-of-freedom outdoor navigation system. It is capable of producing intricate three-dimensional maps over many kilometers and in real time. We consider issues concerning the intrinsic quality of the built maps and describe our progress towards adding semantic labels to maps via scene deconstruction and labeling. We show how our choices of representation, inference methods and use of both topological and metric techniques naturally allow us to fuse maps built from multiple sessions with no need for manual frame alignment or data association.


Sunday, November 22, 2009

CMU PhD Thesis Proposal: Learning Methods for Thought Recognition

CMU RI PhD Thesis Proposal
Mark Palatucci
Learning Methods for Thought Recognition
November 18, 2009, 3:00 p.m., NSH 3305

Abstract
This thesis proposal considers the problem of training machine learning classifiers in domains where data are very high dimensional and training examples are extremely limited or impossible to collect for all classes of interest. As a case study, we focus on the application of thought recognition, where the objective is to classify a person’s cognitive state from a recorded image of that person’s neural activity. Machine learning and pattern recognition methods have already made a large impact on this field, but most prior work has focused on classification studies with small numbers of classes and moderate amounts of training data. In this thesis, we focus on thought recognition in a limited data setting, where there are few, if any, training examples for the classes we wish to discriminate, and the number of possible classes can be in the thousands.

Despite these constraints, this thesis seeks to demonstrate that it is possible to classify noisy, high dimensional data with extremely few training examples by using spatial and temporal domain knowledge, intelligent feature selection, semantic side information, and large quantities of unlabeled data from related tasks.

In our preliminary work, we showed that it is possible to build a binary classifier that can accurately discriminate between cognitive states with more than 80,000 features and only two training examples per class. We also showed how classification can be improved using principled feature selection, and derived a significance test using order statistics that is appropriate for very high-dimensional problems with small numbers of training examples.

We have also explored the most extreme case of limited data, the zero-shot learning setting, where we do not have any training examples for the classes we wish to discriminate. We showed that by using a knowledge base of semantic side information to create intermediate features, we can build a classifier that identifies the word a person is thinking about, even without training data for that word, while being forced to choose among nearly 1,000 candidate words.
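As a rough illustration of this zero-shot setting, one can regress from neural features to the semantic code of the stimulus word and then pick the nearest candidate word in semantic space. This is a sketch under that assumption, with ridge regression as a stand-in decoder, not the thesis's exact model:

```python
import numpy as np
from sklearn.linear_model import Ridge

def zero_shot_decode(X_train, S_train, X_test, S_candidates, words):
    """Regress neural features onto semantic features, then pick the nearest
    candidate word in semantic space -- no training examples of that word needed."""
    decoder = Ridge(alpha=1.0).fit(X_train, S_train)   # neural -> semantic map
    S_hat = decoder.predict(X_test)                    # predicted semantic codes
    dists = np.linalg.norm(S_candidates[None, :, :] - S_hat[:, None, :], axis=2)
    return [words[i] for i in dists.argmin(axis=1)]    # closest of ~1,000 words
```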

Finally, we showed how multi-task learning can be used to learn useful semantic features directly from data. We formulated the semantic feature learning problem as a Multi-task Lasso and presented an extremely fast and highly scalable algorithm for solving the resulting optimization.
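A toy version of the joint feature-selection effect the Multi-task Lasso produces, using scikit-learn's off-the-shelf `MultiTaskLasso` in place of the proposal's custom fast solver:

```python
import numpy as np
from sklearn.linear_model import MultiTaskLasso

rng = np.random.default_rng(0)
X = rng.standard_normal((60, 500))                    # few examples, many features
W = np.zeros((500, 20))
W[:10] = rng.standard_normal((10, 20))                # 10 truly relevant features
Y = X @ W + 0.1 * rng.standard_normal((60, 20))       # 20 semantic-feature tasks

# The grouped L1/L2 penalty zeroes out the same features across all tasks
model = MultiTaskLasso(alpha=0.5).fit(X, Y)
kept = np.flatnonzero(np.abs(model.coef_).sum(axis=0))  # coef_ is (tasks, features)
print(f"{kept.size} features selected jointly across the 20 tasks")
```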

We propose work to extend our zero-shot learning setting by optimizing semantic feature sets and by using an active learning framework to choose the most informative training examples. We also propose to use latent feature models such as components analysis and sparse coding in a self-taught learning framework to improve decoding by leveraging data from additional neural imaging experiments.

[PDF]

Thesis Committee
Tom Mitchell, Chair
Dean Pomerleau
J. Andrew Bagnell
Andrew Ng, Stanford University

Saturday, November 21, 2009

CMU talk: Imitation Learning and Purposeful Prediction

Machine Learning Lunch (http://www.cs.cmu.edu/~learning/)
Speaker: Prof. Drew Bagnell
Date: Monday, November 23, 2009

Imitation Learning and Purposeful Prediction

Programming robots is hard. While demonstrating a desired behavior may be easy, designing a system that behaves this way is often difficult, time consuming, and ultimately expensive. Machine learning promises to enable "programming by demonstration" for developing high-performance robotic systems. Unfortunately, many approaches that utilize the classical tools of supervised learning fail to meet the needs of imitation learning. Perhaps foremost, classical statistics and supervised machine learning exist in a vacuum: predictions made by these algorithms are explicitly assumed to not affect the world in which they operate. I'll discuss the problems that result from ignoring the fact that actions influence the world, and I'll highlight simple "reduction-based" approaches that, both in theory and in practice, mitigate these problems.
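To make the "reduction" idea concrete, here is one generic interactive imitation loop in which the learner is repeatedly retrained on the states its own policy visits, so the training distribution tracks the distribution the policy induces at test time. It is a sketch of the general recipe, not necessarily the specific reductions discussed in the talk; `expert`, `fit`, and `rollout` are hypothetical callables:

```python
def interactive_imitation(expert, fit, rollout, env, rounds=10):
    """Retrain on states the learned policy itself visits, so its training
    distribution tracks the distribution it induces at test time."""
    data, policy = [], expert          # first rollout gathers the expert's states
    for _ in range(rounds):
        states = rollout(env, policy)              # states the current policy visits
        data += [(s, expert(s)) for s in states]   # expert labels those states
        policy = fit(data)                         # ordinary supervised learning
    return policy
```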

Additionally, robotic systems are often built atop sophisticated planning algorithms that efficiently reason far into the future; consequently, ignoring these planning algorithms in favor of a supervised learning approach often leads to poor and myopic performance. While planners have demonstrated dramatic success in applications ranging from legged locomotion to outdoor unstructured navigation, such algorithms rely on fully specified cost functions that map sensor readings and environment models to a scalar cost. Such cost functions are usually manually designed and programmed. Recently, our group has developed a set of techniques that learn these functions from human demonstration by applying an Inverse Optimal Control (IOC) approach to find a cost function for which planned behavior mimics an expert's demonstration. These approaches shed new light on the intimate connections between probabilistic inference and optimal control. I'll consider case studies in activity forecasting of drivers and pedestrians as well as the imitation learning of robotic locomotion and rough-terrain navigation. These case studies highlight key challenges in applying the algorithms in practical settings.
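The core computation in maximum-entropy-style IOC is a feature-matching gradient: raise the reward (negated cost) on features the expert's demonstrations exhibit more than the current soft-optimal policy predicts. A minimal sketch, assuming a caller-supplied `expected_feats(w)` that returns expected feature counts under the soft-optimal policy for weights `w`:

```python
import numpy as np

def maxent_ioc(expert_feats, expected_feats, n_feats, lr=0.1, iters=100):
    """Feature-matching gradient ascent on reward weights w (reward = w . f,
    i.e. a negated cost); expected_feats(w) must return the expected feature
    counts of the soft-optimal policy under w."""
    w = np.zeros(n_feats)
    for _ in range(iters):
        w += lr * (expert_feats - expected_feats(w))   # demonstrated - predicted
    return w
```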

Friday, November 20, 2009

Lab Meeting November 25, 2009 (KuoHuei): You’ll Never Walk Alone: Modeling Social Behavior for Multi-target Tracking (ICCV 2009)

Title: You’ll Never Walk Alone: Modeling Social Behavior for Multi-target Tracking
The Twelfth IEEE International Conference on Computer Vision (ICCV 2009)
Authors: S. Pellegrini, A. Ess, K. Schindler, and L. van Gool

Abstract:
Object tracking typically relies on a dynamic model to predict the object’s location from its past trajectory. In crowded scenarios a strong dynamic model is particularly important, because more accurate predictions allow for smaller search regions, which greatly simplifies data association.
Traditional dynamic models predict the location for each target solely based on its own history, without taking into account the remaining scene objects. Collisions are resolved only when they happen. Such an approach ignores important aspects of human behavior: people are driven by their future destination, take into account their environment, anticipate collisions, and adjust their trajectories at an early stage in order to avoid them. In this work, we introduce a model of dynamic social behavior, inspired by models developed for crowd simulation. The model is trained with videos recorded from a bird’s-eye view at busy locations, and applied as a motion model for multi-person tracking from a vehicle-mounted camera. Experiments on real sequences show that accounting for social interactions and scene knowledge improves tracking performance, especially during occlusions.
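A minimal social-force-style motion model of the kind the abstract alludes to: each pedestrian relaxes toward a preferred velocity aimed at a goal while being repelled by nearby people. The constants and functional form here are illustrative, not the paper's fitted model:

```python
import numpy as np

def social_force_step(pos, vel, goals, dt=0.4, tau=0.5, A=2.0, B=1.0):
    """One prediction step: relax toward a unit-speed velocity aimed at the
    goal, plus exponential repulsion from every other pedestrian.
    pos, vel, goals: (N, 2) arrays for N pedestrians."""
    to_goal = goals - pos
    to_goal /= np.maximum(np.linalg.norm(to_goal, axis=1, keepdims=True), 1e-9)
    force = (to_goal - vel) / tau                    # goal attraction
    diff = pos[:, None, :] - pos[None, :, :]         # pairwise offsets (N, N, 2)
    dist = np.linalg.norm(diff, axis=2)
    np.fill_diagonal(dist, np.inf)                   # no self-repulsion
    rep = (A * np.exp(-dist / B) / dist)[:, :, None] * diff
    force += rep.sum(axis=1)                         # push away from neighbors
    vel = vel + dt * force
    return pos + dt * vel, vel                       # predicted positions
```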



Wednesday, November 11, 2009

Lab Meeting November 11, 2009 (swem): Avoiding Negative Depth in Inverse Depth Bearing-Only SLAM (IROS 2008)

Title: Avoiding Negative Depth in Inverse Depth Bearing-Only SLAM
(2008 IEEE/RSJ International Conference on Intelligent Robots and Systems)
Authors: Martin P. Parsley and Simon J. Julier

Abstract:
In this paper we consider ways to alleviate negative estimated depth in the inverse depth parameterisation of bearing-only SLAM. This problem, which can arise even if the beacons are far from the platform, can cause catastrophic failure of the filter. We consider three strategies to overcome this difficulty: applying inequality constraints, the use of truncated second order filters, and a reparameterisation using the negative logarithm of depth. We show that both a simple inequality method and the use of truncated second order filters are successful. However, the most robust performance is achieved using the negative log parameterisation.
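The appeal of the negative-log parameterisation is easy to see in isolation: an additive filter correction can drive inverse depth rho = 1/d below zero, whereas depth recovered from ell = -log d is positive for any real ell. A small numerical illustration (not the paper's filter):

```python
import numpy as np

# Same additive Kalman-style corrections applied to both parameterisations,
# starting from depth d = 1 (so rho = 1, ell = 0).
for delta in (0.5, -0.5, -2.0):
    rho = 1.0 + delta                 # inverse depth: rho = 1/d
    ell = 0.0 + delta                 # negative log depth: ell = -log d
    d_inv = 1.0 / rho if rho > 0 else float("nan")   # invalid once rho <= 0
    d_log = np.exp(-ell)              # strictly positive for every real ell
    print(f"delta={delta:+.1f}  depth(inv)={d_inv:.3f}  depth(neg-log)={d_log:.3f}")
```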



Tuesday, November 10, 2009

ICCV'09 Oral Paper: You’ll Never Walk Alone: Modeling Social Behavior for Multi-target Tracking

You’ll Never Walk Alone: Modeling Social Behavior for Multi-target Tracking

S. Pellegrini, A. Ess, K. Schindler and L. van Gool
ICCV 2009 (oral)

Abstract:
Object tracking typically relies on a dynamic model to predict the object’s location from its past trajectory. In crowded scenarios a strong dynamic model is particularly important, because more accurate predictions allow for smaller search regions, which greatly simplifies data association. Traditional dynamic models predict the location for each target solely based on its own history, without taking into account the remaining scene objects. Collisions are resolved only when they happen. Such an approach ignores important aspects of human behavior: people are driven by their future destination, take into account their environment, anticipate collisions, and adjust their trajectories at an early stage in order to avoid them. In this work, we introduce a model of dynamic social behavior, inspired by models developed for crowd simulation. The model is trained with videos recorded from a bird’s-eye view at busy locations, and applied as a motion model for multi-person tracking from a vehicle-mounted camera. Experiments on real sequences show that accounting for social interactions and scene knowledge improves tracking performance, especially during occlusions. [PDF]

Saturday, November 07, 2009

Lab Meeting November 11, 2009 (Jim Yu): Planning-based Prediction for Pedestrians

Title: Planning-based Prediction for Pedestrians
Authors: B. D. Ziebart, N. Ratliff, G. Gallagher, C. Mertz, K. Peterson, J. A. Bagnell, M. Hebert, A. K. Dey, and S. Srinivasa
International Conference on Intelligent Robots and Systems (IROS 2009)


Abstract:
We present a novel approach for determining robot movements that efficiently accomplish the robot’s tasks while not hindering the movements of people within the environment. Our approach models the goal-directed trajectories of pedestrians using maximum entropy inverse optimal control. The advantage of this modeling approach is the generality of its learned cost function to changes in the environment and to entirely different environments. We employ the predictions of this model of pedestrian trajectories in a novel incremental planner and quantitatively show the improvement in hindrance-sensitive robot trajectory planning provided by our approach.
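The maximum-entropy model underlying this line of work yields a distribution over pedestrian trajectories rather than a single predicted path. Below is a grid-world sketch of the soft value iteration that produces such a distribution, assuming a state-based cost map; it is illustrative, not the paper's implementation:

```python
import numpy as np

def soft_value_iteration(cost, goal, iters=300):
    """Soft-minimum value iteration on a 4-connected grid.  The induced policy
    pi(s'|s) ~ exp(-(cost[s] + V[s'])) defines a *distribution* over
    goal-directed trajectories, not a single shortest path."""
    BIG = 50.0                                    # large finite stand-in for inf
    V = np.full(cost.shape, BIG)
    V[goal] = 0.0
    for _ in range(iters):
        pad = np.pad(V, 1, constant_values=BIG)
        nbrs = np.stack([pad[:-2, 1:-1], pad[2:, 1:-1],
                         pad[1:-1, :-2], pad[1:-1, 2:]])  # N, S, W, E neighbors
        V = cost - np.log(np.exp(-nbrs).sum(axis=0))      # soft-min over moves
        V[goal] = 0.0
    return V

# e.g. V = soft_value_iteration(np.ones((20, 20)), goal=(19, 19))
```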

Link

Friday, November 06, 2009

(PAMI 2009) Head Pose Estimation in Computer Vision: A Survey

Authors:
E. Murphy-Chutorian and M. M. Trivedi

Abstract:
The capacity to estimate the head pose of another person is a common human ability that presents a unique challenge for computer vision systems. Compared to face detection and recognition, which have been the primary foci of face-related vision research, identity-invariant head pose estimation has fewer rigorously evaluated systems or generic solutions. In this paper, we discuss the inherent difficulties in head pose estimation and present an organized survey describing the evolution of the field. Our discussion focuses on the advantages and disadvantages of each approach and spans 90 of the most innovative and characteristic papers that have been published on this topic. We compare these systems by focusing on their ability to estimate coarse and fine head pose, highlighting approaches that are well suited for unconstrained environments.

Wednesday, November 04, 2009

NTU talk: Video Analysis in Vision-Based Intelligent Systems

Title: Video Analysis in Vision-Based Intelligent Systems
Speaker: Prof. Hsu-Yung Cheng, National Central University
Time: 2:20pm, Nov 6 (Fri), 2009
Place: Room 103, CSIE building

Abstract: Computer vision and video analysis techniques play an important role in modern intelligent systems. Video-based systems can capture a large variety of desired information and are relatively inexpensive because cameras are easy to install, operate, and maintain. With the huge number of video cameras installed everywhere nowadays, there is an urgent need for automated video understanding techniques that can replace human operators in monitoring the areas under surveillance. In this talk I will briefly introduce several topics and related techniques in intelligent surveillance applications. More discussion will be devoted to the topic of video object tracking. I will introduce a work on video object tracking that combines the flexibility of particle sampling with the mathematical tractability of Kalman filters. Also, for objects that cannot be separated during the tracking process, possible solutions are discussed.
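As background for the tracking discussion, a constant-velocity Kalman filter over 2-D image positions is the mathematically tractable half of such a combination. A minimal sketch with illustrative noise values (the particle-sampling half, and the talk's particular way of combining the two, are omitted):

```python
import numpy as np

# Constant-velocity Kalman filter for 2-D position tracks; state is
# [x, y, vx, vy] and we only measure position.  Noise values are illustrative.
dt = 1.0
F = np.array([[1, 0, dt, 0], [0, 1, 0, dt], [0, 0, 1, 0], [0, 0, 0, 1]], float)
H = np.array([[1, 0, 0, 0], [0, 1, 0, 0]], float)
Q, R = 0.01 * np.eye(4), 1.0 * np.eye(2)

def kalman_step(x, P, z):
    x, P = F @ x, F @ P @ F.T + Q                   # predict state and covariance
    K = P @ H.T @ np.linalg.inv(H @ P @ H.T + R)    # Kalman gain
    x = x + K @ (z - H @ x)                         # correct with measurement z
    P = (np.eye(4) - K @ H) @ P
    return x, P
```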

Short Biography: Hsu-Yung Cheng received the Bachelor’s degree in computer science and information engineering from National Chiao-Tung University in Taiwan in 2000 and the Master’s degree from the same department in 2002. She earned her Ph.D. in Electrical Engineering from the University of Washington in 2008. Hsu-Yung Cheng joined the Department of Computer Science and Information Engineering at National Central University in 2008 as an assistant professor. Her research interests include image and video analysis and intelligent systems.

CMU talk: Challenges in the Practical Application of Machine Learning

Intelligence Seminar

November 10, 2009
3:30 pm

Challenges in the Practical Application of Machine Learning
Carla E. Brodley, Tufts University

Abstract:
In this talk I will discuss the factors that impact the successful application of supervised machine learning. Driven by several interdisciplinary collaborations, we are addressing the problem of what to do when your initial accuracy is lower than is acceptable to your domain experts. Low accuracy can be due to three factors: noise in the class labels, insufficient training data, and whether the features describing each training example are able to discriminate the classes. In this talk, I will discuss research efforts at Tufts addressing the latter two factors. The first project introduces a new problem which we have named active class selection (ACS). ACS arises when one can ask the question: given the ability to collect n additional training instances, how should they be distributed with respect to class? The second project examines how one might assess that the class distinctions are not supported by the features, and how constraint-based clustering can be used to uncover the true class structure of the data. These two issues and their solutions will be explored in the context of three applications. The first is to create a global map of the land cover of the Earth's surface from remotely sensed (satellite) data. The second is to build a classifier based on data collected from an "artificial nose" to discriminate vapors; the "nose" is a collection of sensors that react differently to different vapors. The third is to classify HRCT images of the lung.
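One way to picture active class selection: given a budget of n new instances, allocate more of them to the classes the current model confuses most. The sketch below is an illustrative heuristic under that framing, not Brodley's specific ACS method:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_predict

def allocate_new_samples(X, y, n_new):
    """Spend the labeling budget on the classes the current model gets wrong
    most often, as measured by cross-validated per-class error rates."""
    preds = cross_val_predict(LogisticRegression(max_iter=1000), X, y, cv=5)
    classes = np.unique(y)
    err = np.array([np.mean(preds[y == c] != c) for c in classes]) + 1e-3
    alloc = np.round(n_new * err / err.sum()).astype(int)
    return dict(zip(classes, alloc))   # class -> number of new instances to request
```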

Bio:
Carla E. Brodley is a professor in the Department of Computer Science at Tufts University. She received her PhD in computer science from the University of Massachusetts Amherst in 1994. From 1994 to 2004, she was on the faculty of the School of Electrical Engineering at Purdue University. Professor Brodley's research interests include machine learning, knowledge discovery in databases, and computer security. She has worked in the areas of anomaly detection, active learning, classifier formation, unsupervised learning, and applications of machine learning to remote sensing, computer security, digital libraries, astrophysics, content-based image retrieval of medical images, computational biology, saliva diagnostics, evidence-based medicine and chemistry. She was a member of the DSSG in 2004-2005. In 2001 she served as program co-chair for the International Conference on Machine Learning (ICML) and in 2004 she served as general chair for ICML. Currently she is an associate editor of JMLR and Machine Learning, and she is on the editorial board of DKMD. She is a member of the AAAI Council and is co-chair of the Computing Research Association's Committee on the Status of Women in Computing Research (CRA-W).

Tuesday, November 03, 2009

Lab Meeting November 4, 2009 (Chung-Han): An Active Learning Approach for Segmenting Human Activity Datasets

Title: An Active Learning Approach for Segmenting Human Activity Datasets
Author: Liyue Zhao, Gita Sukthankar
In: MM '09: Proceedings of the Seventeenth ACM International Conference on Multimedia

Abstract:
Human activity datasets collected under natural conditions are an important source of data. Since these contain multiple activities in unscripted sequence, temporal segmentation of multimodal datasets is an important precursor to recognition and analysis. Manual segmentation is prohibitively time consuming and unsupervised approaches for segmentation are unreliable since they fail to exploit the semantic context of the data. Gathering labels for supervised learning places a large workload on the human user since it is relatively easy to gather a mass of unlabeled data but expensive to annotate. This paper proposes an active learning approach for segmenting large motion capture datasets with both small training sets and working sets. Support Vector Machines (SVMs) are learned using an active learning paradigm; after the classifiers are initialized with a small set of labeled data, the users are iteratively queried for labels as needed. We propose a novel method for initializing the classifiers, based on unsupervised segmentation and clustering of the dataset. By identifying and training the SVM with points from pure clusters, we can improve upon a random sampling strategy for creating the query set. Our active learning approach improves upon the initial unsupervised segmentation used to initialize the classifier, while requiring substantially less data than a fully supervised method; the resulting segmentation is comparable to the latter while requiring significantly less effort from the user.
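A minimal sketch of the query loop described above, using margin-based uncertainty sampling with a scikit-learn SVM. Here `oracle(i)` is a hypothetical stand-in for the human annotator, and `seed_idx` stands in for the labels the paper obtains from its unsupervised clustering step (the seed must cover at least two classes for the SVM to train):

```python
import numpy as np
from sklearn.svm import SVC

def active_learning_loop(X, oracle, seed_idx, n_queries=50):
    """Margin-based uncertainty sampling: repeatedly query the frame the
    current SVM is least sure about."""
    labeled = list(seed_idx)
    labels = {i: oracle(i) for i in labeled}
    for _ in range(n_queries):
        clf = SVC(probability=True).fit(X[labeled], [labels[i] for i in labeled])
        proba = np.sort(clf.predict_proba(X), axis=1)
        margin = proba[:, -1] - proba[:, -2]        # top-two probability gap
        margin[labeled] = np.inf                    # never re-query labeled frames
        q = int(np.argmin(margin))                  # least-confident frame
        labeled.append(q)
        labels[q] = oracle(q)
    return clf
```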

[Full Text]

Monday, November 02, 2009

Lab Meeting November 4, 2009 (Jimmy): Learning To Detect Unseen Object Classes by Between-Class Attribute Transfer

Title: Learning To Detect Unseen Object Classes by Between-Class Attribute Transfer
Authors: Christoph H. Lampert, Hannes Nickisch, and Stefan Harmeling
In: CVPR2009

Abstract
We study the problem of object classification when training and test classes are disjoint, i.e. no training examples of the target classes are available. This setup has hardly been studied in computer vision research, but it is the rule rather than the exception, because the world contains tens of thousands of different object classes, and image collections have been formed and annotated with suitable class labels for only a very few of them.

In this paper, we tackle the problem by introducing attribute-based classification. It performs object detection based on a human-specified high-level description of the target objects instead of training images. The description consists of arbitrary semantic attributes, like shape, color or even geographic information. Because such properties transcend the specific learning task at hand, they can be pre-learned, e.g. from image datasets unrelated to the current task. Afterwards, new classes can be detected based on their attribute representation, without the need for a new training phase. In order to evaluate our method and to facilitate research in this area, we have assembled a new large-scale dataset, “Animals with Attributes”, of over 30,000 animal images that match the 50 classes in Osherson’s classic table of how strongly humans associate 85 semantic attributes with animal classes. Our experiments show that by using an attribute layer it is indeed possible to build a learning object detection system that does not require any training images of the target classes.

[link]
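A compact sketch in the spirit of the paper's attribute-based classification: train one probabilistic classifier per semantic attribute on the seen classes, then score each unseen class by the likelihood of its binary attribute signature. The details here (logistic-regression attribute classifiers, the exact scoring rule) are simplifications, not the paper's formulation:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def attribute_zero_shot(X_train, A_train, X_test, A_unseen, names):
    """A_train: per-image binary attributes for seen-class training images;
    A_unseen: (class x attribute) signatures of the unseen classes."""
    P = np.column_stack([
        LogisticRegression(max_iter=1000)
        .fit(X_train, A_train[:, a])
        .predict_proba(X_test)[:, 1]
        for a in range(A_train.shape[1])])           # p(attribute a | image)
    logP, log1mP = np.log(P + 1e-9), np.log(1 - P + 1e-9)
    scores = A_unseen @ logP.T + (1 - A_unseen) @ log1mP.T   # class x image
    return [names[i] for i in scores.argmax(axis=0)]  # best unseen class per image
```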

I will also try to introduce the NIPS 2009 paper Zero-Shot Learning with Semantic Output Codes by M. Palatucci, D. Pomerleau, G. Hinton, and T. M. Mitchell, which formalizes the problem.