Monday, December 28, 2009

Lab Meeting December 30th, 2009 (Nicole): Estimation of Sound Source Number and Directions under a Multi-source Environment (IROS 2009)

Title: Estimation of Sound Source Number and Directions under a Multi-source Environment (IROS 2009)

Authors: Jwu-Sheng Hu, Member IEEE, Chia-Hsing Yang, Student Member IEEE, and Cheng-Kang Wang


Sound source localization is an important feature in robot audition. This work proposes a method for estimating the number and directions of sound sources using the delay information of a microphone array. An eigenstructure-based generalized cross correlation method is proposed to estimate the time delay between microphones. Upon obtaining the time delay information, the sound source direction and velocity can be estimated by a least-squares method. In the multiple-sound-source case, the time delay combinations among microphones are arranged such that the estimated sound speed falls within an acceptable range. By accumulating the estimated source directions and applying an adaptive K-means++ algorithm, the number and directions of the sound sources can be estimated.
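The delay-estimation step can be illustrated with the classical GCC-PHAT weighting; this is a minimal sketch (the paper's eigenstructure-based weighting differs, and the sampling rate and signals here are invented for the example):

```python
import numpy as np

def gcc_phat_delay(sig, ref, fs=16000):
    """Estimate the time delay of `sig` relative to `ref` via GCC-PHAT.

    Classical phase-transform weighting; the paper's eigenstructure-based
    variant replaces this whitening with one derived from the correlation
    matrix's eigenstructure.
    """
    n = len(sig) + len(ref)
    X = np.fft.rfft(sig, n=n)
    Y = np.fft.rfft(ref, n=n)
    cross = X * np.conj(Y)
    cross /= np.abs(cross) + 1e-12           # PHAT: keep phase, discard magnitude
    cc = np.fft.irfft(cross, n=n)
    max_shift = n // 2
    cc = np.concatenate((cc[-max_shift:], cc[:max_shift + 1]))
    shift = np.argmax(np.abs(cc)) - max_shift
    return shift / fs                         # delay in seconds

# Toy check: delay a broadband signal by 5 samples and recover it.
fs = 16000
rng = np.random.default_rng(0)
ref = rng.standard_normal(1024)
sig = np.roll(ref, 5)                         # 5-sample delay
delay = gcc_phat_delay(sig, ref, fs)
```

Given delays between several microphone pairs, the source direction then follows from the least-squares step described in the abstract.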


Sunday, December 27, 2009

Measuring the Accuracy of Distributed Algorithms on Multi-Robot Systems


James McLurkin (UW CSE postdoc, MIT) 
October 9, 2008, 3:30 pm

Distributed algorithms running on multi-robot systems rely on ad-hoc networks to relay messages throughout the group. The propagation speed of these messages is large, but not infinite, and problems in algorithm execution can arise when the robot speed is a large fraction of the message propagation speed. This implies a robot speed limit, as any robot moving away from a message source faster than the message speed will never receive new information, and no algorithm can function properly on it. In this work, we focus on measuring the accuracy of multi-robot distributed algorithms. We define the Robot Speed Ratio (RSR) as the ratio of robot speed to message speed. We express it in a form that is platform-independent and captures the relationship between communications usage, robot mobility, and algorithm accuracy. We show that trade-offs between these key quantities can be balanced at design time. Finally, we present results from experiments with 50 robots that characterize the accuracy of preexisting distributed algorithms for network communication, navigation, boundary detection, and dynamic task assignment. In all cases, accuracy degrades as speed increases or communication bandwidth is reduced. In our experiments, a RSR of 0.005 allows good accuracy in all algorithms, a RSR of 0.02 allows reasonable accuracy in simple algorithms, and all algorithms tested are essentially useless at a RSR of 0.10 or higher.
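The RSR itself is just a ratio; a small sketch with the accuracy regimes quoted above (thresholds taken from the abstract, the speeds are invented for the example):

```python
def robot_speed_ratio(robot_speed, message_speed):
    """Robot Speed Ratio (RSR): ratio of robot speed to message speed.

    Platform-independent: any units work as long as both speeds use the
    same ones (m/s, hops per second, etc.).
    """
    return robot_speed / message_speed

def expected_accuracy(rsr):
    """Regimes reported in the talk (illustrative threshold encoding)."""
    if rsr <= 0.005:
        return "good accuracy in all algorithms"
    if rsr <= 0.02:
        return "reasonable accuracy in simple algorithms"
    if rsr < 0.10:
        return "degraded accuracy"
    return "essentially useless"

# Example: robots at 0.1 m/s, messages propagating at 20 m/s through the network.
rsr = robot_speed_ratio(0.1, 20.0)
regime = expected_accuracy(rsr)
```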


Friday, December 25, 2009

Lab Meeting December 30th, 2009 (Gary): Pose Robust Face Tracking by Combining Active Appearance Models and Cylinder Head Models

Title: Pose Robust Face Tracking by Combining Active Appearance Models and Cylinder Head Models (IJCV 2008)

Author : Jaewon Sung , Takeo Kanade , Daijin Kim


The active appearance models (AAMs) provide detailed descriptive parameters that are useful for various autonomous face analysis problems. However, they are not suitable for robust face tracking across large pose variation for the following reasons. First, they are suited to tracking the local movements of facial features within a limited pose variation. Second, they use gradient-based optimization techniques for model fitting, so the fitting performance is very sensitive to the initial model parameters. Third, when their fitting fails, it is difficult to obtain appropriate model parameters to re-initialize them. To alleviate these problems, we propose to combine the active appearance models and the cylinder head models (CHMs), where the global head motion parameters obtained from the CHMs are used as cues for fitting or re-initializing the AAM parameters. Good AAM parameters for robust face tracking are computed in the following manner. First, we estimate the global motion parameters with the CHM fitting algorithm. Second, we inversely project the previously fitted 2D shape points onto the 3D cylinder surface. Third, we transform the inversely projected shape points by the estimated global motion parameters. Fourth, we project the transformed 3D points onto the input image and compute the AAM parameters from them. Finally, we treat the computed AAM parameters as the initial parameters for the fitting. Experimental results showed that face tracking combining AAMs and CHMs is more pose-robust than AAMs alone, achieving a 170% higher tracking rate and 115% wider pose coverage.
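The four re-initialization steps can be sketched under a simplified camera model (orthographic projection and a unit cylinder are assumptions made here for brevity; the paper uses a full perspective cylinder head model):

```python
import numpy as np

def reinit_aam_shape(shape_2d, yaw, tx, ty, radius=1.0):
    """Re-initialize AAM shape points from CHM global motion (sketch).

    Assumes an orthographic camera and a cylinder of the given radius
    aligned with the image's vertical axis.  shape_2d is (N, 2);
    yaw is in radians; (tx, ty) is the in-plane head translation.
    """
    u, v = shape_2d[:, 0], shape_2d[:, 1]
    # 1) Inverse-project the previously fitted 2D points onto the cylinder.
    z = np.sqrt(np.maximum(radius**2 - u**2, 0.0))
    pts = np.stack([u, v, z], axis=1)
    # 2) Transform by the estimated global head motion (yaw + translation).
    c, s = np.cos(yaw), np.sin(yaw)
    R = np.array([[c, 0.0, s], [0.0, 1.0, 0.0], [-s, 0.0, c]])
    pts = pts @ R.T + np.array([tx, ty, 0.0])
    # 3) Project back to the image (orthographic: drop z) to get the
    #    initial shape for the next AAM fit.
    return pts[:, :2]

shape = np.array([[0.0, 0.0], [0.5, 0.2]])
new_shape = reinit_aam_shape(shape, yaw=0.0, tx=0.1, ty=0.0)
```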


Thursday, December 24, 2009

NTU talk: Human Action Recognition Using Bag of Video Words

Title: Human Action Recognition Using Bag of Video Words
Speaker: Dr. Mubarak Shah, Agere Chair Professor of Computer Science, University of Central Florida
Time: 4:00pm, Dec 24 (Thu), 2009
Place: Room 210, CSIE building


The traditional approach to video analysis involves detection of objects, followed by tracking of objects from frame to frame, and finally analysis of tracks for human action recognition. However, in some videos of complex scenes it is not possible to reliably detect and track objects. Therefore, there has recently been much interest in computer vision in the bag-of-video-words approach, which bypasses the object detection and tracking steps. In the bag-of-video-words approach, an action is described by a distribution of spatiotemporal cuboids (3D interest points).

In this talk, first I will describe a method to automatically discover the optimal number of video words clusters by utilizing the Maximization of Mutual Information (MMI). Unlike the k-means algorithm which is typically used to cluster spatiotemporal cuboids into video words based on their appearance similarity, MMI clustering further groups the video-words, such that the semantically similar video-words, e.g. words corresponding to the same part of the body during an action, are grouped in the same cluster.

The above method for human action recognition uses only one kind of features, spatiotemporal cuboids. However, single feature based representation for human action is not sufficient to capture the imaging variations (view-point, illumination etc.) and attributes of individuals (size, age, gender etc.).

Next I will present a method which uses two types of features: i) a quantized vocabulary of local spatio-temporal (ST) volumes (or cuboids), and ii) a quantized vocabulary of spin-images. To optimally combine these features, we treat different features and videos as nodes in a graph, where weighted edges between the nodes represent the strength of the relationship between entities. The graph is then embedded into a k-dimensional space subject to the criteria that similar nodes have Euclidian coordinates which are closer to each other. This is achieved by converting this constraint into a minimization problem whose solution is the eigenvectors of the graph Laplacian matrix. This procedure is known as Fiedler Embedding.
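The embedding step can be sketched directly from its definition: build the graph Laplacian and keep the eigenvectors of its smallest nonzero eigenvalues, so similar nodes get nearby coordinates (a minimal sketch; the adjacency matrix here is a toy example, not real feature/video data):

```python
import numpy as np

def fiedler_embedding(W, k=2):
    """Embed graph nodes into k dimensions via the graph Laplacian.

    W: symmetric (n, n) weighted adjacency matrix.
    Returns eigenvectors of L = D - W for the k smallest nonzero
    eigenvalues; strongly connected nodes receive close coordinates.
    """
    D = np.diag(W.sum(axis=1))
    L = D - W
    vals, vecs = np.linalg.eigh(L)   # eigenvalues in ascending order
    return vecs[:, 1:k + 1]          # skip the constant (zero-eigenvalue) vector

# Two tightly coupled pairs of nodes joined by one weak edge.
W = np.array([[0, 5, 0, 0],
              [5, 0, 1, 0],
              [0, 1, 0, 5],
              [0, 0, 5, 0]], float)
coords = fiedler_embedding(W, k=1)
```

In the 1-D embedding, nodes 0 and 1 land much closer to each other than to nodes 2 and 3, which is exactly the clustering behavior the talk exploits to group related features and videos.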

Short Biography:
Dr. Mubarak Shah, Agere Chair Professor of Computer Science, is the founding director of the Computer Visions Lab at UCF. He is a co-author of three books (Motion-Based Recognition (1997), Video Registration (2003), and Automated Multi-Camera Surveillance: Algorithms and Practice (2008)), all by Springer. He has published ten book chapters, seventy-five journal papers, and one hundred seventy conference papers on topics related to visual surveillance, tracking, human activity and action recognition, object detection and categorization, shape from shading, geo-registration, photorealistic synthesis, visual crowd analysis, biomedical imaging, etc.

Dr. Shah is a fellow of IEEE, IAPR and SPIE. In 2006, he was awarded a Pegasus Professor award, the highest award at UCF, given to a faculty member who has made a significant impact on the university, has made an extraordinary contribution to the university community, and has demonstrated excellence in teaching, research and service. He is a Distinguished ACM Speaker. He was an IEEE Distinguished Visitor speaker for 1997-2000, and received IEEE Outstanding Engineering Educator Award in 1997. He received the Harris Corporation's Engineering Achievement Award in 1999, the TOKTEN awards from UNDP in 1995, 1997, and 2000; Teaching Incentive Program awards in 1995 and 2003, Research Incentive Award in 2003, Millionaires' Club awards in 2005 and 2006, University Distinguished Researcher award in 2007, SANA award in 2007, an honorable mention for the ICCV 2005 Where Am I? Challenge Problem, and was nominated for the best paper award in ACM Multimedia Conference in 2005. He is an editor of international book series on Video Computing; editor in chief of Machine Vision and Applications journal, and an associate editor of ACM Computing Surveys journal. He was an associate editor of the IEEE Transactions on PAMI, and a guest editor of the special issue of International Journal of Computer Vision on Video Computing. He is the program co-chair of IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2008.

Monday, December 21, 2009

Lab Meeting Dec. 23rd, 2009 (Andi): Shape-based Recognition of 3D Point Clouds in Urban Environments

Authors: Aleksey Golovinskiy, Vladimir G. Kim, Thomas Funkhouser

International Conference on Computer Vision (ICCV), September 2009

This paper investigates the design of a system for recognizing objects in 3D point clouds of urban environments. The system is decomposed into four steps: locating, segmenting, characterizing, and classifying clusters of 3D points. Specifically, we first cluster nearby points to form a set of potential object locations (with hierarchical clustering). Then, we segment points near those locations into foreground and background sets (with a graph-cut algorithm). Next, we build a feature vector for each point cluster (based on both its shape and its context). Finally, we label the feature vectors using a classifier trained on a set of manually labeled objects. The paper presents several alternative methods for each step. We quantitatively evaluate the system and tradeoffs of different alternatives in a truthed part of a scan of Ottawa that contains approximately 100 million points and 1000 objects of interest. Then, we use this truth data as a training set to recognize objects amidst approximately 1 billion points of the remainder of the Ottawa scan.
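The characterize-and-classify steps can be sketched as follows (the feature set and the nearest-centroid classifier are illustrative stand-ins, not the paper's actual features, classifier, or data):

```python
import numpy as np

def shape_features(points):
    """Simple shape descriptor for a 3D point cluster: height and
    footprint extents (illustrative, not the paper's feature set)."""
    extent = points.max(axis=0) - points.min(axis=0)
    return np.array([extent[2], extent[0], extent[1]])

class NearestCentroid:
    """Stand-in for the paper's trained classifier."""
    def fit(self, X, y):
        self.labels = sorted(set(y))
        y = np.array(y)
        self.centroids = {c: X[y == c].mean(axis=0) for c in self.labels}
        return self
    def predict(self, x):
        return min(self.labels,
                   key=lambda c: np.linalg.norm(x - self.centroids[c]))

# Toy training clusters: tall thin clusters are "posts", low wide ones "cars".
rng = np.random.default_rng(0)
post = rng.uniform([0, 0, 0], [0.3, 0.3, 3.0], size=(50, 3))
car = rng.uniform([0, 0, 0], [4.0, 2.0, 1.5], size=(50, 3))
X = np.array([shape_features(post), shape_features(car)])
clf = NearestCentroid().fit(X, ["post", "car"])

# A new segmented cluster, tall and thin, should classify as a post.
query = rng.uniform([0, 0, 0], [0.25, 0.25, 2.8], size=(40, 3))
pred = clf.predict(shape_features(query))
```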

Full paper

Min-Cut Based Segmentation of Point Clouds
Aleksey Golovinskiy and Thomas Funkhouser
IEEE Workshop on Search in 3D and Video (S3DV) at ICCV, September 2009, Kyoto

Sunday, December 20, 2009

Lab Meeting December 23rd, 2009 (Shao-Chen): Multi-robot SLAM with Unknown Initial Correspondence: The Robot Rendezvous Case

Title: Multi-robot SLAM with Unknown Initial Correspondence: The Robot Rendezvous Case (IROS 2006)

Authors: Xun S. Zhou and Stergios I. Roumeliotis


This paper presents a new approach to the multi-robot map-alignment problem that enables teams of robots to build joint maps without initial knowledge of their relative poses. The key contribution of this work is an optimal algorithm for merging (not necessarily overlapping) maps that are created by different robots independently. Relative pose measurements between pairs of robots are processed to compute the coordinate transformation between any two maps. Noise in the robot-to-robot observations, propagated through the map-alignment process, increases the error in the position estimates of the transformed landmarks, and reduces the overall accuracy of the merged map. When there is overlap between the two maps, landmarks that appear twice provide additional information, in the form of constraints, which increases the alignment accuracy. Landmark duplicates are identified through a fast nearest-neighbor matching algorithm. In order to reduce the computational complexity of this search process, a kd-tree is used to represent the landmarks in the original map. The criterion employed for matching any two landmarks is the Mahalanobis distance. As a means of validation, we present experimental results obtained from two robots mapping an area of 4,800 m².
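The core alignment step, computing the map-to-map transform from a single rendezvous measurement, can be sketched in SE(2) (a 2D simplification of the paper's setting; the pose values here are invented):

```python
import numpy as np

def se2(x, y, theta):
    """Homogeneous 2D rigid transform for pose (x, y, theta)."""
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, -s, x], [s, c, y], [0.0, 0.0, 1.0]])

def align_maps(pose_a_in_mapA, pose_b_in_mapB, b_rel_a):
    """Transform taking map-B coordinates into map-A coordinates.

    pose_a_in_mapA: robot A's pose in its own map, (x, y, theta)
    pose_b_in_mapB: robot B's pose in its own map
    b_rel_a:        measured pose of B relative to A at rendezvous
    Chains T_mapA<-A * T_A<-B * inverse(T_mapB<-B).
    """
    T_A = se2(*pose_a_in_mapA)
    T_B = se2(*pose_b_in_mapB)
    T_rel = se2(*b_rel_a)
    return T_A @ T_rel @ np.linalg.inv(T_B)

# B sits at the origin of its map; A, at (1, 0) in its map, sees B 2 m ahead.
T = align_maps((1.0, 0.0, 0.0), (0.0, 0.0, 0.0), (2.0, 0.0, 0.0))
landmark_in_B = np.array([0.0, 0.0, 1.0])   # homogeneous 2D point
landmark_in_A = T @ landmark_in_B
```

Noise in `b_rel_a` propagates through this chain into every transformed landmark, which is exactly the error amplification the abstract describes.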


Tuesday, December 15, 2009

PhD Thesis Defense: Rhythmic Human-Robot Social Interaction

Marek P. Michalowski
Carnegie Mellon University
December 21, 2009, 10:00 a.m., NSH 3305


Social scientists have identified and begun to describe rhythmic and synchronous properties of human social interaction. However, social interactions with robots are often stilted due to temporal mismatch between the behaviors, both verbal and nonverbal, of the interacting partners. This thesis brings the theory of interactional synchrony to bear on the design of social robots with a proposed architecture for rhythmic intelligence. We have developed technology that allows a robot to perceive social rhythms and to behave rhythmically. We have facilitated constrained social interactions, and designed experimental protocols, in which a robot variably synchronizes to human and/or environmental rhythms -- first in a dance-oriented task, and second in a cooperative video game. We have analyzed these interactions to understand the effects of Keepon's rhythmic attention on human performance. This thesis demonstrates that variations in a robot's rhythmic behavior have measurable effect on human rhythmic behavior and on performance in rhythmic tasks. Furthermore, human participants were able to assume and transition between the roles of leader or follower in these tasks.

Thesis Committee

Reid Simmons, Chair
Illah Nourbakhsh
Jodi Forlizzi
Hideki Kozima, Miyagi University, Japan

[link] [thesis draft]

Lab Meeting December 16th, 2009 (Casey): Monocular Vision SLAM for Indoor Aerial Vehicles

Title: Monocular Vision SLAM for Indoor Aerial Vehicles (IROS 2009)

Authors: Koray Celik, Soon-Jo Chung, Matthew Clausman, and Arun K. Somani


This paper presents a novel indoor navigation and ranging strategy by using a monocular camera. The proposed algorithms are integrated with simultaneous localization and mapping (SLAM) with a focus on indoor aerial vehicle applications. We experimentally validate the proposed algorithms by using a fully self-contained micro aerial vehicle (MAV) with on-board image processing and SLAM capabilities. The range measurement strategy is inspired by the key adaptive mechanisms for depth perception and pattern recognition found in humans and intelligent animals. The navigation strategy assumes an unknown, GPS-denied environment, which is representable via corner-like feature points and straight architectural lines. Experimental results show that the system is only limited by the capabilities of the camera and the availability of good corners.


Monday, December 14, 2009

Lab Meeting December 16th, 2009 (Jeff): On measuring the accuracy of SLAM algorithms

Title: On measuring the accuracy of SLAM algorithms

Authors: Rainer Kümmerle, Bastian Steder, Christian Dornhege, Michael Ruhnke, Giorgio Grisetti, Cyrill Stachniss and Alexander Kleiner


In this paper, we address the problem of creating an objective benchmark for evaluating SLAM approaches. We propose a framework for analyzing the results of a SLAM approach based on a metric for measuring the error of the corrected trajectory. This metric uses only relative relations between poses and does not rely on a global reference frame. This overcomes serious shortcomings of approaches using a global reference frame to compute the error. Our method furthermore allows us to compare SLAM approaches that use different estimation techniques or different sensor modalities since all computations are made based on the corrected trajectory of the robot.
We provide sets of relative relations needed to compute our metric for an extensive set of datasets frequently used in the robotics community. The relations have been obtained by manually matching laser-range observations to avoid the errors caused by matching algorithms. Our benchmark framework allows the user to easily analyze and objectively compare different SLAM approaches.
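The metric can be sketched from its description: compare the relative displacements of the corrected trajectory against the reference relations, with no global frame involved (a 2D sketch with invented numbers):

```python
import numpy as np

def relation_error(traj, relations):
    """Relative-pose error over a set of reference relations (2D sketch).

    traj:      dict timestamp -> (x, y, theta), the corrected trajectory
    relations: list of (t1, t2, dx, dy, dtheta) reference relative poses,
               expressed in the frame of the pose at t1
    Returns mean squared translational and angular error; only relative
    displacements are used, so no global reference frame is required.
    """
    def rel(p1, p2):
        # Displacement from p1 to p2, rotated into p1's frame.
        dx, dy = p2[0] - p1[0], p2[1] - p1[1]
        c, s = np.cos(-p1[2]), np.sin(-p1[2])
        return (c * dx - s * dy, s * dx + c * dy, p2[2] - p1[2])

    t_err, a_err = 0.0, 0.0
    for t1, t2, dx, dy, dth in relations:
        ex, ey, eth = rel(traj[t1], traj[t2])
        t_err += (ex - dx) ** 2 + (ey - dy) ** 2
        a_err += ((eth - dth + np.pi) % (2 * np.pi) - np.pi) ** 2
    n = len(relations)
    return t_err / n, a_err / n

# The estimate drifted 0.1 m sideways relative to the reference relation.
traj = {0: (0.0, 0.0, 0.0), 1: (1.0, 0.1, 0.0)}
relations = [(0, 1, 1.0, 0.0, 0.0)]
t_err, a_err = relation_error(traj, relations)
```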

Auton Robot 2009 27:387-407

Wednesday, December 09, 2009

CMU Talk: Corridor View: Making Indoor Life Easier with Large Image Database

CMU VASC Seminar
Monday, Dec 7, 2009
NSH 1507

Corridor View: Making Indoor Life Easier with Large Image Database
Hongwen "Henry" Kang
Ph.D. Student, Robotics


Indoor environments pose substantial challenges for computer vision algorithms, due to patterns that are highly repetitive (e.g. doors), textureless (e.g. white walls), or temporally changing (e.g. posters, pedestrians). The fundamental challenge we want to tackle is robust image matching. We propose two approaches to address this problem: one is an iterative algorithm that combines global/local weighting strategies under a bag-of-features model; the other data-mines distinctive feature vectors and uses high-dimensional features directly for image matching, without quantization. Both approaches demonstrate significant improvements over straightforward image retrieval approaches in highly confusing indoor environments. The proposed image matching techniques have broad applications. We demonstrate two of them in this talk, aimed at vision-impaired users in office environments: one application is data-driven zoom-in; the other is image composition for object pop-out.

Wednesday, November 25, 2009

CMU talk: Unsupervised Detection of Regions of Interest Using Iterative Link Analysis

CMU VASC Seminar
Monday, November 30, 2009

Unsupervised Detection of Regions of Interest Using Iterative Link Analysis
Gunhee Kim
Ph.D. Student, Computer Science Department


This work is a joint project with Antonio Torralba during my visit to MIT and will be presented as a poster at the upcoming NIPS 2009 Conference.

This talk will discuss a fast and scalable alternating optimization technique to detect regions of interest (ROIs) in cluttered Web images without labels. The proposed approach discovers highly probable regions of object instances by iteratively repeating the following two functions: (1) choose the exemplar set (i.e. a small number of highly ranked reference ROIs) across the dataset and (2) refine the ROIs of each image with respect to the exemplar set. These two subproblems are formulated as ranking in two different similarity networks of ROI hypotheses by link analysis. The experiments with the PASCAL 06 dataset show that our unsupervised localization performance is better than one of the state-of-the-art techniques and comparable to supervised methods. Also, we test the scalability of our approach with five objects in a Flickr dataset consisting of more than 200K images.
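The ranking-by-link-analysis step can be illustrated with a PageRank-style power iteration over a similarity network (an illustrative stand-in; the paper's exact ranking formulation may differ, and the similarity matrix here is a toy example):

```python
import numpy as np

def rank_by_link_analysis(S, d=0.85, iters=100):
    """Rank nodes of a similarity network by stationary importance.

    S: nonnegative (n, n) similarity matrix between ROI hypotheses.
    Power iteration on a column-stochastic transition matrix: highly
    ranked nodes are those similar to many other highly ranked nodes.
    """
    n = S.shape[0]
    col_sums = S.sum(axis=0)
    col_sums[col_sums == 0] = 1.0
    M = S / col_sums                   # normalize columns to sum to 1
    r = np.full(n, 1.0 / n)
    for _ in range(iters):
        r = (1 - d) / n + d * (M @ r)  # damped power-iteration update
    return r

# Node 0 is similar to every other hypothesis; nodes 1-3 only to node 0.
S = np.array([[0, 1, 1, 1],
              [1, 0, 0, 0],
              [1, 0, 0, 0],
              [1, 0, 0, 0]], float)
rank = rank_by_link_analysis(S)
```

The highest-ranked hypotheses would serve as the exemplar set in step (1), against which every image's ROIs are refined in step (2).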

Bio: Gunhee Kim is a Ph.D. student in CMU's Computer Science Department advised by Takeo Kanade. He received his master's degree under the supervision of Martial Hebert in 2008 from the Robotics Institute at CMU. His research interests are computer vision, machine learning, data mining, and biomedical imaging.

Monday, November 23, 2009

Lab Meeting November 25, 2009 (Alan): Navigating, Recognizing and Describing Urban Spaces With Vision and Lasers (IJRR 2009)

Title: Navigating, Recognizing and Describing Urban Spaces With Vision and Lasers (IJRR 2009)

Authors: Paul Newman, Gabe Sibley, Mike Smith, Mark Cummins, Alastair Harrison, Chris Mei, Ingmar Posner, Robbie Shade, Derik Schroeter, Liz Murphy, Winston Churchill, Dave Cole, Ian Reid

In this paper we describe a body of work aimed at extending the reach of mobile navigation and mapping. We describe how running topological and metric mapping and pose estimation processes concurrently, using vision and laser ranging, has produced a full six degree-of-freedom outdoor navigation system. It is capable of producing intricate three-dimensional maps over many kilometers and in real time. We consider issues concerning the intrinsic quality of the built maps and describe our progress towards adding semantic labels to maps via scene deconstruction and labeling. We show how our choices of representation, inference methods and use of both topological and metric techniques naturally allow us to fuse maps built from multiple sessions with no need for manual frame alignment or data association.

Sunday, November 22, 2009

CMU PhD Thesis Proposal: Learning Methods for Thought Recognition

CMU RI PhD Thesis Proposal
Mark Palatucci
Learning Methods for Thought Recognition
November 18, 2009, 3:00 p.m., NSH 3305

This thesis proposal considers the problem of training machine learning classifiers in domains where data are very high dimensional and training examples are extremely limited or impossible to collect for all classes of interest. As a case study, we focus on the application of thought recognition, where the objective is to classify a person’s cognitive state from a recorded image of that person’s neural activity. Machine learning and pattern recognition methods have already made a large impact on this field, but most prior work has focused on classification studies with small numbers of classes and moderate amounts of training data. In this thesis, we focus on thought recognition in a limited data setting, where there are few, if any, training examples for the classes we wish to discriminate, and the number of possible classes can be in the thousands.

Despite these constraints, this thesis seeks to demonstrate that it is possible to classify noisy, high dimensional data with extremely few training examples by using spatial and temporal domain knowledge, intelligent feature selection, semantic side information, and large quantities of unlabeled data from related tasks.

In our preliminary work, we showed that it is possible to build a binary classifier that can accurately discriminate between cognitive states with more than 80,000 features and only two training examples per class. We also showed how classification can be improved using principled feature selection, and derived a significance test using order statistics that is appropriate for very high-dimensional problems with small numbers of training examples.

We have also explored the most extreme case of limited data, the zero-shot learning setting, where we do not have any training examples for the classes we wish to discriminate. We showed that by using a knowledge base of semantic side information to create intermediate features, we can build a classifier that can identify words that people are thinking about, even without training data for those words, while the classifier is forced to choose between nearly 1,000 different candidate words.
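The two-stage zero-shot idea, decode semantic features from the neural data and then match them against a semantic knowledge base, can be sketched as follows (the decoder matrix, feature names, and word vectors are all invented for illustration):

```python
import numpy as np

def zero_shot_classify(neural_pattern, decoder, word_semantics):
    """Zero-shot word classification (sketch of the two-stage idea).

    `decoder` maps neural data to a vector of semantic features (here a
    stand-in linear map); the predicted vector is matched to the nearest
    word in the semantic knowledge base, so words with no training
    examples can still be chosen.
    """
    predicted = decoder @ neural_pattern
    best, best_d = None, np.inf
    for word, sem in word_semantics.items():
        d = np.linalg.norm(predicted - np.asarray(sem))
        if d < best_d:
            best, best_d = word, d
    return best

# Toy semantic knowledge base: features are (is_animal, is_tool).
word_semantics = {"dog": [1.0, 0.0], "hammer": [0.0, 1.0], "cat": [0.9, 0.1]}
decoder = np.array([[1.0, 0.0, 0.0],
                    [0.0, 1.0, 0.0]])        # hypothetical learned map
pattern = np.array([0.98, 0.02, 0.3])        # a recorded neural pattern
word = zero_shot_classify(pattern, decoder, word_semantics)
```

Note that "dog" can win here even if the decoder was never trained on any "dog" examples; only the semantic knowledge base needs to cover it.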

Finally, we showed how multi-task learning can be used to learn useful semantic features directly from data. We formulated the semantic feature learning problem as a Multi-task Lasso and presented an extremely fast and highly scalable algorithm for solving the resulting optimization.

We propose work to extend our zero-shot learning setting by optimizing semantic feature sets and by using an active learning framework to choose the most informative training examples. We also propose to use latent feature models such as components analysis and sparse coding in a self-taught learning framework to improve decoding by leveraging data from additional neural imaging experiments.


Thesis Committee
Tom Mitchell, Chair
Dean Pomerleau
J. Andrew Bagnell
Andrew Ng, Stanford University

Saturday, November 21, 2009

CMU talk: Imitation Learning and Purposeful Prediction

Machine Learning Lunch
Speaker: Prof. Drew Bagnell
Date: Monday, November 23, 2009

Imitation Learning and Purposeful Prediction

Programming robots is hard. While demonstrating a desired behavior may be easy, designing a system that behaves this way is often difficult, time consuming, and ultimately expensive. Machine learning promises to enable "programming by demonstration" for developing high-performance robotic systems. Unfortunately, many approaches that utilize the classical tools of supervised learning fail to meet the needs of imitation learning. Perhaps foremost, classical statistics and supervised machine learning exist in a vacuum: predictions made by these algorithms are explicitly assumed to not affect the world in which they operate. I'll discuss the problems that result from ignoring the effect of actions influencing the world, and I'll highlight simple "reduction-based" approaches that, both in theory and in practice, mitigate these problems.

Additionally, robotic systems are often built atop sophisticated planning algorithms that efficiently reason far into the future; consequently, ignoring these planning algorithms in lieu of a supervised learning approach often leads to poor and myopic performance. While planners have demonstrated dramatic success in applications ranging from legged locomotion to outdoor unstructured navigation, such algorithms rely on fully specified cost functions that map sensor readings and environment models to a scalar cost. Such cost functions are usually manually designed and programmed. Recently, our group has developed a set of techniques that learn these functions from human demonstration by applying an Inverse Optimal Control (IOC) approach to find a cost function for which planned behavior mimics an expert's demonstration. These approaches shed new light on the intimate connections between probabilistic inference and optimal control. I'll consider case studies in activity forecasting of drivers and pedestrians as well as the imitation learning of robotic locomotion and rough-terrain navigation. These case-studies highlight key challenges in applying the algorithms in practical settings.

Friday, November 20, 2009

Lab Meeting November 25, 2009 (KuoHuei): You’ll Never Walk Alone: Modeling Social Behavior for Multi-target Tracking (ICCV 2009)

Title: You’ll Never Walk Alone: Modeling Social Behavior for Multi-target Tracking
The Twelfth IEEE International Conference on Computer Vision (ICCV 2009)
Authors: S. Pellegrini, A. Ess, K. Schindler, and L. van Gool

Object tracking typically relies on a dynamic model to predict the object’s location from its past trajectory. In crowded scenarios a strong dynamic model is particularly important, because more accurate predictions allow for smaller search regions, which greatly simplifies data association.
Traditional dynamic models predict the location for each target solely based on its own history, without taking into account the remaining scene objects. Collisions are resolved only when they happen. Such an approach ignores important aspects of human behavior: people are driven by their future destination, take into account their environment, anticipate collisions, and adjust their trajectories at an early stage in order to avoid them. In this work, we introduce a model of dynamic social behavior, inspired by models developed for crowd simulation. The model is trained with videos recorded from birds-eye view at busy locations, and applied as a motion model for multi-people tracking from a vehicle-mounted camera. Experiments on real sequences show that accounting for social interactions and scene knowledge improves tracking performance, especially during occlusions.
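The flavor of such a dynamic model can be sketched with a simple social-force prediction step (a minimal sketch: the paper learns an energy-based model from data rather than using hand-set gains like these):

```python
import numpy as np

def predict_position(pos, vel, goal, others, dt=0.4,
                     goal_gain=1.0, repulse_gain=2.0):
    """One prediction step of a simple social-force motion model.

    Each pedestrian accelerates toward a goal and is repelled by
    nearby people, so predicted trajectories bend around others
    before any collision occurs.
    pos, vel, goal: (2,) arrays; others: list of (2,) positions.
    """
    desired = goal - pos
    norm = np.linalg.norm(desired)
    if norm > 0:
        desired /= norm
    force = goal_gain * desired
    for o in others:
        diff = pos - o
        dist = np.linalg.norm(diff)
        if 0 < dist < 2.0:                     # only nearby agents repel
            force += repulse_gain * diff / dist**3
    vel = vel + dt * force
    return pos + dt * vel, vel

pos = np.array([0.0, 0.0])
vel = np.array([1.0, 0.0])
goal = np.array([5.0, 0.0])
others = [np.array([1.0, 0.1])]                # someone just ahead, slightly left
new_pos, new_vel = predict_position(pos, vel, goal, others)
```

Because the other pedestrian is offset to one side, the predicted velocity acquires a sideways component away from them, anticipating the collision instead of resolving it after the fact.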

Wednesday, November 11, 2009

Lab Meeting November 11, 2009 (swem): Avoiding Negative Depth in Inverse Depth Bearing-Only SLAM (IROS 2008)

Title: Avoiding Negative Depth in Inverse Depth Bearing-Only SLAM
(2008 IEEE/RSJ International Conference on Intelligent Robots and Systems)
Author: Martin P. Parsley and Simon J. Julier

In this paper we consider ways to alleviate negative estimated depth in the inverse depth parameterisation of bearing-only SLAM. This problem, which can arise even if the beacons are far from the platform, can cause catastrophic failure of the filter. We consider three strategies to overcome this difficulty: applying inequality constraints, the use of truncated second order filters, and a reparameterisation using the negative logarithm of depth. We show that both a simple inequality method and the use of truncated second order filters are successful. However, the most robust performance is achieved using the negative log parameterisation.
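The two parameterisations can be compared directly (a minimal sketch of the mappings themselves; the paper's contribution lies in how the filter propagates uncertainty through them):

```python
import numpy as np

def depth_from_inverse(rho):
    """Inverse-depth parameterisation: depth d = 1/rho.

    A Gaussian estimate of rho can place probability mass on rho <= 0,
    i.e. negative or infinite depth, which is the failure mode the
    paper addresses.
    """
    return 1.0 / rho

def depth_from_neglog(l):
    """Negative-log-depth parameterisation: l = -log(d), so d = exp(-l).

    Every real value of l maps to a strictly positive depth, so the
    filter can never represent a negative depth, no matter how far the
    estimate wanders.
    """
    return np.exp(-l)

# The same landmark at depth 4 m in both parameterisations:
rho = 0.25
l = -np.log(4.0)
```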

Tuesday, November 10, 2009

ICCV'09 Oral Paper: You’ll Never Walk Alone: Modeling Social Behavior for Multi-target Tracking

You’ll Never Walk Alone: Modeling Social Behavior for Multi-target Tracking

S. Pellegrini, A. Ess, K. Schindler and L. van Gool
ICCV 2009 (oral)

Object tracking typically relies on a dynamic model to predict the object’s location from its past trajectory. In crowded scenarios a strong dynamic model is particularly important, because more accurate predictions allow for smaller search regions, which greatly simplifies data association. Traditional dynamic models predict the location for each target solely based on its own history, without taking into account the remaining scene objects. Collisions are resolved only when they happen. Such an approach ignores important aspects of human behavior: people are driven by their future destination, take into account their environment, anticipate collisions, and adjust their trajectories at an early stage in order to avoid them. In this work, we introduce a model of dynamic social behavior, inspired by models developed for crowd simulation. The model is trained with videos recorded from birds-eye view at busy locations, and applied as a motion model for multi-people tracking from a vehicle-mounted camera. Experiments on real sequences show that accounting for social interactions and scene knowledge improves tracking performance, especially during occlusions. [PDF]

Saturday, November 07, 2009

Lab Meeting November 11, 2009(Jim Yu): Planning-based Prediction for Pedestrians

Title: Planning-based Prediction for Pedestrians
Author: B. D. Ziebart, N. Ratliff, G. Gallagher, C. Mertz, K. Peterson, J. A. Bagnell, M. Hebert, A K. Dey, S. Srinivasa
International Conference on Intelligent Robots and Systems (IROS 2009)

We present a novel approach for determining robot movements that efficiently accomplish the robot’s tasks while not hindering the movements of people within the environment. Our approach models the goal-directed trajectories of pedestrians using maximum entropy inverse optimal control. The advantage of this modeling approach is the generality of its learned cost function to changes in the environment and to entirely different environments. We employ the predictions of this model of pedestrian trajectories in a novel incremental planner and quantitatively show the improvement in hindrance sensitive robot trajectory planning provided by our approach.


Friday, November 06, 2009

(PAMI2009)Head Pose Estimation in Computer Vision: A Survey

Murphy-Chutorian, E.; Trivedi, M.M.;

The capacity to estimate the head pose of another person is a common human ability that presents a unique challenge for computer vision systems. Compared to face detection and recognition, which have been the primary foci of face-related vision research, identity-invariant head pose estimation has fewer rigorously evaluated systems or generic solutions. In this paper, we discuss the inherent difficulties in head pose estimation and present an organized survey describing the evolution of the field. Our discussion focuses on the advantages and disadvantages of each approach and spans 90 of the most innovative and characteristic papers that have been published on this topic. We compare these systems by focusing on their ability to estimate coarse and fine head pose, highlighting approaches that are well suited for unconstrained environments.

Wednesday, November 04, 2009

NTU talk: Video Analysis in Vision-Based Intelligent Systems

Title: Video Analysis in Vision-Based Intelligent Systems
Speaker: Prof. Hsu-Yung Cheng, National Central University
Time: 2:20pm, Nov 6 (Fri), 2009
Place: Room 103, CSIE building

Abstract: Computer vision and video analysis techniques play an important role in modern intelligent systems. Video-based systems can capture a large variety of desired information and are relatively inexpensive because cameras are easy to install, operate, and maintain. With the huge number of video cameras installed everywhere nowadays, there is an urgent need for automated video understanding techniques that can replace human operators in monitoring the areas under surveillance. In this talk I will briefly introduce several topics and related techniques in intelligent surveillance applications, with more discussion devoted to video object tracking. I will introduce a work on video object tracking that combines the flexibility of particle sampling with the mathematical tractability of Kalman filters. Also, for objects that cannot be separated during the tracking process, possible solutions are discussed.

Short Biography: Hsu-Yung Cheng received the Bachelor’s degree in computer science and information engineering from National Chiao-Tung University in Taiwan in 2000 and the Master’s degree from the same department in 2002. She earned her Ph.D. in Electrical Engineering from the University of Washington in 2008. She joined the Department of Computer Science and Information Engineering at National Central University in 2008 as an assistant professor. Her research interests include image and video analysis and intelligent systems.

CMU talk: Challenges in the Practical Application of Machine Learning

Intelligence Seminar

November 10, 2009
3:30 pm

Challenges in the Practical Application of Machine Learning
Carla E. Brodley, Tufts University

In this talk I will discuss the factors that impact the successful application of supervised machine learning. Driven by several interdisciplinary collaborations, we are addressing the problem of what to do when your initial accuracy is lower than is acceptable to your domain experts. Low accuracy can be due to three factors: noise in the class labels, insufficient training data, and whether the features describing each training example are able to discriminate the classes. In this talk, I will discuss research efforts at Tufts addressing the latter two factors. The first project introduces a new problem which we have named active class selection (ACS). ACS arises when one can ask the question: given the ability to collect n additional training instances, how should they be distributed with respect to class? The second project examines how one might assess whether the class distinctions are supported by the features, and how constraint-based clustering can be used to uncover the true class structure of the data. These two issues and their solutions will be explored in the context of three applications. The first is to create a global map of the land cover of the Earth's surface from remotely sensed (satellite) data. The second is to build a classifier based on data collected from an "artificial nose" to discriminate vapors. The "nose" is a collection of sensors that have different reactions to different vapors. The third is to classify HRCT images of the lung.

Carla E. Brodley is a professor in the Department of Computer Science at Tufts University. She received her PhD in computer science from the University of Massachusetts Amherst in 1994. From 1994 to 2004, she was on the faculty of the School of Electrical Engineering at Purdue University. Professor Brodley's research interests include machine learning, knowledge discovery in databases, and computer security. She has worked in the areas of anomaly detection, active learning, classifier formation, unsupervised learning, and applications of machine learning to remote sensing, computer security, digital libraries, astrophysics, content-based image retrieval of medical images, computational biology, saliva diagnostics, evidence-based medicine and chemistry. She was a member of the DSSG in 2004-2005. In 2001 she served as program co-chair for the International Conference on Machine Learning (ICML) and in 2004, she served as the general chair for ICML. Currently she is an associate editor of JMLR and Machine Learning, and she is on the editorial board of DKMD. She is a member of the AAAI Council and is co-chair of the Computing Research Association's Committee on the Status of Women in Computing Research (CRA-W).

Tuesday, November 03, 2009

Lab Meeting November 4, 2009 (Chung-Han): An Active Learning Approach for Segmenting Human Activity Datasets

Title: An Active Learning Approach for Segmenting Human Activity Datasets
Author: Liyue Zhao, Gita Sukthankar
In: MM '09: Proceedings of the seventeenth ACM international conference on Multimedia

Human activity datasets collected under natural conditions are an important source of data. Since these contain multiple activities in unscripted sequence, temporal segmentation of multimodal datasets is an important precursor to recognition and analysis. Manual segmentation is prohibitively time consuming and unsupervised approaches for segmentation are unreliable since they fail to exploit the semantic context of the data. Gathering labels for supervised learning places a large workload on the human user since it is relatively easy to gather a mass of unlabeled data but expensive to annotate. This paper proposes an active learning approach for segmenting large motion capture datasets with both small training sets and working sets. Support Vector Machines (SVMs) are learned using an active learning paradigm; after the classifiers are initialized with a small set of labeled data, the users are iteratively queried for labels as needed. We propose a novel method for initializing the classifiers, based on unsupervised segmentation and clustering of the dataset. By identifying and training the SVM with points from pure clusters, we can improve upon a random sampling strategy for creating the query set. Our active learning approach improves upon the initial unsupervised segmentation used to initialize the classifier, while requiring substantially less data than a fully supervised method; the resulting segmentation is comparable to the latter while requiring significantly less effort from the user.

[Full Text]
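The query loop described in the abstract can be sketched in a few lines. This stand-in uses a 1-D nearest-centroid classifier and synthetic data instead of the paper's SVMs and motion-capture features, and an `oracle` function plays the role of the human labeler; all names and numbers are invented:

```python
# Hedged sketch of the active-learning query loop: start from a small
# labeled seed set, then repeatedly ask the "user" (here an oracle
# function) to label the most uncertain unlabeled point.

def centroids(labeled):
    sums, counts = {}, {}
    for x, y in labeled:
        sums[y] = sums.get(y, 0.0) + x
        counts[y] = counts.get(y, 0) + 1
    return {y: sums[y] / counts[y] for y in sums}

def uncertainty(x, cents):
    # a small margin between the two nearest centroids = high uncertainty
    d = sorted(abs(x - c) for c in cents.values())
    return -(d[1] - d[0]) if len(d) > 1 else 0.0

oracle = lambda x: 0 if x < 5.0 else 1          # ground-truth labeler
pool = [1.0, 2.0, 4.0, 4.9, 5.1, 6.0, 8.0]      # unlabeled working set
labeled = [(0.0, 0), (9.0, 1)]                  # tiny seed set

for _ in range(3):                              # query budget
    cents = centroids(labeled)
    q = max(pool, key=lambda x: uncertainty(x, cents))
    labeled.append((q, oracle(q)))              # user supplies the label
    pool.remove(q)

cents = centroids(labeled)
predict = lambda x: min(cents, key=lambda y: abs(x - cents[y]))
```

The queries concentrate near the class boundary, which is the point of the approach: most of the pool never needs a manual label.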

Monday, November 02, 2009

Lab Meeting November 4, 2009 (Jimmy): Learning To Detect Unseen Object Classes by Between-Class Attribute Transfer

Title: Learning To Detect Unseen Object Classes by Between-Class Attribute Transfer
Authors: Christoph H. Lampert, Hannes Nickisch, and Stefan Harmeling
In: CVPR2009

We study the problem of object classification when training and test classes are disjoint, i.e. no training examples of the target classes are available. This setup has hardly been studied in computer vision research, but it is the rule rather than the exception, because the world contains tens of thousands of different object classes and image collections have been formed and annotated with suitable class labels for only a very few of them.

In this paper, we tackle the problem by introducing attribute-based classification. It performs object detection based on a human-specified high-level description of the target objects instead of training images. The description consists of arbitrary semantic attributes, like shape, color or even geographic information. Because such properties transcend the specific learning task at hand, they can be pre-learned, e.g. from image datasets unrelated to the current task. Afterwards, new classes can be detected based on their attribute representation, without the need for a new training phase. In order to evaluate our method and to facilitate research in this area, we have assembled a new large-scale dataset, “Animals with Attributes”, of over 30,000 animal images that match the 50 classes in Osherson’s classic table of how strongly humans associate 85 semantic attributes with animal classes. Our experiments show that by using an attribute layer it is indeed possible to build a learning object detection system that does not require any training images of the target classes.


I will also try to introduce the NIPS2009 paper Zero-Shot Learning with Semantic Output Codes by M. Palatucci, D. Pomerleau, G. Hinton, and T.M. Mitchell, which gives some formalization to the problem.
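A minimal sketch of the attribute-layer idea: attribute predictors are assumed pre-learned, and an unseen class is chosen by how well its human-specified attribute signature explains the predicted attribute probabilities. All class names, attributes, and scores here are invented:

```python
# Hedged sketch of attribute-based zero-shot classification: predict
# per-image attributes, then assign the unseen class whose attribute
# signature matches best (a simplified direct-attribute-prediction
# scheme; classes, attributes, and probabilities are made up).

# Attribute signatures for classes never seen at training time.
signatures = {
    "zebra":   (1, 1, 0),   # (striped, four_legged, aquatic)
    "dolphin": (0, 0, 1),
    "tiger":   (1, 1, 0),
}

def classify(attr_probs):
    """attr_probs: per-attribute probabilities from pre-learned predictors."""
    def score(sig):
        # likelihood of the signature under independent attribute predictions
        s = 1.0
        for p, a in zip(attr_probs, sig):
            s *= p if a == 1 else (1.0 - p)
        return s
    return max(signatures, key=lambda c: score(signatures[c]))

label = classify((0.1, 0.2, 0.9))  # looks aquatic, not striped
```

No image of a dolphin was ever needed to define the "dolphin" class, only its attribute description; that is the transfer step the paper formalizes.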

Wednesday, October 28, 2009

(IROS 2009) Video: RF Vision: RFID Receive Signal Strength Indicator (RSSI) Images for Sensor Fusion and Mobile Manipulation


Title: RF Vision: RFID Receive Signal Strength Indicator (RSSI) Images for Sensor Fusion and Mobile Manipulation


In this work we present a set of integrated methods that enable an RFID-enabled mobile manipulator to approach and grasp an object to which a self-adhesive passive (battery-free) UHF RFID tag has been affixed.


I will find the pdf file later.

Tuesday, October 27, 2009

NTU talk: Localization and Mapping of Surveillance Cameras in City Map

Title: Localization and Mapping of Surveillance Cameras in City Map
Speaker: Prof. Leow Wee Kheng, National University of Singapore
Time: 2:20pm, Oct 30 (Fri), 2009
Place: Room 103, CSIE building


Many large cities have installed surveillance cameras to monitor human activities for security purposes. An important surveillance application is to track the motion of an object of interest, e.g., a car or a human, using one or more cameras, and plot the motion path in a city map. To achieve this goal, it is necessary to localize the cameras in the city map and to determine the correspondence mappings between the positions in the city map and the camera views. Since the view of the city map is roughly orthogonal to the camera views, there are very few common features between the two views for a computer vision algorithm to correctly identify corresponding points automatically. We propose a method for camera localization and position mapping that requires minimum user inputs. Given approximate corresponding points between the city map and a camera view identified by a user, the method computes the orientation and position of the camera in the city map, and determines the mapping between the positions in the city map and the camera view. The performance of the method is assessed in both quantitative tests and practical application. Quantitative test results show that the method is accurate and robust in camera localization and position mapping. Application test results are very encouraging, showing the usefulness of the method in real applications.

Short Biography: Dr. Leow Wee Kheng obtained his B.Sc. and M.Sc. in Computer Science from National University of Singapore in 1985 and 1989 respectively. He pursued Ph.D. study at The University of Texas at Austin and obtained his Ph.D. in Computer Science in 1994. His current research interests include computer vision, medical image analysis, and protein docking. He has published more than 80 technical papers in journals, conferences, and books. He has been awarded two U.S. patents and has filed another patent application under the PCT. He has served in the Program Committees and Organizing Committees of various conferences. He has collaborated widely with a large number of local and overseas institutions. His current local collaborators include I2R of A*STAR, Singapore General Hospital, National University Hospital, and National Skin Centre, and overseas collaborators include CNRS in France and National Taiwan University and National Taiwan University Hospital.

Saturday, October 24, 2009

CMU talk: Set estimation for statistical inference in brain imaging and active sensing

Machine Learning Lunch
Speaker: Prof. Aarti Singh
Venue: GHC 6115
Date: Monday, October 26, 2009

Set estimation for statistical inference in brain imaging and active sensing

Inferring spatially co-located regions of interest is an important problem in several applications, such as identifying activation regions in the brain or contamination regions in environmental monitoring. In this talk, I will present multi-resolution methods for passive and active learning of sets that aggregate data at appropriate resolutions, to achieve optimal bias and variance tradeoffs for set estimation. In the passive setting, we observe some data, such as a noisy fMRI image of the brain, and then extract the regions with statistically significant brain activity. The active setting, on the other hand, involves feedback: the location of each new observation is decided based on the data observed so far. This can be used for rapid extraction of set estimates, such as a contamination region in environmental monitoring, by designing data-adaptive spatial survey paths for a mobile sensor. I will describe a method that uses information gleaned from coarse surveys to focus sampling around informative regions (boundaries), thus generating successively refined multi-resolution set estimates.

I will also discuss some current research directions which aim at efficient extraction of spatially distributed sets of interest by exploiting non-local dependencies in the data.

Friday, October 23, 2009

Lab Meeting 10/28, 2009 (Kuen-Han): Coupled Object Detection and Tracking from Static Cameras and Moving Vehicles (PAMI 2008)

Title: Coupled Object Detection and Tracking from Static Cameras and Moving Vehicles (PAMI 2008)
Authors: B. Leibe, K. Schindler, N. Cornelis, and L. Van Gool.


Abstract—We present a novel approach for multi-object tracking which considers object detection and spacetime trajectory estimation as a coupled optimization problem. Our approach is formulated in a Minimum Description Length hypothesis selection framework, which allows our system to recover from mismatches and temporarily lost tracks. Building upon a state-of-the-art object detector, it performs multi-view/multi-category object recognition to detect cars and pedestrians in the input images. The 2D object detections are checked for their consistency with (automatically estimated) scene geometry and are converted to 3D observations, which are accumulated in a world coordinate frame. A subsequent trajectory estimation module analyzes the resulting 3D observations to find physically plausible spacetime trajectories. Tracking is achieved by performing model selection after every frame. At each time instant, our approach searches for the globally optimal set of spacetime trajectories which provides the best explanation for the current image and for all evidence collected so far, while satisfying the constraints that no two objects may occupy the same physical space, nor explain the same image pixels at any point in time. Successful trajectory hypotheses are then fed back to guide object detection in future frames. The optimization procedure is kept efficient through incremental computation and conservative hypothesis pruning. We evaluate our approach on several challenging video sequences and demonstrate its performance on both a surveillance-type scenario and a scenario where the input videos are taken from inside a moving vehicle passing through crowded city areas.



Thursday, October 22, 2009


Next-Generation Research and Breakthrough Innovation: Indicators from US Academic Research

by Thomas C. McMail

When searching for breakthrough advantages, the key innovations are those that surpass the present state so significantly that they might well lead to the next generation of technical advances. Over a period of a year and a half, I interviewed more than 100 educators, researchers, and deans in many disciplines with the overall goal of encouraging innovative collaborations between Microsoft Research and academics as part of my responsibilities as a university-relations specialist. The survey reveals connections among a broad range of topics and various recurrent themes that might be of interest to scientists in various fields.

[Full Article]

It is good to see that robotics is one of the Next-Generation Research and Breakthrough Innovations.


Wednesday, October 21, 2009

Lab Meeting 10/28 (Any): GroupSAC

Kai Ni, Hailin Jin, and Frank Dellaert. GroupSAC: Efficient Consensus in the Presence of Groupings. In International Conference on Computer Vision (ICCV), September 2009.

Abstract--We present a novel variant of the RANSAC algorithm that is much more efficient, in particular when dealing with problems with low inlier ratios. Our algorithm assumes that there exists some grouping in the data, based on which we introduce a new binomial mixture model rather than the simple binomial model used in RANSAC. We prove that in the new model it is more efficient to sample data from a smaller number of groups and from groups with more tentative correspondences, which leads to a new sampling procedure that uses progressive numbers of groups. We demonstrate our algorithm on two classical geometric vision problems: wide-baseline matching and camera resectioning. The experiments show that the algorithm serves as a general framework that works well with the three grouping strategies investigated in this paper, including a novel optical-flow-based clustering approach. The results show that our algorithm is able to achieve a significant performance gain compared to standard RANSAC and PROSAC.
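The benefit of grouped sampling can be seen in a toy line-fitting experiment. This is not the paper's binomial mixture model or progressive group schedule; it only contrasts plain RANSAC pair sampling with a GroupSAC-style sampler that draws both points from a single group (`plain` is the baseline sampler; the data and threshold are synthetic):

```python
import random

# Hedged sketch: when inliers cluster in one group, sampling a minimal
# set from a single group hits an all-inlier sample far more often than
# sampling uniformly from all the data.
random.seed(0)
# group 0: points near the line y = 2x + 1 (mostly inliers)
g0 = [(x, 2 * x + 1 + random.uniform(-0.05, 0.05)) for x in range(10)]
# group 1: scattered outliers
g1 = [(random.uniform(0, 9), random.uniform(-10, 30)) for _ in range(10)]
pts = g0 + g1

def fit(p, q):
    if q[0] == p[0]:
        return None
    m = (q[1] - p[1]) / (q[0] - p[0])
    return m, p[1] - m * p[0]

def inliers(model, tol=0.2):
    m, b = model
    return [p for p in pts if abs(p[1] - (m * p[0] + b)) < tol]

def ransac(sampler, iters=100):
    best = []
    for _ in range(iters):
        model = fit(*sampler())
        if model:
            ins = inliers(model)
            if len(ins) > len(best):
                best = ins
    return best

plain = lambda: random.sample(pts, 2)                 # standard RANSAC
grouped = lambda: random.sample(random.choice([g0, g1]), 2)  # same-group pair

best = ransac(grouped)
```

With the same iteration budget, the grouped sampler recovers essentially the whole inlier set; the real algorithm additionally weights groups by their tentative-correspondence counts.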

Tuesday, October 20, 2009

Lab Meeting 10/21 (Nicole): Discovery of sound sources by an autonomous mobile robot (Autonomous Robots Vol. 27, No. 3, Oct. 2009)

Title: Discovery of sound sources by an autonomous mobile robot

Authors: Eric Martinson and Alan Schultz


In this work, we describe an autonomous mobile robotic system for finding, investigating, and modeling ambient noise sources in the environment. The system has been fully implemented in two different environments, using two different robotic platforms and a variety of sound source types. Making use of a two-step approach to autonomous exploration of the auditory scene, the robot first quickly moves through the environment to find and roughly localize unknown sound sources using the auditory evidence grid algorithm. Then, using the knowledge gained from the initial exploration, the robot investigates each source in more depth, improving upon the initial localization accuracy, identifying volume and directivity, and, finally, building a classification vector useful for detecting the sound source in the future.
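The auditory evidence grid idea can be sketched as a log-odds grid that accumulates bearing-only sound detections taken from several robot poses; the cell with the highest evidence approximates the source. Grid size, the sensor model, and the poses below are all invented:

```python
import math

# Hedged sketch of an evidence-grid style update: bearing measurements
# from several robot poses raise the log-odds of cells near each bearing
# ray; the rays intersect near the true source, so the peak cell
# localizes it. Values are illustrative, not the paper's sensor model.
W = H = 20
logodds = [[0.0] * W for _ in range(H)]
source = (12.0, 7.0)            # ground-truth source (x, y)

def bearing(pose, target):
    return math.atan2(target[1] - pose[1], target[0] - pose[0])

poses = [(0.0, 0.0), (19.0, 0.0), (0.0, 19.0)]
for pose in poses:
    z = bearing(pose, source)   # measured bearing (noise-free here)
    for i in range(H):
        for j in range(W):
            cell = (j + 0.5, i + 0.5)
            diff = bearing(pose, cell) - z
            err = abs(math.atan2(math.sin(diff), math.cos(diff)))
            # simple sensor model: cells near the ray gain evidence
            logodds[i][j] += 1.0 if err < 0.1 else -0.1

peak = max(((i, j) for i in range(H) for j in range(W)),
           key=lambda ij: logodds[ij[0]][ij[1]])
```

The two-step strategy in the abstract maps onto this sketch: a quick pass builds a coarse grid like this one, and the follow-up investigation refines the estimate around the peak.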


Saturday, October 17, 2009

CMU talk: Cross-Modal Localization Through Mutual Information

FRC Seminar: Cross-Modal Localization Through Mutual Information

Speaker: Dr. Alen Alempijevic
ARC Centre for Autonomous Systems
Mechatronics and Intelligent Systems Group
University of Technology Sydney, Australia

Friday October 16, 2009
NSH 1109

Abstract: Relating information originating from disparate sensors observing a given scene is a challenging task, particularly when an appropriate model of the environment or the behaviour of any particular object within it is not available. One possible strategy to address this task is to examine whether the sensor outputs contain information which can be attributed to a common cause. I will present an approach to localise this embedded common information through an indirect method of estimating mutual information between all signal sources. The ability of L1 regularization to enforce sparseness of the solution is exploited to identify a subset of signals that are related to each other from among a large number of sensor outputs. As opposed to conventional L2 regularization, the proposed method leads to faster convergence with reduced spurious associations.

Speaker Bio: Dr. Alen Alempijevic is a Research Fellow within the ARC Centre for Autonomous Systems (Mechatronics and Intelligent Systems Group) at the University of Technology Sydney. He earned his BE in Computer Systems and PhD degrees in Mechatronics Engineering from the University of Technology Sydney in 2003 and 2009 respectively. He has been a guest researcher at UC Berkeley as part of the 2007 entry into the DARPA Grand Challenge and is currently working on 6DOF localization of an underground miner as part of an ARC Linkage Grant. His research interests are in perception for long term autonomy, distributed sensing in self-reconfiguring modular robots and SLAM for urban search and rescue vehicles.

CMU talk: Applied machine learning in human-computer interaction research

Machine Learning Lunch
Speaker: Moira Burke
Venue: GHC 6115
Date: Monday, October 19, 2009

Applied machine learning in human-computer interaction research

Human-computer interaction researchers use diverse methods—from eye-tracking to ethnography—to understand human activity, and machine learning is growing in popularity as a method within the community. This talk will survey projects from HCI researchers at CMU that combine machine learning with other techniques to address problems such as adapting interfaces to individuals with motor impairments, predicting routines in dual-income families, classifying controversial Wikipedia articles, and identifying the rhetorical strategies newcomers use in online support groups that elicit responses. Researchers without strong computational backgrounds can face practical challenges as consumers of machine learning tools; this talk will highlight opportunities for tool design and collaboration across communities.

Thursday, October 15, 2009

Lab Meeting 10/21 (Gary): 3D Alignment of Face in a Single Image (CVPR 06)

Authors: Lie Gu, Takeo Kanade


We present an approach for aligning a 3D deformable model to a single face image. The model consists of a set of sparse 3D points and the view-based patches associated with every point. Assuming a weak perspective projection model, our algorithm iteratively deforms the model and adjusts the 3D pose to fit the image. As opposed to previous approaches, our algorithm starts the fitting without resorting to manual labeling of key facial points. And it makes no assumptions about global illumination or surface properties, so it can be applied to a wide range of imaging conditions. Experiments demonstrate that our approach can effectively handle unseen faces with a variety of pose and illumination variations.
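The weak perspective model assumed by the paper maps a 3D point (X, Y, Z) to s·(X, Y) + (tx, ty). Under that assumption, recovering scale and translation from correspondences is a linear problem; this sketch fixes the rotation to the identity for brevity and uses synthetic points:

```python
# Hedged sketch of a weak perspective fit: given 3D model points and
# their 2D projections, recover the scale s and translation (tx, ty)
# in closed form. The model points and true parameters are invented;
# the paper additionally estimates rotation and deforms the model.

model = [(0.0, 0.0, 1.0), (1.0, 0.0, 1.2), (0.0, 1.0, 0.8), (1.0, 1.0, 1.0)]
s_true, t_true = 2.0, (3.0, 4.0)
image = [(s_true * x + t_true[0], s_true * y + t_true[1]) for x, y, _ in model]

# Subtracting centroids removes the translation; s is then the ratio of
# centered image coordinates to centered model coordinates (least squares).
n = len(model)
mx = sum(x for x, _, _ in model) / n
my = sum(y for _, y, _ in model) / n
ux = sum(u for u, _ in image) / n
uy = sum(v for _, v in image) / n
num = sum((x - mx) * (u - ux) + (y - my) * (v - uy)
          for (x, y, _), (u, v) in zip(model, image))
den = sum((x - mx) ** 2 + (y - my) ** 2 for x, y, _ in model)
s = num / den
tx, ty = ux - s * mx, uy - s * my
```

Note that Z does not appear in the projection at all, which is exactly the weak perspective approximation: depth variation within the face is small compared to the distance to the camera.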


Tuesday, October 13, 2009

NTU talk: Bridging the Gap between Signal Processing and Machine Learning

Title: Bridging the Gap between Signal Processing and Machine Learning
Speaker: Dr. Y.-C. Frank Wang, Academia Sinica
Time: 2:20pm, Oct 16 (Fri), 2009
Place: Room 103, CSIE building

The advancements of signal processing and machine learning techniques have afforded many applications which benefit people in different areas. The first part of this talk starts with some interesting examples, and explains how “signal processing” and “machine learning” people might interpret the same thing in different points of view. I will discuss why it is vital to bridge the gap between these two areas, and what can be achieved by close collaboration between people from these two communities.

Many real-world machine learning applications involve recognition of multiple classes. However, standard classification methods may not perform well or efficiently on large-scale problems. Designing a classifier with good generalization and scalability is an ongoing research topic for machine learning researchers. Another important issue, which is typically not addressed in prior work, is the rejection of unseen false classes (classes not of interest). Since we cannot design a classifier by training on data from “unseen” classes, the rejection problem becomes very challenging. In the second part of this talk, I will present my proposed work, a soft-decision hierarchical SVRDM (support vector representation and discrimination machine) classifier, which addresses the aforementioned multiclass classification and rejection problems.

Short Biography:
Yu-Chiang Frank Wang received his B.S. in Electrical Engineering from National Taiwan University in 2001. Before pursuing his graduate study, he worked as a research assistant in the Division of Medical Engineering Research at National Health Research Institutes in Taiwan from 2001 to 2002. He received his M.S. and Ph.D. degrees in Electrical and Computer Engineering from Carnegie Mellon University in 2004 and 2009, respectively. His research projects at CMU included the development of a military automated target recognition system, the design of a multi-modal biometric fusion system, and algorithms for data clustering and multi-class classification problems. His research and graduate study were funded by the US Army Research Office through Carnegie Mellon.

Dr. Wang is currently an assistant research fellow in the Research Center for Information Technology Innovation (CITI) at Academia Sinica, where he also holds the position as an adjunct assistant research fellow in the Institute of Information Science. His research interests span the areas of pattern recognition, machine learning, computer vision, multimedia signal processing and content analysis.

NTU talk: Data-Aware Search: Web Scale Integration, Web Scale Inspiration

Title: Data-Aware Search: Web Scale Integration, Web Scale Inspiration
Speaker: Prof. Kevin C. Chang, UIUC
Time: 2:10pm, Oct 15 (Thu), 2009
Place: Room 210, CSIE building

What have you been searching lately? With so much data on the web, we often look for various "stuff" across many sites-- but current search engines can only find web pages and then only one page at a time.

Towards "data-aware" search over the web as a massive database, we face the challenges of integrating data from everywhere. The barrier boils down to the classic "impedance mismatch" between structured queries over unstructured data-- but now at the Internet scale! I will discuss our lessons learned in the MetaQuerier and WISDM projects at Illinois, in which we develop large-scale data integration by two approaches with duality-- bringing data to match queries, and vice versa. I will demo prototypes and their productization at Cazoodle.

Short Biography:
Kevin C. Chang is an Associate Professor in Computer Science, University of Illinois at Urbana-Champaign. He received a BS from National Taiwan University and a PhD from Stanford University, both in Electrical Engineering. He likes large-scale information access and, with his students, co-founded Cazoodle, a startup from the University of Illinois, for developing "data-aware" search over the web.

Monday, October 12, 2009

Lab Meeting 14/10 (Andi): MMM-classification of 3D Range Data (ICRA 09)

Authors: Anuraag Agrawal, Atsushi Nakazawa, and Haruo Takemura (Osaka University)

Abstract: This paper presents a method for accurately segmenting and classifying 3D range data into particular object classes. Object classification of input images is necessary for applications including robot navigation and automation, in particular with respect to path planning. To achieve robust object classification, we propose the idea of an object feature which represents a distribution of neighboring points around a target point. In addition, rather than processing raw points, we reconstruct polygons from the point data, introducing connectivity to the points. With these ideas, we can refine the Markov Random Field (MRF) calculation with more relevant information with regards to determining “related points”. The algorithm was tested against five outdoor scenes and provided accurate classification even in the presence of many classes of interest.

local copy

Sunday, October 11, 2009

Lab Meeting 10/14, 2009 (Shao-Chen): Distributed Multirobot Localization (IEEE Transactions on Robotics and Automation, Oct. 2002)

Title: Distributed Multirobot Localization (IEEE Transactions on Robotics and Automation, Oct. 2002)

Authors: Stergios I. Roumeliotis, George A. Bekey


In this paper, we present a new approach to the problem of simultaneously localizing a group of mobile robots capable of sensing one another. Each of the robots collects sensor data regarding its own motion and shares this information with the rest of the team during the update cycles. A single estimator, in the form of a Kalman filter, processes the available positioning information from all the members of the team and produces a pose estimate for every one of them. The equations for this centralized estimator can be written in a decentralized form, therefore allowing this single Kalman filter to be decomposed into a number of smaller communicating filters. Each of these filters processes sensor data collected by its host robot. Exchange of information between the individual filters is necessary only when two robots detect each other and measure their relative pose. The resulting decentralized estimation schema, which we call collective localization, constitutes a unique means for fusing measurements collected from a variety of sensors with minimal communication and processing requirements. The distributed localization algorithm is applied to a group of three robots and the improvement in localization accuracy is presented. Finally, a comparison to the equivalent decentralized information filter is presented.
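The core of the update can be sketched in one dimension: two robots dead-reckon independently, and a single relative-position measurement couples their estimates through a joint Kalman update, shrinking both covariances. The noise values and positions below are invented:

```python
# Hedged 1-D sketch of the relative-measurement update behind collective
# localization. State is [x1, x2] with a 2x2 covariance; a measurement
# z = x2 - x1 + noise (H = [-1, 1]) couples the two robots' estimates.

x = [0.0, 10.0]
P = [[1.0, 0.0], [0.0, 4.0]]          # robot 2 is much more uncertain

z, R = 10.2, 0.25                     # relative measurement and its variance
H = [-1.0, 1.0]

# Innovation y and its variance S = H P H^T + R.
y = z - (x[1] - x[0])
S = (P[0][0] - P[0][1] - P[1][0] + P[1][1]) + R
# Kalman gain K = P H^T / S (a 2-vector here).
K = [(-P[0][0] + P[0][1]) / S, (-P[1][0] + P[1][1]) / S]
x = [x[0] + K[0] * y, x[1] + K[1] * y]
# Standard covariance update P = (I - K H) P.
KH = [[K[0] * H[0], K[0] * H[1]], [K[1] * H[0], K[1] * H[1]]]
P = [[P[r][c] - sum(KH[r][k] * P[k][c] for k in range(2)) for c in range(2)]
     for r in range(2)]
```

After the update both diagonal entries of P shrink and the off-diagonal terms become nonzero: the robots' estimates are now correlated, which is exactly why the paper's filters must exchange information when a mutual detection occurs.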


CMU talk: The Inner Workings of Face Processing: From human to computer perception and back

CMU VASC Seminar
Monday, October 12

The Inner Workings of Face Processing: From human to computer perception and back
Aleix Martinez
Ohio State University

Faces and emotions shape our daily life in many different ways. We are so intrigued about such effects that writers, poets and painters have been depicting and portraying them for centuries. Why does the male character in Wood's "American Gothic" seem sad? Why do kids elongate their faces when they are upset? Why do some people always seem angry? Why do we recognize identity from faces so easily? Why is it so hard to learn non-manuals (i.e., facial expressions of grammar) in sign languages, when native signers do this effortlessly? In short, what are the dimensions of our computational (cognitive) space responsible for these face processing tasks? If we are to understand why things appear as they do and how cognitive disorders in Autism, schizophrenia and Huntington’s disease develop, we need to define how the brain codes and interprets faces, emotions and grammar. This is also important for the design of technology – as devices need to interact with us. In this talk, I will outline the research framework I use to study face perception and the related topics of emotion and grammar. This consists of a multidisciplinary approach in cognitive science, including psychophysical studies and the design of computer algorithms for the analysis of face images. We will review the big questions about this computational space. In doing so, we will see that the ability of human observers to process face images is truly remarkable, suggesting that some abstract, yet simple representation that is unaffected by a large number of image transformations is at work. We will summarize our current understanding of this representation.

Aleix M. Martinez is an Associate Professor in the Department of Electrical and Computer Engineering at The Ohio State University (OSU), where he is the founder and director of the Computational Biology and Cognitive Science Lab. He is also affiliated with the Department of Biomedical Engineering and to the Center for Cognitive Science. Prior to joining OSU, he was affiliated with the Electrical and Computer Engineering Department at Purdue University and with the Sony Computer Science Lab. He serves as an associate editor of IEEE Transactions on Pattern Analysis and Machine Intelligence and of Image and Vision Computing and has been an area chair of CVPR. Aleix has spent his time wondering why he is such a bad face recognizer and why people attribute social labels to faces of unknown individuals. His other areas of interest are learning, vision and linguistics.

Tuesday, October 06, 2009

Lab Meeting 10/6, 2009 (Casey): Towards Visual Localization, Mapping and Moving Objects Tracking by a Mobile Robot: a Geometric and Probabilistic Approach

Title: Towards Visual Localization, Mapping and Moving Objects Tracking by a Mobile Robot: a Geometric and Probabilistic Approach
Author: Joan Solà
(PhD Thesis, 2007/02)
(LAAS-CNRS laboratory in Toulouse, Occitania, France)


1. Undelayed initialization in monocular SLAM
2. Binocular SLAM
3. Moving object detection and tracking

Sunday, September 27, 2009

NTU talk: Current Challenges in Vision-Based Driver Assistance Systems

Title: Current Challenges in Vision-Based Driver Assistance Systems
Speaker: Prof. Reinhard Klette, The University of Auckland, Tamaki campus
Time: 10:00am, Sep 28 (Mon), 2009
Place: Room 105, CSIE building

Abstract: The talk starts with a brief introduction to the .enpeda. project at The University of Auckland and to the goals of vision-based driver assistance systems (DAS) in general, illustrated by accident statistics. Lane and corridor detection is a traditional DAS subject, and curved and unmarked roads still pose a challenge. A solution for corridor (i.e., the expected space to drive in) detection is discussed, based on applying the Euclidean distance transform. The main part of the talk is then about current stereo and optic flow algorithms on real-world (stereo) sequences. Prediction-error analysis and evaluations on synthetic DAS sequences are discussed as possible options, and conclusions are drawn, such as the suggestion that correspondence algorithms should use residual images as input rather than the original sequences. Finally, a performance evaluation approach currently under implementation is illustrated, which uses a 3D model of a real scene for generating real-world sequences with ground truth for stereo and optical flow.
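As a toy illustration of the corridor idea, a distance transform over a binary occupancy grid assigns each free cell its clearance from the nearest road boundary, and the corridor centreline is the ridge of maximal clearance. The sketch below is a 4-connected BFS approximation, not the Euclidean transform or data from the talk:

```python
from collections import deque

def distance_transform(grid):
    """BFS (chamfer-style, 4-connected) distance from each free cell
    to the nearest obstacle cell; obstacles get distance 0."""
    rows, cols = len(grid), len(grid[0])
    INF = float("inf")
    dist = [[INF] * cols for _ in range(rows)]
    q = deque()
    for r in range(rows):
        for c in range(cols):
            if grid[r][c] == 1:          # 1 = obstacle (road boundary)
                dist[r][c] = 0
                q.append((r, c))
    while q:
        r, c = q.popleft()
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nr, nc = r + dr, c + dc
            if 0 <= nr < rows and 0 <= nc < cols and dist[nr][nc] > dist[r][c] + 1:
                dist[nr][nc] = dist[r][c] + 1
                q.append((nr, nc))
    return dist

# Toy road: lane borders in columns 0 and 6, free space in between.
grid = [[1, 0, 0, 0, 0, 0, 1] for _ in range(5)]
dist = distance_transform(grid)
# The corridor centreline is the column of maximal clearance in each row.
centreline = [max(range(len(grid[0])), key=lambda c: dist[r][c]) for r in range(len(grid))]
print(centreline)
```

On this symmetric toy road the centreline sits in the middle column; on a curved road the ridge bends with the free space.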

Short Biography: See Dr. Klette's research at

Saturday, September 19, 2009

Lab Meeting September 23rd, 2009 (Jeff): Topological Modeling and Classification in Home environment using Sonar gridmap

Title: Topological Modeling and Classification in Home environment using Sonar gridmap

Authors: Jinwoo Choi, Minyong Choi, Kyoungmin Lee and Wan Kyun Chung


This paper presents a method for topological representation and classification in home environments using only low-cost sonar sensors. Approximate cell decomposition and normalized graph cuts are applied to a sonar gridmap to extract a graphical model of the environment. The extracted model represents the spatial relations of the environment appropriately by segmenting it into several subregions. Moreover, node classification is achieved by applying a template-matching method to a local gridmap. Rotation-invariant matching is used to obtain candidate locations for each node, and the true node can be identified by considering detailed distance information. The proposed method extracts a well-structured topological model of the environment, and classification results in reliable matching even with uncertain and sparse sonar data. Experimental results verify the performance of the proposed environmental modeling and classification in a real home environment.
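The graph-cut step can be sketched on a toy graph (hypothetical weights, not the paper's sonar data): in the normalized-cut relaxation, the second generalized eigenvector of (D - W)x = λDx splits two weakly connected subregions by sign.

```python
import numpy as np

# Six cells forming two tight subregions joined by one weak bridge edge.
W = np.zeros((6, 6))
for i, j in [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5)]:
    W[i, j] = W[j, i] = 1.0
W[2, 3] = W[3, 2] = 0.1                    # weak bridge between subregions

d = W.sum(axis=1)
D_inv_sqrt = np.diag(1.0 / np.sqrt(d))
L_sym = np.eye(6) - D_inv_sqrt @ W @ D_inv_sqrt   # normalized Laplacian
vals, vecs = np.linalg.eigh(L_sym)
fiedler = D_inv_sqrt @ vecs[:, 1]   # generalized eigenvector of (D - W)x = lambda D x
labels = (fiedler > 0).astype(int)  # sign split = the two subregions
print(labels)
```

The sign pattern separates cells 0-2 from cells 3-5; a real map would recurse on each subregion.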


Thursday, September 17, 2009

Computerized Face-Recognition Technology Is Still Easily Foiled by Cosmetic Surgery

Computerized Face-Recognition Technology Is Still Easily Foiled by Cosmetic Surgery

In the first test of face-recognition technology vs. cosmetic surgery, face recognition loses.

BY Willie D. Jones // September 2009

For years, developers of face-recognition algorithms have been battling the effects of awkward poses, facial expressions, and disguises like hats, wigs, and fake moustaches. They’ve had some success, but they may be meeting their match in plastic surgery.

Systematic studies have tested face-recognition algorithms in a variety of challenging situations—bad lighting, for example—”but none of those conditions had nearly the effect of plastic surgery,” says Afzel Noore, a computer science and electrical engineering professor at West Virginia University, in Morgantown. In June, Noore reported the results of the first experimental study to quantify the effect of plastic surgery on face-recognition systems, at the IEEE Computer Society’s Computer Vision and Pattern Recognition conference, in Miami. His team of collaborators is based in West Virginia and at the Indraprastha Institute of Information Technology, Delhi, in India.

Using a database containing before-and-after images from 506 plastic surgery patients, Noore and his colleagues tested six of the most widely used face-recognition algorithms. Even in pictures where the subject was facing forward and the lighting was ideal, the best of the algorithms matched a person’s pre- and postsurgery images no more than about 40 percent of the time. The researchers found that for local alterations—say, a nose job, getting rid of a double chin, or removing the wrinkles around the eyes—today’s systems could make a match roughly one-third of the time. For more global changes like a face-lift, the results were dismal: a match rate of just 2 percent.

”We have to devise systems for security applications knowing that people will aim to circumvent them,” says Noore. In particular, researchers must examine a further complication of the plastic surgery problem—the compounding effects of a series of surgeries over time.

Meanwhile, Noore and his coauthors are testing a game-changing hypothesis: that even after plastic surgery, there are features beneath the skin but still observable that remain unchanged.

Wednesday, September 16, 2009

CMU PhD Thesis: Spectral Matching

Spectral Matching, Learning, and Inference for Computer Vision

doctoral dissertation, tech. report CMU-RI-TR-09-27, Robotics Institute, Carnegie Mellon University, July, 2009

Abstract: Several important applications in computer vision, such as 2D and 3D object matching, object category and action recognition, object category discovery, and texture discovery and analysis, require the ability to match features efficiently in the presence of background clutter and occlusion. In order to improve matching robustness and accuracy, it is important to take into consideration not only the local appearance of features but also the higher-order geometry and appearance of groups of features. In this thesis we propose several efficient algorithms for solving this task, based on a general quadratic programming formulation that generalizes the classical graph matching problem. First, we introduce spectral graph matching, which is an efficient method for matching features using both local, first-order information and pairwise interactions between the features. We study the theoretical properties of spectral matching in detail and show efficient ways of using it for current computer vision applications. We also propose an efficient procedure with important theoretical properties for the final step of obtaining a discrete solution from the continuous one. We show that this discretization step, which has not been studied previously in the literature, is of crucial importance for good performance. We demonstrate its efficiency by showing that it dramatically improves the performance of state-of-the-art algorithms. We also propose, for the first time, methods for learning graph matching in both supervised and unsupervised fashions. Furthermore, we study the connections between graph matching and the MAP inference problem in graphical models, for which we propose novel inference and learning algorithms. In the last part of the thesis we present an application of our matching algorithm to the problem of object category recognition, and a novel algorithm for grouping image pixels/features that can be effectively used for object category segmentation.
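Spectral matching in its basic form can be sketched in a few lines (toy points, not the thesis code): build an affinity matrix over candidate assignments using pairwise geometric consistency, take its principal eigenvector, and greedily discretize under one-to-one constraints.

```python
import numpy as np

# Match 3 points between two 2D sets related by a pure translation.
P = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 2.0]])
Q = P + np.array([5.0, 3.0])                          # translated copy of P
cands = [(i, j) for i in range(3) for j in range(3)]  # candidate assignments

n = len(cands)
M = np.zeros((n, n))
for a, (i, ip) in enumerate(cands):
    for b, (j, jp) in enumerate(cands):
        if a == b or i == j or ip == jp:   # conflicting pairs get no affinity
            continue
        # Pairwise affinity: how well this assignment pair preserves distances.
        d = abs(np.linalg.norm(P[i] - P[j]) - np.linalg.norm(Q[ip] - Q[jp]))
        M[a, b] = np.exp(-d ** 2)

x = np.ones(n)                  # power iteration for the principal eigenvector
for _ in range(100):
    x = M @ x
    x /= np.linalg.norm(x)

# Greedy discretization: accept the strongest assignment, drop conflicts.
match = {}
for a in np.argsort(-x):
    i, ip = cands[a]
    if i not in match and ip not in match.values():
        match[i] = ip
print(match)
```

Because the translation preserves all pairwise distances, the correct assignments form the strongest cluster in M and dominate the eigenvector.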

Link: WWW, PDF

Monday, September 14, 2009

Talk: VASC Seminar: Jason Saragih: Face Alignment through Subspace Constrained Mean-Shifts

Title: Face Alignment through Subspace Constrained Mean-Shifts

Author: Jason Saragih, Post-Doc, Robotics, CMU

September 14, 2009, 2:30pm-3:00pm, NSH 3305


Deformable model fitting has been actively pursued in the computer vision community for over a decade. As a result, numerous approaches have been proposed with varying degrees of success. A class of approaches that has shown substantial promise is one that makes independent predictions regarding locations of the model’s landmarks, which are combined by enforcing a prior over their joint motion. A common theme in innovations to this approach is the replacement of the distribution of probable landmark locations, obtained from each local detector, with simpler parametric forms. This simplification substitutes the true objective with a smoothed version of itself, reducing sensitivity to local minima and outlying detections. In this work, a principled optimization strategy is proposed where a nonparametric representation of the landmark distributions is maximized within a hierarchy of smoothed estimates. The resulting update equations are reminiscent of mean-shift but with a subspace constraint placed on the shape’s variability. This approach is shown to outperform other existing methods on the task of generic face fitting.
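The flavor of the update can be sketched with toy numbers (an illustrative simplification, not the author's implementation): a Gaussian-kernel mean-shift step per landmark over its candidate detections, followed by projection of the stacked update onto a linear shape subspace (here just global translation).

```python
import numpy as np

rng = np.random.default_rng(0)

def mean_shift_step(x, cand, w, h=1.0):
    """One mean-shift update of landmark x over weighted candidates."""
    k = w * np.exp(-np.sum((cand - x) ** 2, axis=1) / (2 * h ** 2))
    return k @ cand / k.sum() - x          # shift vector toward the mode

# Two landmarks with noisy candidate detections around the true positions.
true = np.array([[0.0, 0.0], [4.0, 1.0]])
x = true + 0.8                              # current (displaced) estimate
shifts = []
for l in range(2):
    cand = true[l] + 0.1 * rng.standard_normal((30, 2))
    w = np.ones(30)                         # uniform detector confidences
    shifts.append(mean_shift_step(x[l], cand, w))
dx = np.concatenate(shifts)                 # stacked update for the shape

# Subspace constraint: only a rigid translation of the whole shape is allowed.
U = np.array([[1, 0], [0, 1], [1, 0], [0, 1]]) / np.sqrt(2.0)
dx_c = U @ (U.T @ dx)                       # project update onto the subspace
x_new = x + dx_c.reshape(2, 2)
print(np.round(dx_c, 2))
```

The projection forces both landmarks to move coherently, which is what keeps outlying detections from dragging individual landmarks off the shape model.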

Speaker Biography

Jason Saragih joined the Robotics Institute as a Post-doctoral Fellow in 2008. He received both his BEng and PhD from the Australian National University in 2004 and 2008 respectively. His research interests concern the modeling and registration of deformable models.

Sunday, September 13, 2009

CMU PhD Thesis Proposal: Robust Monocular Vision-based Navigation for a Miniature Fixed-Wing Aircraft

PhD Thesis Proposal
Myung Hwangbo
Carnegie Mellon University

September 15, 2009, 1:00 p.m., Newell Simon Hall 1109

Robust Monocular Vision-based Navigation for a Miniature Fixed-Wing Aircraft

Recently the operation of unmanned aerial vehicles (UAVs) has expanded from military to civilian applications. In contrast to remote-controlled tasks at high altitude, low-altitude flight in an urban environment requires a higher level of autonomy to respond to complex and unpredictable situations. Vision-based methods for autonomous navigation are a promising approach because of the multi-layered information delivered by images, but their robustness across situations has been hard to achieve. We propose a series of monocular computer vision algorithms combined with vehicle dynamics and other navigational sensors for GPS-denied environments such as an urban canyon. We use a fixed-wing model airplane with a 1 m wing span as our UAV platform. Because of its small payload and the limited communication bandwidth to off-body processors, particular attention is paid to both real-time performance and robustness at every level of vision processing of low-grade images.

In point-to-point navigation, state estimation is based on the structure-from-motion method (SFM) using natural landmarks, under conditions where the captured images have sufficient texture. To cope with the fundamental limits of monocular visual odometry (scale ambiguity and rotation-translation ambiguity), vehicle dynamics and airspeed measurements are incorporated in a Kalman filter framework. More robust estimation is provided by multiple rails of the SFM, which are traced in an interweaving fashion. Reliable input to the SFM is enabled by an optical flow computation that is tightly coupled with the IMU. Predictive warping parameters and a high-order motion model enhance the accuracy and lifespan of KLT feature tracking. We also employ vision-based horizon detection as an absolute attitude sensor, which is useful for low-level control of a UAV.
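The Lucas-Kanade step at the core of KLT tracking can be sketched on a synthetic image pair (a minimal illustration, not the thesis pipeline): the patch translation (u, v) is recovered by solving the optical-flow normal equations built from image gradients.

```python
import numpy as np

# Synthetic frame pair: a Gaussian blob shifted by (u, v) = (0.4, -0.3) px.
ys, xs = np.mgrid[0:64, 0:64].astype(float)
img = lambda x0, y0: np.exp(-((xs - x0) ** 2 + (ys - y0) ** 2) / 50.0)
I0 = img(32.0, 32.0)
I1 = img(32.4, 31.7)

Iy, Ix = np.gradient(I0)    # spatial gradients (axis 0 = y, axis 1 = x)
It = I1 - I0                # temporal difference
# Normal equations of Ix*u + Iy*v + It = 0 over the whole patch:
A = np.array([[np.sum(Ix * Ix), np.sum(Ix * Iy)],
              [np.sum(Ix * Iy), np.sum(Iy * Iy)]])
b = -np.array([np.sum(Ix * It), np.sum(Iy * It)])
u, v = np.linalg.solve(A, b)
print(round(u, 2), round(v, 2))
```

The thesis extends this basic step with predictive warps and a higher-order motion model to keep features alive through aggressive airplane motion.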

The performance of the proposed method is evaluated in what we call an air-slalom task, in which the UAV must pass through multiple gates in the air in a row. This demonstrates how a fixed-wing UAV copes with its limited agility, which is inferior to that of hovering platforms in typical urban operations. To efficiently find a feasible obstacle-free path to a goal, we propose a 3D Dubins heuristic for the optimal cost-to-goal and use a set of lateral and longitudinal motion primitives interconnecting at trim states in order to reduce the dimension of the configuration space. We first demonstrate our visual navigation in our UAV simulator, which can be switched between live and synthetic modes, each including wireless data transmission to a ground station.
[full PDF]

Thesis committee:
Takeo Kanade, Co-chair 
James Kuffner, Co-chair 
Sanjiv Singh 
Omead Amidi 
Randy Beard, Brigham Young University

Friday, September 11, 2009

Lab Meeting 09/23, 2009 (Kuo-Huei): Detecting Unusual Activity in Video (CVPR 2004)

Title: Detecting Unusual Activity in Video

Authors: Hua Zhong, Jianbo Shi and Mirko Visontai

We present an unsupervised technique for detecting unusual activity in a large video set using many simple features. No complex activity models and no supervised feature selection are used. We divide the video into equal-length segments and classify the extracted features into prototypes, from which a prototype–segment co-occurrence matrix is computed. Motivated by a similar problem in document keyword analysis, we seek a correspondence relationship between prototypes and video segments which satisfies the transitive closure constraint. We show that an important sub-family of correspondence functions can be reduced to co-embedding prototypes and segments into N-D Euclidean space. We prove that an efficient, globally optimal algorithm exists for the co-embedding problem. Experiments on various real-life videos have validated our approach.
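The co-embedding step can be sketched with a hypothetical co-occurrence matrix: embedding prototypes (rows) and segments (columns) jointly via a truncated SVD places a segment that shares no prototypes with the rest far away in the common space.

```python
import numpy as np

# Toy counts: segments 0-2 share prototypes 0-2; segment 3 uses its own
# prototype, i.e. it is the "unusual" one.
C = np.array([[5.0, 5.0, 5.0, 0.0],
              [5.0, 5.0, 5.0, 0.0],
              [5.0, 5.0, 5.0, 0.0],
              [0.0, 0.0, 0.0, 5.0]])   # prototype-segment co-occurrences

U, S, Vt = np.linalg.svd(C)
seg = Vt[:2].T        # 2-D coordinates of the four segments
proto = U[:, :2]      # 2-D coordinates of the prototypes, same space
d_usual = np.linalg.norm(seg[0] - seg[1])     # two ordinary segments
d_unusual = np.linalg.norm(seg[0] - seg[3])   # ordinary vs unusual
print(d_usual < d_unusual)
```

This is only the geometric intuition; the paper derives the embedding from the transitive-closure constraint and proves global optimality for that formulation.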


CMU talk: Building Vision Systems for Moving Platforms: Background Subtraction from Freely Moving Cameras

Building Vision Systems for Moving Platforms: Background Subtraction from Freely Moving Cameras

Yaser Sheikh
Assistant Research Professor, Robotics, CMU

September 14, 2009, 2:00pm-2:30pm, NSH 3305

Most video analysis systems assume staring cameras that continuously view the same scene from the same point of view. Increasingly, as cameras and computers are becoming smaller and cheaper, freely moving cameras are emerging as a primary platform for computer vision research. Background subtraction algorithms, a mainstay in most computer vision systems, define the background as parts of a scene that are at rest. Traditionally, these algorithms assume a stationary camera, and identify moving objects by detecting areas in a video that change over time. In this talk, I will present ideas to extend the concept of ‘subtracting’ areas at rest to apply to video captured from a freely moving camera. We do not assume that the background is well-approximated by a plane or that the camera center remains stationary during motion. The method operates entirely using 2D image measurements without requiring an explicit 3D reconstruction of the scene.
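A minimal sketch of "subtracting what is at rest" under camera motion, with a known integer image shift standing in for the estimated 2D background motion (toy arrays, not the talk's method):

```python
import numpy as np

rng = np.random.default_rng(1)
bg = rng.random((40, 40))               # static background texture
frame0 = bg.copy()
frame1 = np.roll(bg, shift=2, axis=1)   # camera pans right by 2 pixels
frame1[10:14, 20:24] = 2.0              # independently moving object

# Naive differencing flags the whole image because the camera moved;
# compensating the background motion first isolates the moving object.
naive = np.abs(frame1 - frame0) > 0.5
compensated = np.abs(frame1 - np.roll(frame0, shift=2, axis=1)) > 0.5
print(naive.sum(), compensated.sum())
```

In the talk's setting the background motion is of course not known or a pure shift; the point is only that differencing must happen in a motion-compensated frame.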

Speaker Biography
Yaser Sheikh is an Assistant Research Professor at the Robotics Institute and Adjunct Professor at the Department of Mechnical Engineering at Carnegie Mellon University. His research is in understanding dynamic scenes through computer vision, including human activity analysis, dynamic scene reconstruction, mobile camera networks, and nonrigid motion estimation. He obtained his doctoral degree from the University of Central Florida in 2006 and is a recipient of the Hillman award for excellence in computer science research.

Tuesday, September 08, 2009

Lab Meeting 9/16, 2009 (Alan): Towards Total Scene Understanding: Classification, Annotation and Segmentation in an Automatic Framework (CVPR 2009)

Title: Towards Total Scene Understanding: Classification, Annotation and Segmentation in an Automatic Framework (CVPR 2009)

Authors: Li-Jia Li, Richard Socher, Li Fei-Fei

Given an image, we propose a hierarchical generative model that classifies the overall scene, recognizes and segments each object component, as well as annotates the image with a list of tags. To our knowledge, this is the first model that performs all three tasks in one coherent framework. For instance, a scene of a ‘polo game’ consists of several visual objects such as ‘human’, ‘horse’, ‘grass’, etc. In addition, it can be further annotated with a list of more abstract (e.g. ‘dusk’) or visually less salient (e.g. ‘saddle’) tags. Our generative model jointly explains images through a visual model and a textual model. Visually relevant objects are represented by regions and patches, while visually irrelevant textual annotations are influenced directly by the overall scene class. We propose a fully automatic learning framework that is able to learn robust scene models from noisy web data such as images and user tags. We demonstrate the effectiveness of our framework by automatically classifying, annotating and segmenting images from eight classes depicting sport scenes. In all three tasks, our model significantly outperforms state-of-the-art algorithms.

Paper: Regression-Based Online Situation Recognition for Vehicular Traffic Scenarios

By D. Meyer-Delius, J. Sturm, W. Burgard.

In Proc. of the International Conference on Intelligent Robots and Systems (IROS'09), St. Louis, USA, 2009.

Abstract—In this paper, we present an approach for learning generalized models for traffic situations. We formulate the problem using a dynamic Bayesian network (DBN) from which we learn the characteristic dynamics of a situation from labeled trajectories using kernel regression. For a new and unlabeled trajectory, we can then infer the corresponding situation by evaluating the data likelihood for the individual situation models. In experiments carried out on laser range data gathered on a car in real traffic and in simulation, we show that we can robustly recognize different traffic situations even from trajectories corresponding to partial situation instances.
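A rough sketch of the learning-and-inference loop on synthetic 1-D trajectories (illustrative only; the paper uses a DBN over real laser data): kernel regression yields each situation's characteristic trajectory, and a new, unlabeled trajectory is classified by its Gaussian data likelihood under each model.

```python
import numpy as np

rng = np.random.default_rng(2)
t = np.linspace(0.0, 1.0, 50)

def kernel_regression(t_train, y_train, t_query, h=0.05):
    """Nadaraya-Watson estimate of the characteristic trajectory."""
    K = np.exp(-(t_query[:, None] - t_train[None, :]) ** 2 / (2 * h ** 2))
    return (K @ y_train) / K.sum(axis=1)

# Labeled training trajectories: 'passing' swerves out, 'following' stays straight.
passing = [np.sin(np.pi * t) + 0.05 * rng.standard_normal(50) for _ in range(5)]
following = [np.zeros(50) + 0.05 * rng.standard_normal(50) for _ in range(5)]
models = {
    "passing": kernel_regression(np.tile(t, 5), np.concatenate(passing), t),
    "following": kernel_regression(np.tile(t, 5), np.concatenate(following), t),
}

def log_likelihood(y, model, sigma=0.1):
    return -0.5 * np.sum((y - model) ** 2) / sigma ** 2

new = np.sin(np.pi * t) + 0.05 * rng.standard_normal(50)   # unlabeled trajectory
scores = {s: log_likelihood(new, m) for s, m in models.items()}
label = max(scores, key=scores.get)
print(label)
```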

[Full PDF]

Saturday, September 05, 2009

CMU PhD Oral: Learning in Modular Systems

CMU PhD Thesis Defense:
David M. Bradley

Title: Learning in Modular Systems

Abstract: Complex robotics systems are often built as a system of modules, where each module solves a separate data processing task to produce the complex overall behavior that is required of the robot. For instance, the perception system for autonomous off-road navigation discussed in this thesis uses a terrain classification module, a ground-plane estimation module, and a path-planning module among others. Splitting a complex task into a series of sub-problems allows human designers to engineer solutions for each sub-problem independently, and devise efficient specialized algorithms to solve them. However, modular design can also create problems for applying learning algorithms. Ideally, learning should find parameters for each module that optimize the performance of the overall system. This requires obtaining “local” information for each module about how changing the parameters of that module will impact the output of the system.

Previous work in modular learning showed that if the modules of a system were differentiable, gradient descent could be used to provide this local information in “shallow” systems containing two or three modules between input and output. However, except for convolutional neural networks, this procedure was rarely successful in “deep” systems of more than three modules. Many robotics applications added an additional complication by employing a planning algorithm to produce their output. This makes it hard to define a “loss” function to judge how well the system is performing, or to compute a gradient with respect to previous modules in the system.

Recent advances in learning deep neural networks suggest that learning in deep systems can be successful if data-dependent regularization is first used to provide relevant local information to the modules of the system, and the modules are then jointly optimized by gradient descent. Concurrently, research in imitation learning has offered effective new ways of defining loss functions for the output of planning modules.

This thesis combines these lines of research to develop new tools for learning in modular systems. As data-dependent regularization has been shown to be critical to success in deep modular systems, several significant contributions are provided in this area. A novel, differentiable formulation of sparse coding is presented and shown to be a powerful semi-supervised learning algorithm. Sparse coding has traditionally used non-convex optimization methods, and an alternative, convex formulation is developed with a deterministic optimization procedure. Theoretical contributions developed for this convex formulation also enable an efficient, online multi-task learning algorithm. Results in domain adaptation provide further regularization options. To allow joint optimization of systems that employ planning modules, this thesis leverages loss functions developed in recent imitation learning research, and develops techniques for improving all modules of the system with subgradient descent. Finally, this thesis has also made significant contributions to mobile robot perception for navigation, providing terrain classification techniques that have been incorporated into fielded industrial and government systems. [Full PDF]
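As background, the classical l1 sparse-coding inference that the thesis builds on can be sketched with ISTA on synthetic data (the thesis's differentiable and convex formulations differ; this is only the standard building block): find a sparse code a minimizing ||x - Da||^2 + lam*||a||_1 for a fixed dictionary D.

```python
import numpy as np

rng = np.random.default_rng(3)
D = rng.standard_normal((20, 50))
D /= np.linalg.norm(D, axis=0)            # unit-norm dictionary atoms
a_true = np.zeros(50)
a_true[[4, 17, 31]] = [1.5, -2.0, 1.0]    # 3-sparse ground-truth code
x = D @ a_true                            # noiseless signal

lam = 0.05
L = np.linalg.norm(D, 2) ** 2             # Lipschitz constant of the gradient
a = np.zeros(50)
for _ in range(500):                      # ISTA iterations
    g = a - (D.T @ (D @ a - x)) / L       # gradient step on the data term
    a = np.sign(g) * np.maximum(np.abs(g) - lam / L, 0.0)   # soft threshold

print(int(np.sum(np.abs(a) > 0.1)), round(float(np.linalg.norm(x - D @ a)), 3))
```

With a small l1 weight and noiseless data, the recovered code concentrates on the true support, up to the usual lasso shrinkage bias.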

Thesis Committee Members:
James A. Bagnell, Chair
Martial Hebert
Fernando De la Torre
Yoshua Bengio, University of Montreal

Thursday, September 03, 2009

CMU Computers and Thought Award Lecture:

How Optimized Environmental Sensing Helps Address Information Overload on the Web

Carlos Guestrin
Finmeccanica Associate Professor
Machine Learning and Computer Science Departments
Carnegie Mellon University

In this talk, we tackle a fundamental problem that arises when using sensors to monitor the ecological condition of rivers and lakes, the network of pipes that bring water to our taps, or the activities of an elderly individual when sitting on a chair: Where should we place the sensors in order to make effective and robust predictions? Such sensing problems are typically NP-hard, and in the past, heuristics without theoretical guarantees about the solution quality have often been used. In this talk, we present algorithms which efficiently find provably near-optimal solutions to large, complex sensing problems. Our algorithms are based on the key insight that many important sensing problems exhibit submodularity, an intuitive diminishing returns property: Adding a sensor helps more the fewer sensors we have placed so far. In addition to identifying the most informative locations for placing sensors, our algorithms can handle settings where sensor nodes need to communicate reliably over lossy links, where mobile robots are used to collect data, or where solutions need to be robust against adversaries and sensor failures. We present results applying our algorithms to several real-world sensing tasks, including environmental monitoring using robotic sensors, activity recognition using a purpose-built sensing chair, and a sensor placement competition. We conclude by drawing an interesting connection between sensor placement for water monitoring and addressing the challenges of information overload on the web. As examples of this connection, we address the problem of selecting blogs to read in order to learn about the biggest stories discussed on the web, and personalizing content to turn down the noise in the blogosphere.
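The greedy selection that submodularity justifies can be sketched with a toy coverage objective (hypothetical sensors and regions): at each step, pick the sensor with the largest marginal gain. For monotone submodular objectives like coverage, this greedy choice is provably within a (1 - 1/e) factor of the optimal k-sensor placement.

```python
# Each candidate sensor covers a set of locations (hypothetical data).
coverage = {
    "A": {1, 2, 3, 4},
    "B": {3, 4, 5},
    "C": {5, 6},
    "D": {7},
    "E": {1, 7},
}

def greedy_placement(coverage, k):
    chosen, covered = [], set()
    for _ in range(k):
        # Marginal gain shrinks as more sensors are placed (diminishing returns).
        best = max(coverage,
                   key=lambda s: len(coverage[s] - covered) if s not in chosen else -1)
        chosen.append(best)
        covered |= coverage[best]
    return chosen, covered

chosen, covered = greedy_placement(coverage, 2)
print(chosen, sorted(covered))
```

Here greedy first takes "A" (covers 4 locations), then "C" (marginal gain 2), covering 6 of the 7 locations with 2 sensors.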

Bio: Carlos Guestrin is the Finmeccanica Associate Professor in the Machine Learning and Computer Science Departments at Carnegie Mellon University. Previously, he was a senior researcher at the Intel Research Lab in Berkeley. Carlos received his PhD in Computer Science from Stanford University and a Mechatronics Engineer degree from the University of Sao Paulo, Brazil. Carlos' work has received awards at a number of conferences and journals. He is also a recipient of the ONR Young Investigator Award, the NSF Career Award, the Alfred P. Sloan Fellowship, and the IBM Faculty Fellowship. He was named one of the 2008 `Brilliant 10' by Popular Science Magazine, and received the IJCAI Computers and Thought Award and the Presidential Early Career Award for Scientists and Engineers (PECASE). Carlos is currently a member of the Information Sciences and Technology (ISAT) advisory group for DARPA.