Robot Perception and Learning: September 2009

Sunday, September 27, 2009

NTU talk: Current Challenges in Vision-Based Driver Assistance Systems

Title: Current Challenges in Vision-Based Driver Assistance Systems
Speaker: Prof. Reinhard Klette, The University of Auckland, Tamaki campus
Time: 10:00am, Sep 28 (Mon), 2009
Place: Room 105, CSIE building

Abstract: The talk starts by briefly introducing the .enpeda.. project at The University of Auckland and the goals of vision-based driver assistance systems (DAS) in general, illustrated by accident statistics. Lane and corridor detection is a traditional DAS subject, and curved and unmarked roads still pose a challenge. A solution for detecting the corridor (i.e., the expected space to drive in), based on the Euclidean distance transform, is discussed. The main part of the talk then covers current stereo and optic flow algorithms on real-world (stereo) sequences. Prediction error analysis and evaluations on synthetic DAS sequences are discussed as possible options, and conclusions are drawn, such as the suggestion that correspondence algorithms should use residual images as input rather than the original sequences. Finally, a performance evaluation approach that is currently being implemented is illustrated; it uses a 3D model of a real scene to generate real-world sequences with ground truth for stereo and optical flow.
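To give a feel for the Euclidean distance transform (EDT) mentioned in the abstract, here is a minimal brute-force sketch. It assumes a binary bird's-eye grid where 1 marks a lane-marking cell; the grid and the function name are hypothetical, and this is not the corridor-detection method from the talk, only the underlying transform.

```python
import math

def euclidean_distance_transform(grid):
    """For every cell, the Euclidean distance to the nearest cell equal to 1.

    Brute force over all feature cells: fine for toy grids, far too slow for
    real images, where linear-time EDT algorithms are used instead.
    """
    features = [(r, c) for r, row in enumerate(grid)
                for c, value in enumerate(row) if value == 1]
    if not features:
        raise ValueError("grid contains no feature cells")
    return [[min(math.hypot(r - fr, c - fc) for fr, fc in features)
             for c in range(len(row))]
            for r, row in enumerate(grid)]

# Hypothetical 3x5 bird's-eye view: lane markings (1) in columns 0 and 4.
road = [[1, 0, 0, 0, 1],
        [1, 0, 0, 0, 1],
        [1, 0, 0, 0, 1]]
distances = euclidean_distance_transform(road)
print(distances[1])  # [0.0, 1.0, 2.0, 1.0, 0.0]
```

The largest distances (column 2 here) lie midway between the markings, which hints at why a distance transform is a natural tool for finding the space to drive in.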

Short Biography: See Dr. Klette's research at
http://www.mi.auckland.ac.nz/index.php?option=com_content&view=article&id=57&Itemid=49

Saturday, September 19, 2009

Lab Meeting September 23rd, 2009 (Jeff): Topological Modeling and Classification in Home environment using Sonar gridmap

Title: Topological Modeling and Classification in Home environment using Sonar gridmap

Authors: Jinwoo Choi, Minyong Choi, Kyoungmin Lee and Wan Kyun Chung

Abstract:

This paper presents a method for topological representation and classification in a home environment using only low-cost sonar sensors. Approximate cell decomposition and normalized graph cut are applied to a sonar gridmap to extract a graphical model of the environment. By segmenting the map into several subregions, the extracted model appropriately represents the spatial relations of the environment. Moreover, node classification is achieved by applying a template matching method to a local gridmap. Rotation-invariant matching is used to obtain candidate locations for each node, and the true node can be classified by considering detailed distance information. The proposed method extracts a well-structured topological model of the environment, and the classification yields reliable matching even with uncertain and sparse sonar data. Experimental results verify the performance of the proposed environmental modeling and classification in a real home environment.
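The normalized graph cut criterion (Shi and Malik) that the paper applies to the cell graph can be illustrated on a toy example. This sketch assumes nodes stand for free-space cells and edge weights for how strongly neighbouring cells are connected; the graph and all function names are hypothetical, and exhaustive search stands in for the spectral relaxation used in practice.

```python
from itertools import combinations

def ncut_value(weights, part_a):
    """Normalized cut of a two-way split of a symmetric weighted graph:
    Ncut(A, B) = cut(A, B) / assoc(A, V) + cut(A, B) / assoc(B, V)."""
    n = len(weights)
    part_b = [j for j in range(n) if j not in part_a]
    cut = sum(weights[i][j] for i in part_a for j in part_b)
    assoc_a = sum(sum(weights[i]) for i in part_a)
    assoc_b = sum(sum(weights[j]) for j in part_b)
    return cut / assoc_a + cut / assoc_b

def best_bipartition(weights):
    """Exhaustive search over all two-way splits. Only viable for toy graphs;
    practical Ncut solves a spectral relaxation instead."""
    n = len(weights)
    best, best_value = None, float("inf")
    for k in range(1, n):
        for part_a in combinations(range(n), k):
            value = ncut_value(weights, set(part_a))
            if value < best_value:
                best, best_value = set(part_a), value
    return best, best_value

def graph_from_edges(n, edges):
    """Symmetric adjacency matrix from (i, j, weight) triples."""
    w = [[0.0] * n for _ in range(n)]
    for i, j, weight in edges:
        w[i][j] = w[j][i] = weight
    return w

# Hypothetical free-space cells: two "rooms" of three cells joined by a weak doorway edge.
rooms = graph_from_edges(6, [(0, 1, 1.0), (1, 2, 1.0), (0, 2, 1.0),
                             (3, 4, 1.0), (4, 5, 1.0), (3, 5, 1.0),
                             (2, 3, 0.1)])
part, value = best_bipartition(rooms)
print(sorted(part), round(value, 4))
```

The minimum-Ncut split separates the two rooms at the weak doorway edge, which is the kind of subregion segmentation the abstract describes.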

Link:
ICRA2009
http://pal.csie.ntu.edu.tw/pub/Conferences/2009_ICRA/Conference/data/papers/1720.pdf

Thursday, September 17, 2009

Computerized Face-Recognition Technology Is Still Easily Foiled by Cosmetic Surgery

In the first test of face-recognition technology vs. cosmetic surgery, face recognition loses.

By Willie D. Jones, September 2009

For years, developers of face-recognition algorithms have been battling the effects of awkward poses, facial expressions, and disguises like hats, wigs, and fake moustaches. They’ve had some success, but they may be meeting their match in plastic surgery.

Systematic studies have tested face-recognition algorithms in a variety of challenging situations, such as bad lighting, “but none of those conditions had nearly the effect of plastic surgery,” says Afzel Noore, a computer science and electrical engineering professor at West Virginia University, in Morgantown. In June, Noore reported the results of the first experimental study to quantify the effect of plastic surgery on face-recognition systems at the IEEE Computer Society’s Computer Vision and Pattern Recognition conference, in Miami. His team of collaborators is based in West Virginia and at the Indraprastha Institute of Information Technology, Delhi, in India.

Using a database containing before-and-after images from 506 plastic surgery patients, Noore and his colleagues tested six of the most widely used face-recognition algorithms. Even in pictures where the subject was facing forward and the lighting was ideal, the best of the algorithms matched a person’s pre- and postsurgery images no more than about 40 percent of the time. The researchers found that for local alterations—say, a nose job, getting rid of a double chin, or removing the wrinkles around the eyes—today’s systems could make a match roughly one-third of the time. For more global changes like a face-lift, the results were dismal: a match rate of just 2 percent.

“We have to devise systems for security applications knowing that people will aim to circumvent them,” says Noore. In particular, researchers must examine a further complication of the plastic surgery problem: the compounding effects of a series of surgeries over time.

Meanwhile, Noore and his coauthors are testing a game-changing hypothesis: that even after plastic surgery, certain features beneath the skin, which are still observable, remain unchanged.

Wednesday, September 16, 2009

CMU PhD Thesis: Spectral Matching

Spectral Matching, Learning, and Inference for Computer Vision

Marius Leordeanu

doctoral dissertation, tech. report CMU-RI-TR-09-27, Robotics Institute, Carnegie Mellon University, July, 2009

Abstract: Several important applications in computer vision, such as 2D and 3D object matching, object category and action recognition, object category discovery, and texture discovery and analysis, require the ability to match features efficiently in the presence of background clutter and occlusion. In order to improve matching robustness and accuracy, it is important to take into consideration not only the local appearance of features but also the higher-order geometry and appearance of groups of features. In this thesis we propose several efficient algorithms for solving this task, based on a general quadratic programming formulation that generalizes the classical graph matching problem. First, we introduce spectral graph matching, which is an efficient method for matching features using both local, first-order information, as well as pairwise interactions between the features. We study the theoretical properties of spectral matching in detail and show efficient ways of using it for current computer vision applications. We also propose an efficient procedure with important theoretical properties for the final step of obtaining a discrete solution from the continuous one. We show that this discretization step, which has not been studied previously in the literature, is of crucial importance for good performance. We demonstrate its efficiency by showing that it dramatically improves the performance of state-of-the-art algorithms. We also propose, for the first time, methods for learning graph matching in both supervised and unsupervised fashions. Furthermore, we study the connections between graph matching and the MAP inference problem in graphical models, for which we propose novel inference and learning algorithms. In the last part of the thesis we present an application of our matching algorithm to the problem of object category recognition, and a novel algorithm for grouping image pixels/features that can be effectively used for object category segmentation.
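The core idea of spectral matching can be sketched in a few lines: build an affinity matrix over candidate assignments, take its principal eigenvector as a soft confidence per assignment, then discretize. This minimal sketch uses power iteration and the simple greedy one-to-one discretization from the original spectral matching work; it does not reproduce the improved discretization procedure proposed in the thesis. The 1D feature positions, the Gaussian affinity, and all function names are hypothetical.

```python
import math

def principal_eigenvector(m, iterations=100):
    """Power iteration on a symmetric non-negative matrix (list of lists)."""
    n = len(m)
    v = [1.0 / math.sqrt(n)] * n
    for _ in range(iterations):
        w = [sum(m[i][j] * v[j] for j in range(n)) for i in range(n)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    return v

def build_affinity(model_pos, image_pos, candidates):
    """Pairwise agreement of candidate assignments (i -> j) and (k -> l) for 1D
    feature positions: high when the model displacement matches the image
    displacement, zero for conflicting assignments. Unary terms are left at 0."""
    n = len(candidates)
    m = [[0.0] * n for _ in range(n)]
    for a, (i, j) in enumerate(candidates):
        for b, (k, l) in enumerate(candidates):
            if i != k and j != l:
                diff = (model_pos[k] - model_pos[i]) - (image_pos[l] - image_pos[j])
                m[a][b] = math.exp(-diff ** 2)
    return m

def spectral_match(candidates, affinity):
    """Rank assignments by the principal eigenvector, then greedily accept
    those that do not conflict with already accepted ones (one-to-one)."""
    confidence = principal_eigenvector(affinity)
    order = sorted(range(len(candidates)), key=lambda a: -confidence[a])
    used_model, used_image, matches = set(), set(), []
    for a in order:
        i, j = candidates[a]
        if confidence[a] <= 0 or i in used_model or j in used_image:
            continue
        matches.append((i, j))
        used_model.add(i)
        used_image.add(j)
    return matches

model = [0.0, 1.0]   # hypothetical model feature positions
image = [5.0, 6.0]   # the same pattern, translated
candidates = [(0, 0), (0, 1), (1, 0), (1, 1)]
print(spectral_match(candidates, build_affinity(model, image, candidates)))
# [(0, 0), (1, 1)]
```

Only the pairwise (second-order) term distinguishes the correct assignment here, since both candidate pairings preserve the absolute distance; the signed displacement is what rules out the swapped match.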

Link: WWW, PDF

Monday, September 14, 2009

Talk: VASC Seminar: Jason Saragih, Face Alignment through Subspace Constrained Mean-Shifts

Title: Face Alignment through Subspace Constrained Mean-Shifts

Author: Jason Saragih, Post-Doc, Robotics, CMU

September 14, 2009, 2:30pm-3:00pm, NSH 3305

Abstract

Deformable model fitting has been actively pursued in the computer vision community for over a decade. As a result, numerous approaches have been proposed with varying degrees of success. A class of approaches that has shown substantial promise is one that makes independent predictions regarding locations of the model’s landmarks, which are combined by enforcing a prior over their joint motion. A common theme in innovations to this approach is the replacement of the distribution of probable landmark locations, obtained from each local detector, with simpler parametric forms. This simplification substitutes the true objective with a smoothed version of itself, reducing sensitivity to local minima and outlying detections. In this work, a principled optimization strategy is proposed where a nonparametric representation of the landmark distributions is maximized within a hierarchy of smoothed estimates. The resulting update equations are reminiscent of mean-shift but with a subspace constraint placed on the shape’s variability. This approach is shown to outperform other existing methods on the task of generic face fitting.

Speaker Biography

Jason Saragih joined the Robotics Institute as a Post-doctoral Fellow in 2008. He received both his BEng and PhD from the Australian National University in 2004 and 2008 respectively. His research interests concern the modeling and registration of deformable models.

Sunday, September 13, 2009

CMU PhD Thesis Proposal: Robust Monocular Vision-based Navigation for a Miniature Fixed-Wing Aircraft

PhD Thesis Proposal
Myung Hwangbo
Carnegie Mellon University

September 15, 2009, 1:00 p.m., Newell Simon Hall 1109

Title:
Robust Monocular Vision-based Navigation for a Miniature Fixed-Wing Aircraft

Abstract:
Recently the operation of unmanned aerial vehicles (UAVs) has expanded from military to civilian applications. Unlike remote-controlled tasks at high altitude, low-altitude flight in an urban environment requires a higher level of autonomy to respond to complex and unpredictable situations. Vision-based methods for autonomous navigation have been a promising approach because of the multi-layered information delivered by images, but their robustness in various situations has been hard to achieve. We propose a series of monocular computer vision algorithms combined with vehicle dynamics and other navigational sensors in GPS-denied environments such as an urban canyon. We use a fixed-wing model airplane with a 1 m wingspan as our UAV platform. Because of its small payload and limited communication bandwidth to off-board processors, particular attention is paid to both real-time performance and robustness at every level of vision processing of low-grade images.

In point-to-point navigation, state estimation is based on the structure-from-motion (SFM) method using natural landmarks, under conditions where the captured images have sufficient texture. To cope with the fundamental limits of monocular visual odometry (scale ambiguity and rotation-translation ambiguity), vehicle dynamics and airspeed measurements are incorporated in a Kalman filter framework. More robust estimation is obtained from multiple rails of the SFM, which are traced in an interweaving fashion. Robust input to the SFM is provided by optical flow computation that is tightly coupled with the IMU. Predictive warping parameters and a high-order motion model enhance the accuracy and life span of KLT feature tracking. We also employ vision-based horizon detection as an absolute attitude sensor, which is useful for low-level control of a UAV.
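To illustrate how airspeed can resolve the scale ambiguity of monocular visual odometry inside a Kalman filter, here is a deliberately reduced sketch. It is not the proposal's filter (which also uses vehicle dynamics): it estimates only a scalar scale s, assumes no wind so that airspeed times dt approximates distance travelled, and uses hypothetical data and noise parameters.

```python
def estimate_scale(vo_steps, airspeed_steps, q=1e-4, r=0.25, s0=1.0, p0=10.0):
    """Scalar Kalman filter for the unknown metric scale s of monocular VO.

    vo_steps: per-frame translation magnitudes from visual odometry (arbitrary units).
    airspeed_steps: airspeed * dt for the same frames, in metres (assumes no wind).
    Measurement model: airspeed_step = s * vo_step + Gaussian noise (variance r).
    """
    s, p = s0, p0
    for h, z in zip(vo_steps, airspeed_steps):
        p += q                          # predict: s is modeled as a slow random walk
        gain = p * h / (h * p * h + r)  # Kalman gain for the scalar measurement
        s += gain * (z - h * s)         # correct with the innovation
        p *= 1.0 - gain * h             # posterior variance
    return s, p

# Hypothetical data: true scale 2.5 m per VO unit, noise-free airspeed distances.
vo = [1.0, 0.5, 1.5] * 20
air = [2.5 * d for d in vo]
scale, variance = estimate_scale(vo, air)
print(round(scale, 2))
```

The random-walk process noise q keeps the filter from becoming overconfident, so it can track a scale that drifts slowly as the VO solution accumulates error.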

The performance of the proposed method is evaluated in what we call an air-slalom task, where the UAV is expected to pass through multiple gates in the air in a row. This task will demonstrate how a fixed-wing UAV copes with its limited agility, which is inferior to that of hovering aircraft in typical urban operations. To efficiently find a feasible obstacle-free path to a goal, we propose a 3D Dubins heuristic for the optimal cost to a goal and use a set of lateral and longitudinal motion primitives interconnecting at trim states in order to reduce the dimension of the configuration space. We first demonstrate our visual navigation in our UAV simulator, which can be switched between live and synthetic modes, each including wireless data transmission to a ground station.
[full PDF]

Thesis committee:
Takeo Kanade, Co-chair
James Kuffner, Co-chair
Sanjiv Singh
Omead Amidi
Randy Beard, Brigham Young University

Friday, September 11, 2009

Lab Meeting 09/23, 2009 (Kuo-Huei): Detecting Unusual Activity in Video (CVPR 2004)

Title: Detecting Unusual Activity in Video

Authors: Hua Zhong, Jianbo Shi and Mirko Visontai

Abstract:
We present an unsupervised technique for detecting unusual activity in a large video set using many simple features. No complex activity models and no supervised feature selections are used. We divide the video into equal length segments and classify the extracted features into prototypes, from which a prototype–segment co-occurrence matrix is computed. Motivated by a similar problem in document keyword analysis, we seek a correspondence relationship between prototypes and video segments which satisfies the transitive closure constraint. We show that an important sub-family of correspondence functions can be reduced to co-embedding prototypes and segments to N-D Euclidean space. We prove that an efficient, globally optimal algorithm exists for the co-embedding problem. Experiments on various real-life videos have validated our approach.
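The first step of the pipeline, the prototype–segment co-occurrence matrix, is easy to make concrete. In this sketch the per-segment prototype labels are hypothetical (in the paper they come from clustering extracted features), and the rarity score at the end is only a naive stand-in to show how the matrix can expose an unusual segment; it is not the paper's co-embedding algorithm.

```python
from collections import Counter

def cooccurrence(segment_labels, num_prototypes):
    """Prototype-by-segment count matrix: entry [p][s] is how many feature
    vectors of video segment s were assigned to prototype p."""
    matrix = [[0] * len(segment_labels) for _ in range(num_prototypes)]
    for s, labels in enumerate(segment_labels):
        for p, count in Counter(labels).items():
            matrix[p][s] = count
    return matrix

def rarity_scores(matrix):
    """Naive stand-in for the paper's co-embedding: for each (non-empty) segment,
    the average over its features of 1 / (number of segments using that prototype)."""
    spread = [sum(1 for c in row if c > 0) for row in matrix]
    scores = []
    for s in range(len(matrix[0])):
        total = sum(row[s] for row in matrix)
        scores.append(sum(row[s] / spread[p] for p, row in enumerate(matrix) if row[s]) / total)
    return scores

# Hypothetical prototype labels for the features of four equal-length segments;
# prototype 2 occurs only in the last segment.
segments = [[0, 0, 1], [0, 1, 1], [1, 0, 0], [0, 2, 2]]
matrix = cooccurrence(segments, 3)
scores = rarity_scores(matrix)
print(max(range(len(scores)), key=scores.__getitem__))  # 3
```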

Link

CMU talk: Building Vision Systems for Moving Platforms: Background Subtraction from Freely Moving Cameras

Building Vision Systems for Moving Platforms: Background Subtraction from Freely Moving Cameras

Yaser Sheikh
Assistant Research Professor, Robotics, CMU

September 14, 2009, 2:00pm-2:30pm, NSH 3305

Abstract
Most video analysis systems assume staring cameras that continuously view the same scene from the same point of view. Increasingly, as cameras and computers are becoming smaller and cheaper, freely moving cameras are emerging as a primary platform for computer vision research. Background subtraction algorithms, a mainstay in most computer vision systems, define the background as parts of a scene that are at rest. Traditionally, these algorithms assume a stationary camera, and identify moving objects by detecting areas in a video that change over time. In this talk, I will present ideas to extend the concept of ‘subtracting’ areas at rest to apply to video captured from a freely moving camera. We do not assume that the background is well-approximated by a plane or that the camera center remains stationary during motion. The method operates entirely using 2D image measurements without requiring an explicit 3D reconstruction of the scene.
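One way to "subtract" the background using only 2D measurements is to exploit the low rank of background trajectories. The sketch below assumes an affine camera: stacking a static point's image positions over F frames gives a vector M·[X, Y, Z, 1], so all background trajectories lie in a subspace of dimension at most 4 (3 once translation is removed by centering). RANSAC finds that subspace and labels trajectories far from it as foreground. The exact model used in the talk is not stated in the abstract; the data and function names here are hypothetical.

```python
import math
import random

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def orthonormal_basis(vectors, eps=1e-9):
    """Gram-Schmidt; drops vectors that are (numerically) linearly dependent."""
    basis = []
    for v in vectors:
        w = list(v)
        for b in basis:
            c = dot(w, b)
            w = [wi - c * bi for wi, bi in zip(w, b)]
        norm = math.sqrt(dot(w, w))
        if norm > eps:
            basis.append([wi / norm for wi in w])
    return basis

def residual(v, basis):
    """Distance from v to the span of an orthonormal basis."""
    w = list(v)
    for b in basis:
        c = dot(w, b)
        w = [wi - c * bi for wi, bi in zip(w, b)]
    return math.sqrt(dot(w, w))

def background_trajectories(trajectories, rank, threshold, iterations=200, seed=0):
    """RANSAC: fit a rank-`rank` trajectory subspace; return inlier indices."""
    rng = random.Random(seed)
    best = []
    for _ in range(iterations):
        basis = orthonormal_basis(rng.sample(trajectories, rank))
        if len(basis) < rank:
            continue  # degenerate sample
        inliers = [i for i, t in enumerate(trajectories) if residual(t, basis) < threshold]
        if len(inliers) > len(best):
            best = inliers
    return best

# Hypothetical data: 5 frames, one stacked 10x4 affine camera matrix.
rng = random.Random(1)
frames = 5
camera = [[rng.uniform(-1, 1) for _ in range(4)] for _ in range(2 * frames)]
points = [[rng.uniform(-1, 1) for _ in range(3)] + [1.0] for _ in range(20)]
background = [[dot(row, p) for row in camera] for p in points]
foreground = [[rng.uniform(-1, 1) for _ in range(2 * frames)] for _ in range(5)]
inliers = background_trajectories(background + foreground, rank=4, threshold=1e-6)
print(sorted(inliers))
```

With noise-free synthetic data a tiny threshold works; real tracks would need a threshold matched to the tracking noise.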

Speaker Biography
Yaser Sheikh is an Assistant Research Professor at the Robotics Institute and Adjunct Professor at the Department of Mechanical Engineering at Carnegie Mellon University. His research is in understanding dynamic scenes through computer vision, including human activity analysis, dynamic scene reconstruction, mobile camera networks, and nonrigid motion estimation. He obtained his doctoral degree from the University of Central Florida in 2006 and is a recipient of the Hillman award for excellence in computer science research.

Tuesday, September 08, 2009

Lab Meeting 9 / 16, 2009 (Alan): Towards Total Scene Understanding: Classification, Annotation and Segmentation in an Automatic Framework (CVPR 2009)

Title: Towards Total Scene Understanding: Classification, Annotation and Segmentation in an Automatic Framework (CVPR 2009)

Authors: Li-Jia Li, Richard Socher, Li Fei-Fei

Abstract:

Given an image, we propose a hierarchical generative model that classifies the overall scene, recognizes and segments each object component, as well as annotates the image with a list of tags. To our knowledge, this is the first model that performs all three tasks in one coherent framework. For instance, a scene of a ‘polo game’ consists of several visual objects such as ‘human’, ‘horse’, ‘grass’, etc. In addition, it can be further annotated with a list of more abstract (e.g. ‘dusk’) or visually less salient (e.g. ‘saddle’) tags. Our generative model jointly explains images through a visual model and a textual model. Visually relevant objects are represented by regions and patches, while visually irrelevant textual annotations are influenced directly by the overall scene class. We propose a fully automatic learning framework that is able to learn robust scene models from noisy web data such as images and user tags from Flickr.com. We demonstrate the effectiveness of our framework by automatically classifying, annotating and segmenting images from eight classes depicting sport scenes. In all three tasks, our model significantly outperforms state-of-the-art algorithms.

Link: local copy

Paper: Regression-Based Online Situation Recognition for Vehicular Traffic Scenarios

By D. Meyer-Delius, J. Sturm, W. Burgard.

In Proc. of the IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS'09), St. Louis, USA, 2009.

Abstract—In this paper, we present an approach for learning generalized models for traffic situations. We formulate the problem using a dynamic Bayesian network (DBN) from which we learn the characteristic dynamics of a situation from labeled trajectories using kernel regression. For a new and unlabeled trajectory, we can then infer the corresponding situation by evaluating the data likelihood for the individual situation models. In experiments carried out on laser range data gathered on a car in real traffic and in simulation, we show that we can robustly recognize different traffic situations even from trajectories corresponding to partial situation instances.
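The recognition step, learning each situation's dynamics by kernel regression and then picking the situation whose model gives a new (possibly partial) trajectory the highest likelihood, can be sketched as follows. This is a strongly simplified illustration, not the paper's DBN: it assumes a single 1D state (a hypothetical gap in metres to another car), Nadaraya-Watson regression for the next-state mean, and fixed Gaussian noise; all names and data are hypothetical.

```python
import math

def nadaraya_watson(x, pairs, bandwidth):
    """Kernel-weighted average of next states, given (state, next_state) pairs."""
    weights = [math.exp(-((x - xi) ** 2) / (2 * bandwidth ** 2)) for xi, _ in pairs]
    total = sum(weights)
    return sum(w * yi for w, (_, yi) in zip(weights, pairs)) / total

def log_likelihood(trajectory, pairs, bandwidth=1.0, noise=0.5):
    """Sum of Gaussian log densities of each observed transition."""
    ll = 0.0
    for x, x_next in zip(trajectory, trajectory[1:]):
        mu = nadaraya_watson(x, pairs, bandwidth)
        ll += -0.5 * ((x_next - mu) / noise) ** 2 - math.log(noise * math.sqrt(2 * math.pi))
    return ll

def transition_pairs(trajectories):
    """Training data for one situation model: all consecutive (state, next) pairs."""
    return [(a, b) for t in trajectories for a, b in zip(t, t[1:])]

def classify(trajectory, models):
    """Pick the situation whose model explains the trajectory best."""
    return max(models, key=lambda name: log_likelihood(trajectory, models[name]))

approaching = [[20 - i for i in range(11)]]           # gap shrinks 1 m per step
following = [[g] * 10 for g in (12.0, 15.0, 18.0)]   # gap held constant
models = {"approaching": transition_pairs(approaching),
          "following": transition_pairs(following)}
print(classify([14.0, 13.0, 12.0, 11.0], models))  # a partial approach
print(classify([15.0, 15.0, 15.0], models))
```

States far from all training data make the kernel weights underflow to zero, so a practical version would need a fallback for such inputs.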

[Full PDF]

Saturday, September 05, 2009

CMU PhD Oral: Learning in Modular Systems

CMU PhD Thesis Defense:
David M. Bradley

Title: Learning in Modular Systems

Abstract: Complex robotics systems are often built as a system of modules, where each module solves a separate data processing task to produce the complex overall behavior that is required of the robot. For instance, the perception system for autonomous off-road navigation discussed in this thesis uses a terrain classification module, a ground-plane estimation module, and a path-planning module among others. Splitting a complex task into a series of sub-problems allows human designers to engineer solutions for each sub-problem independently, and devise efficient specialized algorithms to solve them. However, modular design can also create problems for applying learning algorithms. Ideally, learning should find parameters for each module that optimize the performance of the overall system. This requires obtaining “local” information for each module about how changing the parameters of that module will impact the output of the system.

Previous work in modular learning showed that if the modules of a system were differentiable, gradient descent could be used to provide this local information in “shallow” systems containing two or three modules between input and output. However, except for convolutional neural networks, this procedure was rarely successful in “deep” systems of more than three modules. Many robotics applications add a further complication by employing a planning algorithm to produce their output. This makes it hard to define a “loss” function to judge how well the system is performing, or to compute a gradient with respect to previous modules in the system.

Recent advances in learning deep neural networks suggest that learning in deep systems can be successful if data-dependent regularization is first used to provide relevant local information to the modules of the system, and the modules are then jointly optimized by gradient descent. Concurrently, research in imitation learning has offered effective new ways of defining loss functions for the output of planning modules.

This thesis combines these lines of research to develop new tools for learning in modular systems. As data-dependent regularization has been shown to be critical to success in deep modular systems, several significant contributions are provided in this area. A novel, differentiable formulation of sparse coding is presented and shown to be a powerful semi-supervised learning algorithm. Sparse coding has traditionally used non-convex optimization methods, and an alternative, convex formulation is developed with a deterministic optimization procedure. Theoretical contributions developed for this convex formulation also enable an efficient, online multi-task learning algorithm. Results in domain adaptation provide further regularization options. To allow joint optimization of systems that employ planning modules, this thesis leverages loss functions developed in recent imitation learning research, and develops techniques for improving all modules of the system with subgradient descent. Finally, this thesis has also made significant contributions to mobile robot perception for navigation, providing terrain classification techniques that have been incorporated into fielded industrial and government systems. [Full PDF]
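For readers new to sparse coding, the basic problem is to represent an input x as a sparse combination of dictionary atoms. The sketch below solves the classic L1-regularized version, min over a of 0.5·||x − D a||² + λ·||a||₁, with the iterative shrinkage-thresholding algorithm (ISTA). This is the standard textbook formulation, not the differentiable or convex formulations developed in the thesis; the dictionary and input are hypothetical.

```python
def soft_threshold(values, t):
    """Proximal operator of t * ||a||_1: shrink each entry toward zero by t."""
    return [max(abs(v) - t, 0.0) * (1.0 if v >= 0 else -1.0) for v in values]

def sparse_code(x, atoms, lam, iterations=500):
    """ISTA for min_a 0.5 * ||x - D a||^2 + lam * ||a||_1.

    atoms: the columns of the dictionary D, each a list as long as x.
    The step 1 / ||D||_F^2 is a safe (conservative) choice, since the squared
    Frobenius norm upper-bounds the largest eigenvalue of D^T D.
    """
    step = 1.0 / sum(v * v for atom in atoms for v in atom)
    a = [0.0] * len(atoms)
    for _ in range(iterations):
        recon = [sum(a[k] * atom[i] for k, atom in enumerate(atoms)) for i in range(len(x))]
        residual = [xi - ri for xi, ri in zip(x, recon)]
        moved = [a[k] + step * sum(atom[i] * residual[i] for i in range(len(x)))
                 for k, atom in enumerate(atoms)]  # gradient step on the squared error
        a = soft_threshold(moved, lam * step)     # L1 shrinkage
    return a

atoms = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
code = sparse_code([3.0, 0.05, -2.0], atoms, lam=0.1)
print([round(c, 3) for c in code])
```

With an orthonormal dictionary the solution is simply each coefficient shrunk by λ, so the small 0.05 component is set exactly to zero; the same routine also runs unchanged with overcomplete dictionaries, where the sparsity matters most.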

Thesis Committee Members:
James A. Bagnell, Chair
Martial Hebert
Fernando De la Torre
Yoshua Bengio, University of Montreal

Thursday, September 03, 2009

CMU Computers and Thought Award Lecture:

How Optimized Environmental Sensing Helps Address Information Overload on the Web

Carlos Guestrin
Finmeccanica Associate Professor
Machine Learning and Computer Science Departments
Carnegie Mellon University

In this talk, we tackle a fundamental problem that arises when using sensors to monitor the ecological condition of rivers and lakes, the network of pipes that bring water to our taps, or the activities of an elderly individual sitting on a chair: where should we place the sensors in order to make effective and robust predictions? Such sensing problems are typically NP-hard, and in the past, heuristics without theoretical guarantees about the solution quality have often been used. In this talk, we present algorithms that efficiently find provably near-optimal solutions to large, complex sensing problems. Our algorithms are based on the key insight that many important sensing problems exhibit submodularity, an intuitive diminishing-returns property: adding a sensor helps more the fewer sensors we have placed so far. In addition to identifying the most informative locations for placing sensors, our algorithms can handle settings where sensor nodes need to be able to reliably communicate over lossy links, where mobile robots are used for collecting data, or where solutions need to be robust against adversaries and sensor failures. We present results applying our algorithms to several real-world sensing tasks, including environmental monitoring using robotic sensors, activity recognition using a custom-built sensing chair, and a sensor placement competition. We conclude by drawing an interesting connection between sensor placement for water monitoring and addressing the challenges of information overload on the web. As examples of this connection, we address the problem of selecting blogs to read in order to learn about the biggest stories discussed on the web, and personalizing content to turn down the noise in the blogosphere.
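The diminishing-returns property is what makes a simple greedy algorithm work well: for a monotone submodular objective under a cardinality constraint, greedy selection is guaranteed to reach at least a (1 − 1/e) fraction of the optimal value (Nemhauser, Wolsey and Fisher). Below is a minimal sketch using set coverage as the submodular objective; the candidate locations and the targets they observe are hypothetical, and the talk's algorithms handle much richer settings (lossy links, mobile robots, robustness) than this.

```python
def coverage(selected, covers):
    """Number of distinct targets covered by the selected locations."""
    covered = set()
    for location in selected:
        covered |= covers[location]
    return len(covered)

def greedy_placement(covers, k):
    """Pick k locations, each time adding the one with the largest marginal gain."""
    selected = []
    for _ in range(k):
        base = coverage(selected, covers)
        best = max((c for c in covers if c not in selected),
                   key=lambda c: coverage(selected + [c], covers) - base)
        selected.append(best)
    return selected

# Hypothetical candidate locations and the targets each one would observe.
covers = {"A": {1, 2, 3, 4}, "B": {3, 4, 5}, "C": {5, 6}, "D": {1, 6}}
chosen = greedy_placement(covers, 2)
print(chosen, coverage(chosen, covers))  # ['A', 'C'] 6
```

Diminishing returns is visible directly: location B would add 3 targets on its own, but only 1 once A is already placed, so the greedy step prefers C.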

Bio: Carlos Guestrin is the Finmeccanica Associate Professor in the Machine Learning and in the Computer Science Departments at Carnegie Mellon University. Previously, he was a senior researcher at the Intel Research Lab in Berkeley. Carlos received his PhD in Computer Science from Stanford University and a Mechatronics Engineer degree from the University of Sao Paulo, Brazil. Carlos' work received awards at a number of conferences and journals. He is also a recipient of the ONR Young Investigator Award, the NSF Career Award, the Alfred P. Sloan Fellowship, and the IBM Faculty Fellowship. He was named one of the 2008 ‘Brilliant 10’ by Popular Science Magazine, received the IJCAI Computers and Thought Award, and the Presidential Early Career Award for Scientists and Engineers (PECASE). Carlos is currently a member of the Information Sciences and Technology (ISAT) advisory group for DARPA.