Friday, September 26, 2008

Lab Meeting September 29, 2008 (ZhenYu): Screen-Camera Calibration using a Spherical Mirror

Authors: Yannick Francken, Chris Hermans, Philippe Bekaert

Recent developments in the consumer market have indicated that the average user of a personal computer is likely to also own a webcam. With the emergence of this new user group will come a new set of applications, which will require a user-friendly way to calibrate the position of the camera with respect to the location of the screen.
This paper presents a fully automatic method to calibrate a screen-camera setup, using a single moving spherical mirror. Unlike other methods, our algorithm needs no user intervention other than moving around a spherical mirror. In addition, if the user provides the algorithm with the exact radius of the sphere in millimeters, the scale of the computed solution is uniquely defined.


CfP NIPS 2008 workshop "Learning over Empirical Hypothesis Spaces"

NIPS-2008 Workshop on "Learning over Empirical Hypothesis Spaces"

Call for Contributions
Whistler, BC, Canada
December 13, 2008

Important Dates:
- Deadline: October 31, 2008
- Notification: November 7, 2008

Workshop Chairs:

- Maria-Florina Balcan
- Shai Ben-David
- Avrim Blum
- Kristiaan Pelckmans
- John Shawe-Taylor



This workshop aims at collecting theoretical insights into the design of data-dependent learning strategies. Specifically, we are interested in how far learned prediction rules may be characterized in terms of the observations themselves. This amounts to capturing how well data can be used to construct structured hypothesis spaces for risk minimization strategies - termed empirical hypothesis spaces. Classical analysis of learning algorithms requires the user to define a proper hypothesis space before seeing the data. In practice, however, one often decides on the proper learning strategy or the form of the prediction rules of interest after inspection of the data (see e.g. [5, 7]). This theoretical gap constitutes exactly the scope of this workshop. A main theme is then the extent to which prior knowledge or additional (unlabeled) samples can or should be used to improve learning curves.

Tentative Program:

One day divided into four sessions, two in the morning and two in the afternoon, with coffee in between. Each session would have one invited contributor talking for 45 minutes followed by 15 minutes of discussion, except the first, which would have two 45-minute tutorial presentations. The sessions would each have an additional part:
* Session 1 Tutorials by S. Ben-David and A. Blum;
* Session 2 Invited talk plus two contributed 15-minute presentations (posters to be shown in the afternoon);
* Session 3 Invited talk plus spotlight (2-minute) presentations for posters, with a poster session following during the coffee break;
* Session 4 Invited talk followed by discussion aimed at identifying 10 key open questions.

Both John Langford and Csaba Szepesvari will present their take on the problem. The final program will be announced soon.

Call for contributions:

We solicit discussions and insights (controversial or otherwise) into any of the following topics:
- Relations between the luckiness framework, compatibility functions, and empirically defined regularization strategies in general.
- Luckiness and compatibility can be seen as defining a prior in terms of the (unknown but fixed) distribution generating the data. To what extent can this approach be generalised while still ensuring effective learning?
- Models of prior knowledge that capture both complexity and distribution dependence for powerful learning.
- Theoretical analysis of the use of additional (empirical) side information in the form of unlabeled data or data labeled by related problems.
- Examples of proper or natural luckiness or compatibility functions in practical learning tasks. How could, for example, luckiness be defined in the context of collaborative filtering?
- The effect of (empirical) preprocessing of the data not involving the labels, as for example in PCA, other data-dependent transformations or cleaning, as well as using label information, as for example in PLS or in feature selection and construction based on the training sample.
- Empirically defined theoretical measures such as Rademacher complexity or sparsity coefficients and their relevance for analysing empirical hypothesis spaces.
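As a concrete instance of the label-free preprocessing mentioned above, a PCA projection fitted to the training sample is itself an empirically defined part of the hypothesis space: it is chosen from the data, but without looking at any labels. A minimal sketch (plain NumPy, illustrative function names):

```python
import numpy as np

def pca_fit(X, k):
    """Fit a k-dimensional PCA projection on the (unlabeled) sample.
    The projection is data-dependent but label-free, so it defines
    an empirical hypothesis space for a downstream learner."""
    mu = X.mean(axis=0)
    _, _, Vt = np.linalg.svd(X - mu, full_matrices=False)
    return mu, Vt[:k]          # mean and top-k principal directions

def pca_transform(X, mu, components):
    """Project data into the empirically chosen low-dimensional space."""
    return (X - mu) @ components.T
```

A downstream classifier trained on `pca_transform(X, mu, comps)` then operates in a hypothesis space that was constructed from the observations themselves.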

This workshop is intended for researchers interested in the theoretical underpinnings of learning algorithms which do not comply with the standard learning-theoretic assumptions.

Submissions should be in the form of a 2-page abstract containing (i) a summary of a formal result, (ii) a discussion of its relevance to the workshop, and (iii) pointers to the relevant literature. The abstract can be supported by an additional paper (either published or a technical report) that contains detailed proofs of any assertions. We especially encourage contributions which describe how to bring in results from other formal frameworks.

CFP: NIPS 2008 Workshop on Analyzing Graphs: Theory and Applications



Analyzing Graphs: Theory and Methods

a workshop in conjunction with

22nd Annual Conference on Neural Information Processing Systems
(NIPS 2008)

December 12, 2008 Whistler, BC, Canada

Deadline for Submissions: Friday, October 31, 2008
Notification of Decision: Friday, November 10, 2008



Recent research in machine learning and statistics has seen the proliferation of computational methods for analyzing graphs and networks. These methods support progress in many application areas, including the social sciences, biology, medicine, neuroscience, physics, finance, and economics.

This workshop will address statistical, methodological and computational issues that arise when modeling and analyzing graphs. The workshop aims to bring together researchers from applied disciplines such as sociology, economics, medicine and biology with researchers from mathematics, physics, statistics and computer
science. Different communities use diverse ideas and mathematical tools; our goal is to foster cross-disciplinary collaborations and intellectual exchange.

Presentations will include novel graph models, the application of established models to new domains, theoretical and computational issues, limitations of current graph methods and directions for future research.

Online Submissions:
We welcome the following types of papers:

1. Research papers that introduce new models or apply established models to novel domains,

2. Research papers that explore theoretical and computational issues, or

3. Position papers that discuss shortcomings and desiderata of current approaches, or propose new directions for future research.

All submissions will be peer-reviewed; exceptional work will be considered for oral presentation. We encourage authors to emphasize the role of learning and its relevance to the application domains at hand. In addition, we hope to identify current successes in the area, and will therefore consider papers that apply previously proposed models to novel domains and data sets.

Submissions should be 4 to 8 pages long and adhere to the NIPS format. Please email your submissions to:



This is a one-day workshop. The program will feature invited talks, poster sessions, poster spotlights, and a panel discussion.


Accepted papers will be distributed on a CD and made available for download. We are negotiating the publication of the accepted papers in print form.

Edo Airoldi, Princeton University
David Blei, Princeton University
Jake Hofman, Yahoo! Research
Tony Jebara, Columbia University
Eric Xing, Carnegie Mellon University

Program Committee
David Banks (Duke University)
Peter Bearman (Columbia University)
Joseph Blitzstein (Harvard University)
Kathleen Carley (Carnegie Mellon University)
Aaron Clauset (Santa Fe Institute)
William Cohen (Carnegie Mellon University)
Stephen Fienberg (Carnegie Mellon University)
Lise Getoor (University of Maryland)
Peter Hoff (University of Washington)
Eric Horvitz (Microsoft Research)
Alan Karr (National Institute of Statistical Sciences)
Jure Leskovec (Carnegie Mellon University)
Kevin Murphy (University of British Columbia)
Eugene Stanley (Boston University)
Lyle Ungar (University of Pennsylvania)
Chris Wiggins (Columbia University)

Lab Meeting September 29, 2008 (Chung-Han): A Mobile Vision System for Robust Multi-Person Tracking

Andreas Ess, Bastian Leibe, Konrad Schindler, Luc Van Gool
in IEEE Conference on Computer Vision and Pattern Recognition (CVPR'08)

We present a mobile vision system for multi-person tracking in busy environments. Specifically, the system integrates continuous visual odometry computation with tracking-by-detection in order to track pedestrians in spite of frequent occlusions and egomotion of the camera rig. To achieve reliable performance under real-world conditions, it has long been advocated to extract and combine as much visual information as possible. We propose a way to closely integrate the vision modules for visual odometry, pedestrian detection, depth estimation, and tracking. The integration naturally leads to several cognitive feedback loops between the modules. Among others, we propose a novel feedback connection from the object detector to visual odometry which utilizes the semantic knowledge of detection to stabilize localization. Feedback loops always carry the danger that erroneous feedback from one module is amplified and causes the entire system to become unstable. We therefore incorporate automatic failure detection and recovery, allowing the system to continue when a module becomes unreliable. The approach is experimentally evaluated on several long and difficult video sequences from busy inner-city locations. Our results show that the proposed integration makes it possible to deliver stable tracking performance in scenes of previously infeasible complexity.

Full Text : Link

Thursday, September 25, 2008

Lab Meeting September 29, 2008 (Any): SCAPE: Shape Completion and Animation of People

D. Anguelov, P. Srinivasan, D. Koller, S. Thrun, J. Rodgers, J. Davis. SCAPE: Shape Completion and Animation of People. Proceedings of the SIGGRAPH Conference, 2005.

Abstract—We introduce the SCAPE method (Shape Completion and Animation for PEople)—a data-driven method for building a human shape model that spans variation in both subject shape and pose. The method is based on a representation that incorporates both articulated and non-rigid deformations. We learn a pose deformation model that derives the non-rigid surface deformation as a function of the pose of the articulated skeleton. We also learn a separate model of variation based on body shape. Our two models can be combined to produce 3D surface models with realistic muscle deformation for different people in different poses, when neither appears in the training set. We show how the model can be used for shape completion — generating a complete surface mesh given a limited set of markers specifying the target shape. We present applications of shape completion to partial view completion and motion capture animation. In particular, our method is capable of constructing a high-quality animated surface model of a moving person, with realistic muscle deformation, using just a single static scan and a marker motion capture sequence of the person.

Monday, September 22, 2008

Lab Meeting September 22, 2008 (Yu-chun): Human Adaptation to a Miniature Robot: Precursors of Mutual Adaptation

Yasser Mohammad and Toyoaki Nishida
Robot and Human Interactive Communication (IEEE Ro-Man 2008)

Mutual adaptation is an important phenomenon in human-human communication. Traditionally, HRI research has been more interested in investigating adaptation of the robot to the human using machine learning techniques, but the possibility of utilizing the natural ability of humans to adapt to other humans and artifacts, including robots, has recently become increasingly attractive. This paper presents some of the results from an experiment conducted to investigate the interaction patterns and effectiveness of motion cues as a feedback modality between a human operator and a miniature robot in a confined collaborative navigation task. The results presented in this paper show evidence of human adaptation to the robot and, moreover, suggest that the adaptation rate is not constant or continuous in time but is discontinuous and nonlinear. The results also show evidence of a starting exploration stage before the adaptation, with duration dependent on the expectations of the human regarding the capabilities of the robot in the given task. The paper investigates how to utilize these and related findings for building robots that are not only capable of adapting to human operators but can also help those operators adapt to them.

Sunday, September 21, 2008

Lab Meeting September 22, 2008 (Shao-Chen): Closing the Loop in Scene Interpretation

Title: Closing the Loop in Scene Interpretation
Authors: D. Hoiem, A. A. Efros, and M. Hebert.

Image understanding involves analyzing many different aspects of the scene. In this paper, we are concerned with how these tasks can be combined in a way that improves the performance of each of them. Inspired by Barrow and Tenenbaum, we present a flexible framework for interfacing scene analysis processes using intrinsic images. Each intrinsic image is a registered map describing one characteristic of the scene. We apply this framework to develop an integrated 3D scene understanding system with estimates of surface orientations, occlusion boundaries, objects, camera viewpoint, and relative depth. Our experiments on a set of 300 outdoor images demonstrate that these tasks reinforce each other, and we illustrate a coherent scene understanding with automatically reconstructed 3D models.


Saturday, September 20, 2008

CMU ECE talk:From Single Images To Camera Networks: Modeling and Inference Strategies

From Single Images To Camera Networks: Modeling and Inference Strategies

Amit Roy-Chowdhury
UC Riverside
Sep 19 2008

The complexity of vision systems can be characterized along many parameters, one of them being the amount of data that is processed. On one end of this spectrum is a single image, while on the other end is a large camera network. In this talk, I will focus on these two ends of the spectrum and analyze their unique requirements and inter-relationships. In the first part, we will discuss mathematical models of image appearance. In my research, I have tried to address the question of how valid some of the commonly used models are, such as the linear, bilinear, multilinear, and locally linear models. Given the physical laws of object motion, surface properties, and image formation, can we derive some of these models from first principles? We will see that, under certain mathematical assumptions, we can indeed derive some of these models, and that this analysis provides new insights into problems of tracking and recognition. In the second part of the talk, I will discuss our current work on scene analysis in camera networks. I will first describe a multi-objective optimization framework that is able to hold tracks of multiple targets over space and time by adapting between delay and accuracy requirements. Then, I will describe our recent work on cooperative control of a camera network using game theory. The importance of a good understanding of the properties of single images in analyzing data over a camera network will be highlighted.

Amit K. Roy-Chowdhury is an Assistant Professor of Electrical Engineering and a Cooperating Faculty in the Dept. of Computer Science at the University of California, Riverside. He completed his PhD in 2002 from the University of Maryland, College Park, where he also worked as a Research Associate in 2003. Prior to that, he received his Master's in Systems Science and Automation from the Indian Institute of Science, Bangalore. His research interests are in the broad areas of image processing and analysis, computer vision, video communications, and statistical methods for signal processing, pattern recognition, and machine learning. His current research projects include network-centric scene analysis in camera networks, physics-based mathematical modeling of image appearance, activity modeling and recognition, face and gait recognition, biological video analysis, and distributed video compression. Dr. Roy-Chowdhury has over seventy papers in peer-reviewed journals, conferences, and edited books. He is an author of the book titled "Recognition of Humans and Their Activities Using Video". He is an Associate Editor of the IAPR journal Machine Vision and Applications and is a regular reviewer for the major journals and conference proceedings in his area.

CMU talk: Kernelized Sorting

Speaker: Le Song
Title: Kernelized Sorting
Date: Monday September 22

Matching pairs of objects is a fundamental operation of unsupervised learning. For instance, we might want to match a photo with a textual description of a person. In those cases it is desirable to have a compatibility function which determines how one set may be translated into the other. For many such instances we may be able to design a compatibility score based on prior knowledge or to observe one based on the co-occurrence of such objects.

In some cases, however, such a match may not exist or it may not be given to us beforehand. That is, while we may have a good understanding of two sources, we may not understand the mapping between the two spaces. For instance, we might have two collections of documents purportedly covering the same content, written in two different languages. Can we determine the correspondence between these two sets of documents without using a dictionary?

We will present a method which is able to perform such matching WITHOUT the need of a cross-domain similarity measure and we shall show that if such measures exist it generalizes normal sorting. Our method relies on the fact that one may estimate the dependence between sets of random variables even without knowing the cross-domain mapping. Various criteria are available. We choose the Hilbert Schmidt Independence Criterion between two sets and we maximize over the permutation group to find a good match. As a side-effect we obtain an explicit representation of the covariance.
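The matching idea above can be sketched compactly. The following is a minimal illustration of HSIC maximization over permutations, not the exact algorithm from the talk: given centered kernel matrices Kc and Lc for the two sets, it maximizes tr(Kc P Lc Pᵀ) by repeatedly solving a linear assignment problem on a linearization of the objective around the current permutation.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def center(K):
    """Center a kernel matrix: H K H with H = I - (1/n) 11^T."""
    n = K.shape[0]
    H = np.eye(n) - np.ones((n, n)) / n
    return H @ K @ H

def kernelized_sort(K, L, iters=100):
    """Match the objects behind kernel matrices K and L by maximizing
    the HSIC-style objective tr(Kc P Lc P^T) over permutation matrices,
    via iterated linear assignment on the linearized objective."""
    Kc, Lc = center(K), center(L)
    n = K.shape[0]
    perm = np.arange(n)                 # perm[i] = index in Y matched to x_i
    for _ in range(iters):
        P = np.eye(n)[perm]             # current permutation matrix
        gain = Kc @ P @ Lc              # gain[i, j]: benefit of assigning i -> j
        _, new_perm = linear_sum_assignment(-gain)
        if np.array_equal(new_perm, perm):
            break                       # fixed point reached
        perm = new_perm
    return perm
```

This iterated linear-assignment relaxation can get stuck in local optima, which is why needing no cross-domain similarity at all is the interesting part of the result.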

We will demonstrate this kernelized sorting using various examples, including image layout, image matching, data attribute matching and multilingual document matching.

Thursday, September 18, 2008

Special Google/VASC Seminar:

Check this out if you want to see what Google is thinking about for computer vision research.

Jay Yagnik,
Head of Computer Vision and Audio Understanding Research
Google, Inc.

When: Wednesday, September 17, 12:00 p.m.

In most recognition/retrieval problems dealing with large image/video datasets, users are often looking to search and browse around the semantics of the data. Standard computer vision algorithms deal with features extracted from pixels and attempt to perform a mapping to predict semantics. Approaches looking at just pixels are inherently limited in this regard and give rise to what we call the "semantic gap", i.e., the disconnect between the semantic concepts natural to users and the pixel-based predictions. One possible solution here is to rely on the large collection of public web pages where we have images and surrounding text that is potentially relevant to the inherent semantics of the image. I'll present a special case of this class of solutions around learning to recognize people. Named-entity-recognition-style text parsing can give us hints from the text about what phrases might be people names. Retaining all the possible associations between names and faces would give us a very weak training set. I'll talk about a machine learning formulation that we refer to as consistency learning, which can effectively train models from such weak training sets and use them for robust recognition. The training procedure is inherently parallel and scales to very large sets. We verify this by large-scale experiments with more than 86M face models involved for more than 200K people.

Bio: Jay Yagnik is Head of Computer Vision and Audio Understanding Research at Google Inc. His interests include machine learning, scalable matching, graph information propagation, image representation and recognition, temporal information mining, and statistics. He is an alumnus of the Indian Institute of Science and the Nirma Institute of Technology. Prior to Google, he worked on criminal identification through beard- and mustache-invariant facial recognition, machine learning for predicting protein function, and more at the Supercomputer Education and Research Centre at IISc Bangalore.

Sunday, September 14, 2008

Lab Meeting September 15th, 2008 (swem): Face Alignment via Boosted Ranking Model

Title: Face Alignment via Boosted Ranking Model
Author: Hao Wu, Xiaoming Liu and Gianfranco Doretto


Face alignment seeks to deform a face model to match it with the features of the image of a face by optimizing an appropriate cost function. We propose a new face model that is aligned by maximizing a score function, which we learn from training data and impose to be concave. We show that this problem can be reduced to learning a classifier that is able to say whether or not, by switching from one alignment to a new one, the model is approaching the correct fitting. This relates to the ranking problem, where a number of instances need to be ordered. For training the model, we propose to extend GentleBoost [23] to rank learning. Extensive experimentation shows the superiority of this approach to other learning paradigms, and demonstrates that this model exceeds the alignment performance of the state of the art.
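The reduction from alignment scoring to classification described in the abstract can be illustrated with a much simpler stand-in learner. Assuming each alignment is summarized by a feature vector and alignments with known error give us ordered pairs, a classifier on feature differences learns to say whether a switch between two alignments moves toward the correct fitting. Here plain logistic regression replaces the paper's GentleBoost extension, and all names are illustrative:

```python
import numpy as np

def make_pairwise_dataset(features, error):
    """Turn alignments with known error into a binary problem on
    feature differences: +1 if the switch improves the alignment,
    -1 if it worsens it (lower error = better alignment)."""
    X, y = [], []
    n = len(features)
    for i in range(n):
        for j in range(n):
            if error[i] > error[j]:                  # j is better than i
                X.append(features[j] - features[i])  # improving switch
                X.append(features[i] - features[j])  # worsening switch
                y.extend([1, -1])
    return np.array(X), np.array(y)

def train_rank_classifier(X, y, lr=0.1, epochs=200):
    """Logistic-regression stand-in for the boosted ranker."""
    w = np.zeros(X.shape[1])
    t = (y + 1) / 2                                  # {-1,1} -> {0,1}
    for _ in range(epochs):
        p = 1.0 / (1.0 + np.exp(-X @ w))
        w += lr * X.T @ (t - p) / len(y)             # gradient ascent step
    return w
```

A learned weight vector `w` then induces a score on alignments whose pairwise comparisons reproduce the ordering, which is the ranking view the paper exploits.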


Lab Meeting September 15th, 2008 (Alan): Pedestrian Detection in Crowded Scenes

Title: Pedestrian Detection in Crowded Scenes
Authors: B. Leibe, E. Seemann, and B. Schiele.
in IEEE Conference on Computer Vision and Pattern Recognition (CVPR'05), San Diego, CA, pp. 878-885, 2005

In this paper, we address the problem of detecting pedestrians in crowded real-world scenes with severe overlaps. Our basic premise is that this problem is too difficult for any type of model or feature alone. Instead, we present a novel algorithm that integrates evidence in multiple iterations and from different sources. The core part of our method is the combination of local and global cues via a probabilistic top-down segmentation. Altogether, this approach allows us to examine and compare object hypotheses with high precision down to the pixel level. Qualitative and quantitative results on a large data set confirm that our method is able to reliably detect pedestrians in crowded scenes, even when they overlap and partially occlude each other. In addition, the flexible nature of our approach allows it to operate on very small training sets.

Full Text: Link

Sunday, September 07, 2008

IROS Workshop: Robotics Challenges for Machine Learning II

IROS Workshop: Robotics Challenges for Machine Learning II

Objectives and Topics
There is an increasing interest in machine learning and statistics within the robotics community. At the same time, there has been a growth in the learning community in using robots as motivating applications for new algorithms and formalisms. Rapid progress requires researchers from both disciplines to come together and agree on the challenges, problem formulations, and solution techniques. Specific themes of the workshop include:

  • learning models of robots, tasks or environments
  • learning plans and control policies by imitation and reinforcement learning
  • representations which facilitate learning, such as low-dimensional embeddings of movements
  • learning representations and task abstractions by unsupervised learning
  • probabilistic inference of task parameters from multi-modal sensory information
  • integration of learning into control architectures.

This workshop will also serve to kick-off the new IEEE Technical Committee (TC) on Robot Learning.

This workshop is directly related to what we are doing now. Check out the related papers.  -Bob

Saturday, September 06, 2008

CfP: Autonomous Robots - Special Issue on Robot Learning

Call For Papers: Autonomous Robots - Special Issue on Robot Learning
Quick Facts
Editors: Jan Peters, Max Planck Institute for Biological Cybernetics,
               Andrew Y. Ng, Stanford University
Journal: Autonomous Robots
Submission Deadline: November 8, 2008
Author Notification: March 1, 2009
Revised Manuscripts: June 1, 2009
Approximate Publication Date: 4th Quarter, 2009

Creating autonomous robots that can learn to act in unpredictable environments has been a long-standing goal of robotics, artificial intelligence, and the cognitive sciences. In contrast, current commercially available industrial and service robots mostly execute fixed tasks and exhibit little adaptability. To bridge this gap, machine learning offers a myriad of methods, some of which have already been applied with great success to robotics problems. Machine learning is also likely to play an increasingly important role in robotics as we take robots out of research labs and factory floors, into the unstructured environments inhabited by humans and into other natural environments.

To carry out increasingly difficult and diverse sets of tasks, future robots will need to make proper use of perceptual stimuli such as vision, lidar, proprioceptive sensing and tactile feedback, and translate these into appropriate motor commands. In order to close this complex loop from perception to action, machine learning will be needed in various stages such as scene understanding, sensory-based action generation, high-level plan generation, and torque-level motor control. Among the important problems hidden in these steps are robotic perception, perceptuo-action coupling, imitation learning, movement decomposition, probabilistic planning, motor primitive learning, reinforcement learning, model learning, motor control, and many others.

Driven by high-profile competitions such as RoboCup and the DARPA Challenges, as well as the growing number of robot learning research programs funded by governments around the world (e.g., FP7-ICT, the euCognition initiative, DARPA Legged Locomotion and LAGR programs), interest in robot learning has reached an unprecedented high point. The interest in machine learning and statistics within robotics has increased substantially; and, robot applications have also become important for motivating new algorithms and formalisms in the machine learning community.

In this Autonomous Robots Special Issue on Robot Learning, we intend to outline recent successes in the application of domain-driven machine learning methods to robotics. Examples of topics of interest include, but are not limited to:

• learning models of robots, tasks or environments
• learning deep hierarchies or levels of representations from sensor & motor representations to task abstractions
• learning plans and control policies by imitation, apprenticeship and reinforcement learning
• finding low-dimensional embeddings of movement as implicit generative models
• integrating learning with control architectures
• methods for probabilistic inference from multi-modal sensory information (e.g., proprioceptive, tactile, vision)
• structured spatio-temporal representations designed for robot learning
• probabilistic inference in non-linear, non-Gaussian stochastic systems (e.g., for planning as well as for optimal or adaptive control)

From several recent workshops, it has become apparent that there is a significant body of novel work on these topics. The special issue will focus only on high-quality articles based on sound theoretical development as well as evaluations on real robot systems.

Lab Meeting September 8th, 2008 (slyfox): Improving Localization Robustness in Monocular SLAM Using a High-Speed Camera

Title: Improving Localization Robustness in Monocular SLAM Using a High-Speed Camera

Authors: Peter Gemeiner, Andrew J. Davison, and Markus Vincze


In the robotics community, localization and mapping of an unknown environment is a well-studied problem. To solve this problem in real time using visual input, a standard monocular Simultaneous Localization and Mapping (SLAM) algorithm can be used. This algorithm is very stable when smooth motion is expected, but in the case of erratic or sudden movements, the camera pose typically gets lost. To improve robustness in Monocular SLAM (MonoSLAM), we propose to use a camera with faster readout speed to obtain a frame rate of 200 Hz. We further present an extended MonoSLAM motion model, which can handle movements with significant jitter. In this work the improved localization and mapping have been evaluated against ground truth, which is reconstructed from off-line vision. To explain the benefits of using a high-frame-rate vision input in the MonoSLAM framework, we performed repeatable experiments with a high-speed camera mounted onto a robotic arm. Due to the dense visual information, MonoSLAM can shrink localization and mapping uncertainties faster and can operate under fast, erratic, or sudden movements. The extended motion model can provide additional robustness against significant handheld jitter when throwing or shaking the camera.
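The abstract does not spell out the extended motion model, but a rough sketch of why a 200 Hz readout helps is the standard constant-velocity EKF prediction that MonoSLAM-style systems build on (position and velocity only, rotation omitted; all parameter names here are assumptions). Unknown accelerations enter as impulses scaled by the frame interval, so a shorter interval injects less uncertainty per frame:

```python
import numpy as np

def predict(p, v, P, dt, sigma_a):
    """One EKF prediction for a constant-velocity model with state [p, v].
    Unknown accelerations are modeled as zero-mean impulses with standard
    deviation sigma_a; they enter through G, so the injected process noise
    shrinks with the frame interval dt -- the benefit of a 200 Hz camera."""
    F = np.block([[np.eye(3), dt * np.eye(3)],
                  [np.zeros((3, 3)), np.eye(3)]])
    G = np.vstack([0.5 * dt**2 * np.eye(3),   # impulse -> position
                   dt * np.eye(3)])           # impulse -> velocity
    Q = sigma_a**2 * (G @ G.T)                # process-noise covariance
    x = F @ np.concatenate([p, v])
    return x[:3], x[3:], F @ P @ F.T + Q      # predicted state and covariance
```

Comparing the predicted covariance at 200 Hz (dt = 5 ms) against 30 Hz shows directly how the faster camera keeps per-frame uncertainty growth small, which is what lets the filter survive erratic motion.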

RSS2008 Paper

CMU RI Seminar: Engineering Self-Organizing Systems

Joint Intelligence Seminar / Robotics Institute Seminar

September 12, 2008

Title: Engineering Self-Organizing Systems

Radhika Nagpal, Harvard University

Biological systems, from embryos to ant colonies, achieve tremendous mileage by using vast numbers of cheap and unreliable components to achieve complex goals reliably. We are rapidly building embedded systems with similar characteristics, from self-assembling modular robots to vast sensor networks. How do we engineer robust collective behavior?

In this talk, I will describe two projects from my group where we have used inspiration from nature, both cells and social insects, to design decentralized algorithms for programmable self-assembly. In the first project, we use insights from social insects to design algorithms for collective construction by simple mobile robots. In the second project we use insights from multicellular tissues to design a modular robot that can form complex environmentally-adaptive shapes. In both cases we can achieve "global-to-local compilation": the agents rely on simple and local interactions that provably self-organize a wide class of user-specified global goals. Finally, time permitting, I will show an example of "local-to-global" phenomena that happens in real tissue self-assembly.

Radhika Nagpal is an Assistant Professor of Computer Science at Harvard University since 2004. She received her PhD degree in Computer Science from MIT, and spent a year as a research fellow at Harvard Medical School. She is a recipient of the 2005 Microsoft New Faculty Fellowship award and the 2007 NSF Career award. Her research interests are biologically-inspired engineering principles for multi-agent systems and modelling multicellular biology.

Her student, Chih-Han Yu, gave a talk here on July 7, 2008. -Bob

CMU VASC Seminar: Metric Learning for Image Alignment and Classification

VASC Seminar
Monday, September 8, 2008

Metric Learning for Image Alignment and Classification
Minh Hoai Nguyen
Robotics Institute, Carnegie Mellon University


What constitutes a good metric for encoding and comparing images? This talk will address this fundamental question that concerns computer vision scientists. We will show how to learn metrics that are optimal for image alignment with Active Appearance Models (AAMs), and image classification using Support Vector Machines (SVMs). Traditionally, feature extraction/selection and metric learning methods have been inferred independently of model estimation (e.g. SVM, AAM). Independently learning features and model parameters may result in the loss of information that is relevant to the alignment or classification process. Rather, we propose a convex framework for jointly learning image metrics and model parameters. To illustrate the benefits of our approach, this talk is divided into two parts. In the first part, we will discuss the problem of learning image metrics to avoid local minima in template alignment and AAMs. We learn a cost function that explicitly optimizes the occurrence of local minima at and only at the places corresponding to the correct alignment parameters. In the second part of the talk, we will consider the problem of building a fast classifier for facial feature detection. We will show how to jointly learn SVM parameters together with a subset of the pixels that are relevant for classification. This work is done in collaboration with Joan Perez and Fernando De la Torre.


Minh Hoai Nguyen received his B.E. in Software Engineering from University of New South Wales, Australia in 2005. He has been a Ph.D. student in Carnegie Mellon University's Robotics Institute since 2006 and is advised by Fernando de la Torre. His research interests are in the area of computer vision and machine learning, especially at the intersection of the two. He is particularly interested in using data-driven techniques to learn representations of images (e.g. pixel selection, non-linear pixel combination) that are optimal for classification, clustering, visual tracking, and modeling.

Swem: add this to your reading list. -Bob

Friday, September 05, 2008

Lab Meeting September 8th, 2008 (fish60): High Performance Outdoor Navigation from Overhead Data using Imitation Learning

David Silver, J. Andrew Bagnell, Anthony Stentz
Robotics Institute, Carnegie Mellon University, Pittsburgh, Pennsylvania, USA
Robotics Science and Systems, June, 2008

High-performance, long-distance autonomous navigation is a central problem for field robotics. Recently, a class of machine learning techniques has been developed that relies upon expert human demonstration to learn a function mapping overhead data to traversal cost. In this work, we extend these methods to automate the interpretation of overhead data. We address the key challenges, including interpolation-based planners, non-linear approximation techniques, and imperfect expert demonstration, necessary to apply these methods for learning to search for effective terrain interpretations.
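The learning-from-demonstration loop described above can be sketched in the style of maximum-margin planning: plan with the current costmap, compare the planned path's features to the expert's, and shift the weights so the expert's path becomes relatively cheaper. This is a toy illustration of the idea, not the authors' system; the map, features, and step size are all invented:

```python
import heapq
import numpy as np

def plan(costs, start, goal):
    """4-connected Dijkstra; returns a least-cost path as a list of cells."""
    H, W = costs.shape
    dist, prev, pq = {start: 0.0}, {}, [(0.0, start)]
    while pq:
        d, u = heapq.heappop(pq)
        if u == goal:
            break
        if d > dist.get(u, np.inf):
            continue
        r, c = u
        for v in ((r+1, c), (r-1, c), (r, c+1), (r, c-1)):
            if 0 <= v[0] < H and 0 <= v[1] < W and d + costs[v] < dist.get(v, np.inf):
                dist[v] = d + costs[v]
                prev[v] = u
                heapq.heappush(pq, (dist[v], v))
    path = [goal]
    while path[-1] != start:
        path.append(prev[path[-1]])
    return path[::-1]

def path_features(path, feats):
    return sum(feats[cell] for cell in path)

# Toy 5x5 map with two features per cell: [constant, mud?]
feats = np.zeros((5, 5, 2))
feats[..., 0] = 1.0
feats[1:4, 2, 1] = 1.0                     # a mud strip down the middle
start, goal = (2, 0), (2, 4)
# Expert demonstration: detour around the mud along the top row.
expert = [(2,0),(1,0),(0,0),(0,1),(0,2),(0,3),(0,4),(1,4),(2,4)]

w = np.array([1.0, 0.0])                   # initial weights: path length only
for _ in range(50):                        # structured-perceptron-style updates
    costs = np.maximum(feats @ w, 1e-3)    # keep cell costs positive
    planned = plan(costs, start, goal)
    g = path_features(expert, feats) - path_features(planned, feats)
    w = np.maximum(w - 0.1 * g, 0.0)       # move the planner toward the expert
print("learned weights:", w)
```

After a few updates the mud feature acquires positive cost and the planner reproduces the expert's detour.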


Thursday, September 04, 2008

CFP Abstract for "Experimental Design for Real-World Systems"


Experimental Design for Real-World Systems

AAAI Spring 2009 Symposium, March 23-25, Palo Alto, CA

Submission deadline: October 3, 2008

As more artificial intelligence (AI) research is fielded in real-world applications, the evaluation of systems designed for human-machine interaction becomes critical. AI research often intersects with other areas of study, including human-robot interaction, human-computer interaction, assistive technology, and ethics. Designing experiments to test hypotheses at the intersections of multiple research fields can be incredibly challenging. Many commonalities and differences already exist in experimental design for real-world systems. For example, human-robot interaction and human-computer interaction have both shared and distinct goals, and they evaluate very different aspects of design, interface, and interaction. In some instances, these two fields can share aspects of experimental design, while in others the experimental design must be fundamentally different.

We will provide a forum for researchers from many disciplines to discuss experiment design and the evaluation of real-world systems. We invite researchers from all applicable fields of human-machine interaction. We also invite researchers from allied fields, such as psychology, anthropology, design, human-computer interaction, human-robot interaction, rehabilitation and clinical care, assistive technology, and other related disciplines.

This symposium will focus on a wide variety of topics that address the challenges of experiment design for real-world systems including:
* the design of system evaluations,
* successes and failures in system evaluations,
* survey design for user studies,
* understanding the role technology plays in society,
* ethics of human subject studies,
* evaluating the use of machines as interventions,
* the uses of quantitative and qualitative data,
* and other related topics.

Format and Submissions

We will have a mix of plenary speakers, short presentations, and break-out groups, as well as a poster session. Prospective presenters and poster authors are invited to submit an abstract (< 3 pages) on experiments conducted during their research, focused on the experimental methodology, especially unusual and effective methodologies. Submission formatting details at Email submissions to

Important Dates

* Call for Papers Due: October 3, 2008
* Authors Notified: November 2008
* Camera Ready Due: January 9, 2009

Organizing Committee

David Feil-Seifer (USC), Heidy Maldonado (Stanford), Bilge Mutlu (CMU), Leila Takayama (Stanford), Katherine Tsui (UMass Lowell)

Program Committee

Jenny Burke (USF), Kerstin Dautenhahn (Hertfordshire), Gert Jan Gelderbloom (VILANS), Maja Mataric (USC), Aaron Steinfeld (CMU), Holly Yanco (UMass Lowell)

Lab Meeting September 8th, 2008 (Jeff): On Handling Uncertainty in the Fundamental Matrix for Scene and Motion Adaptive Pose Recovery

Title: On Handling Uncertainty in the Fundamental Matrix for Scene and Motion Adaptive Pose Recovery

Authors: Sreenivas R. Sukumar, Hamparsum Bozdogan, David L. Page, Andreas F. Koschan and Mongi A. Abidi


The estimation of the fundamental matrix is the key step in feature-based camera ego-motion estimation for applications in scene modeling and vehicle navigation. In this paper, we present a new method of analyzing and further reducing the risk in the fundamental matrix due to the choice of a particular feature detector, the choice of the matching algorithm, the motion model, and iterative hypothesis generation and verification paradigms. Our scheme makes use of model-selection theory to guide the switch to optimal methods for fundamental matrix estimation within the hypothesis-and-test architecture. We demonstrate our proposed method for vision-based robot localization in large-scale environments where the environment is constantly changing and navigation within the environment is unpredictable.
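For context, the standard baseline inside such hypothesis-and-test pipelines is the normalized 8-point algorithm for estimating the fundamental matrix. A self-contained sketch follows; the synthetic cameras and points are assumptions made for the demo, not data from the paper:

```python
import numpy as np

def normalize(pts):
    """Translate points to their centroid, scale so mean distance is sqrt(2)."""
    c = pts.mean(axis=0)
    s = np.sqrt(2) / np.mean(np.linalg.norm(pts - c, axis=1))
    T = np.array([[s, 0, -s * c[0]], [0, s, -s * c[1]], [0, 0, 1]])
    return (pts - c) * s, T

def eight_point(x1, x2):
    """Normalized 8-point estimate of F such that x2_h^T F x1_h ~ 0."""
    p1, T1 = normalize(x1)
    p2, T2 = normalize(x2)
    A = np.column_stack([
        p2[:, 0] * p1[:, 0], p2[:, 0] * p1[:, 1], p2[:, 0],
        p2[:, 1] * p1[:, 0], p2[:, 1] * p1[:, 1], p2[:, 1],
        p1[:, 0],            p1[:, 1],            np.ones(len(p1))])
    _, _, Vt = np.linalg.svd(A)
    F = Vt[-1].reshape(3, 3)                # least-squares solution of A f = 0
    U, S, Vt = np.linalg.svd(F)             # enforce the rank-2 constraint
    F = U @ np.diag([S[0], S[1], 0.0]) @ Vt
    return T2.T @ F @ T1                    # undo the normalization

# Synthetic check: two views of random 3D points.
rng = np.random.default_rng(1)
X = rng.uniform(-1, 1, size=(20, 3)) + [0, 0, 5]
P1 = np.hstack([np.eye(3), np.zeros((3, 1))])        # camera 1 at the origin
P2 = np.hstack([np.eye(3), np.array([[1.0], [0.0], [0.0]])])  # camera 2 shifted
Xh = np.hstack([X, np.ones((20, 1))])
x1 = Xh @ P1.T; x1 = x1[:, :2] / x1[:, 2:]
x2 = Xh @ P2.T; x2 = x2[:, :2] / x2[:, 2:]
F = eight_point(x1, x2)
err = [abs(np.append(b, 1) @ F @ np.append(a, 1)) for a, b in zip(x1, x2)]
print("max epipolar residual:", max(err))
```

With noise-free correspondences the epipolar residuals are numerically zero; model-selection machinery of the kind the paper describes would sit on top of estimators like this one.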

CVPR2008 Paper

CMU Intelligence Seminar: Predicting Neural Representations of Word Meanings

Intelligence Seminar

September 9, 2008

Title: Predicting Neural Representations of Word Meanings

Tom M. Mitchell
E. Fredkin Professor
Machine Learning Department
Carnegie Mellon University

How does the human brain represent meanings of words and pictures in terms of the neural activity observable through fMRI brain imaging? This talk will present our research using machine learning methods to study this question. One line of our research has involved training classifiers that identify which word a person is thinking about, based on the image of their fMRI brain activity. A more recent line involves developing a generative computational model capable of predicting the neural activity associated with arbitrary English words, including words for which we do not yet have brain image data. This computational model was trained using a combination of fMRI data associated with several dozen concrete nouns, together with statistics gathered from a trillion-word text corpus. Once trained, the model predicts fMRI activation for any other concrete noun appearing in the tera-word text corpus, with highly significant accuracies over the 60 nouns for which we currently have fMRI data.
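The generative model can be pictured as one ridge regression per voxel, from corpus-derived semantic features of a word to its activation, evaluated with a leave-two-out matching test as in the talk. The sketch below uses synthetic stand-ins for the fMRI data and co-occurrence features, and the regression form and ridge penalty are assumptions, not the paper's exact training procedure:

```python
import numpy as np

rng = np.random.default_rng(0)
n_words, n_feats, n_voxels = 40, 25, 500

# Stand-ins for real data: F[w] = corpus co-occurrence features of word w,
# A[w] = its fMRI image; both are synthetic here.
F = rng.normal(size=(n_words, n_feats))
C_true = rng.normal(size=(n_feats, n_voxels))
A = F @ C_true + 0.1 * rng.normal(size=(n_words, n_voxels))

train, test = slice(0, 38), slice(38, 40)   # hold out two words
lam = 1.0                                   # ridge penalty (assumed)
C = np.linalg.solve(F[train].T @ F[train] + lam * np.eye(n_feats),
                    F[train].T @ A[train])  # ridge regression, all voxels at once

pred = F[test] @ C                          # predicted images for held-out words

def cos(u, v):
    return u @ v / (np.linalg.norm(u) * np.linalg.norm(v))

# Leave-2-out match test: does the correct prediction/image pairing win?
right = cos(pred[0], A[test][0]) + cos(pred[1], A[test][1])
wrong = cos(pred[0], A[test][1]) + cos(pred[1], A[test][0])
print("correct pairing wins:", right > wrong)
```

The matching test (pair two held-out predictions with two held-out images) mirrors the evaluation protocol described for the 60 concrete nouns.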

This work is based on a collaboration with a number of researchers, including my primary collaborator Marcel Just.

Tom M. Mitchell is the E. Fredkin Professor and head of the Machine Learning Department at Carnegie Mellon University. Mitchell is a past President of the American Association of Artificial Intelligence (AAAI), past Chair of the American Association for the Advancement of Science (AAAS) section on Information, Computing and Communication, and a recent member of the US National Research Council's Computer Science and Telecommunications Board. His general research interests lie in machine learning, artificial intelligence, and cognitive neuroscience. Mitchell believes the field of machine learning will be the fastest growing branch of computer science during the 21st century.

Wednesday, September 03, 2008

CMU thesis defense: Dynamics of Large Networks

Title: Dynamics of Large Networks
Speaker: Jure Leskovec
Date: September 3, 2008

A basic premise behind the study of large networks is that interaction leads to complex collective behavior. In our work we found interesting and counterintuitive patterns for time evolving networks, which change some of the basic assumptions that were made in the past. We then develop network models, fit such models to real networks, and use them to generate realistic graphs or give formal explanations about their properties.

Another important aspect of our research is the study of information diffusion and the spread of influence in a large person-to-person product recommendation network, and its effect on purchases. We also model the propagation of information on the blogosphere and propose algorithms to efficiently find influential nodes in the network.

A central topic of the thesis is the analysis of large datasets, as certain network properties only emerge, and thus become visible, when dealing with large amounts of data. We analyze the world's social and communication network of Microsoft Instant Messenger, with 240 million people and 255 billion conversations. We also made interesting and counterintuitive observations about network community structure, which suggest that only small network clusters exist, and that they merge and vanish as they grow.
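One of the generative models from this line of work, the stochastic Kronecker graph, can be sketched naively for toy sizes: Kronecker-power a small initiator matrix of edge probabilities, then flip a coin for every potential edge. The initiator values below are illustrative, not fitted to any real network:

```python
import numpy as np

def kronecker_graph(theta, k, rng):
    """Naively sample a stochastic Kronecker graph: take the k-th Kronecker
    power of the 2x2 initiator matrix of edge probabilities, then sample
    every potential edge independently. O(4^k) memory -- toy sizes only."""
    P = np.array(theta)
    for _ in range(k - 1):
        P = np.kron(P, np.array(theta))
    return rng.random(P.shape) < P          # boolean (directed) adjacency matrix

rng = np.random.default_rng(0)
theta = [[0.9, 0.5], [0.5, 0.1]]            # illustrative initiator values
A = kronecker_graph(theta, 8, rng)          # 2^8 = 256 nodes
deg = A.sum(axis=1)
print("nodes:", A.shape[0], "edges:", int(A.sum()),
      "max out-degree:", int(deg.max()))
```

The expected edge count is (sum of initiator entries)^k, and the resulting graphs exhibit the heavy-tailed degree distributions such models are designed to reproduce; practical samplers avoid materializing the full probability matrix.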

To view a draft of the thesis see:

Christos Faloutsos
Avrim Blum
John Lafferty
Jon Kleinberg

Monday, September 01, 2008

Lab Meeting September 1, 2008 (Casey): Prior Data and Kernel Conditional Random Fields for Obstacle Detection

Authors: Carlos Vallespi, Anthony (Tony) Stentz
Robotics: Science and Systems 2008

Abstract: We consider the task of training an obstacle detection (OD) system based on a monocular color camera using minimal supervision. We train it to match the performance of a system that uses a laser rangefinder to estimate the presence of obstacles by size and shape. However, the lack of range data in the image cannot be compensated for by the extraction of local features alone. Thus, we investigate contextual techniques based on Conditional Random Fields (CRFs) that can exploit the global context of the image, and we compare them to a conventional learning approach. Furthermore, we describe a procedure for introducing prior data into the OD system to increase its performance in "familiar" terrains. Finally, we perform experiments using sequences of images taken from a vehicle, for autonomous vehicle navigation applications.
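A minimal stand-in for the CRF-style contextual reasoning above is a grid of per-pixel unary obstacle scores plus a Potts smoothness term, minimized here with iterated conditional modes (ICM) rather than the authors' method; the toy image and all parameter values are invented:

```python
import numpy as np

def icm_smooth(unary, beta=0.5, iters=5):
    """Minimize sum_i unary[i, y_i] + beta * (# disagreeing 4-neighbors)
    by iterated conditional modes -- a simple stand-in for CRF inference."""
    H, W, _ = unary.shape
    y = unary.argmin(axis=2)                # start from the unary-only labels
    for _ in range(iters):
        for r in range(H):
            for c in range(W):
                best, best_e = y[r, c], np.inf
                for lab in (0, 1):
                    e = unary[r, c, lab]
                    for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                        rr, cc = r + dr, c + dc
                        if 0 <= rr < H and 0 <= cc < W and y[rr, cc] != lab:
                            e += beta       # penalize disagreeing neighbors
                    if e < best_e:
                        best, best_e = lab, e
                y[r, c] = best
    return y

# Toy image: an obstacle block with noisy per-pixel scores.
rng = np.random.default_rng(0)
truth = np.zeros((20, 20), dtype=int)
truth[5:15, 5:15] = 1
score = truth + rng.normal(0, 0.8, size=(20, 20))   # noisy 'obstacleness'
unary = np.stack([score, 1.0 - score], axis=2)      # cost of label 0 / label 1
noisy = unary.argmin(axis=2)
smooth = icm_smooth(unary, beta=0.5)
print("errors before:", int((noisy != truth).sum()),
      "after:", int((smooth != truth).sum()))
```

Smoothing with image context cleans up isolated per-pixel mistakes that local features alone cannot resolve, which is the intuition the abstract appeals to; real CRF inference would use stronger optimizers than ICM.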

[Paper Link]