Friday, March 30, 2007

News: Shoulder-worn camera acts as a third eye (news service, Tom Simonite)

A shoulder-mounted camera system that automatically tracks head movements and can recognise hand gestures has been developed by UK researchers. Eventually, they hope the system could identify a wearer's activity and offer assistance, for example by accessing a telephone directory when they reach for the phone.

The collar-mounted camera is worn on one shoulder. It is controlled wirelessly from a laptop computer, which uses the camera's output to keep track of objects, map its position and recognise different hand gestures made by the user. Walterio Mayol Cuevas, now a researcher at Bristol University, UK, created the camera while working at the University of Oxford.

Three separate motors make the camera highly directional and even allow it to tilt, while inertia sensors keep it pointed correctly as the wearer moves around.

video: wearable camera, VSLAM
full article
One of the UK researchers: W. W. Mayol

News: Arms are dead giveaway for risky drivers

  • 31 March 2007
  • news service
  • Paul Marks
MONITORING a driver's gaze - as some luxury cars are now designed to do - may not be the best way to ensure they are paying attention to the road ahead. A more effective way to spot distracted drivers may be to monitor their activity by tracking head, arm and hand movements.

In the US, around 25 per cent of road accidents are caused by inattentive drivers, according to the National Highway Traffic Safety Administration. To encourage drivers to keep their eyes on the road, cars such as the Lexus 450 are now fitted with twin cameras mounted behind the steering wheel to monitor the direction of their gaze. If they look away from the road for too long an alarm sounds.

But in a paper to be published in the journal Transportation Research Part C, engineers at the University of Minnesota, Minneapolis, say eye gaze is a poor measure because people can be distracted without taking their eyes off the road. "People don't always look down at the dashboard to adjust the aircon or the music system. And you can talk on a cellphone while still looking ahead," says researcher Harini Veeraraghavan. Gaze detectors can also be fooled by sunglasses, says her supervisor Nikos Papanikolopoulos.

Instead, the researchers have developed a system that looks at a driver's overall actions and then classifies them as safe or unsafe, sounding an alarm if necessary. An intelligent camera continually monitors the driver and tracks what they are doing with their head, arms and hands by picking out their skin tone from the background. This data is fed into software that has learned from thousands of examples which combinations of head, arm and hand movements could be risky - for example, holding a hand to the face for long periods to eat or talk on a cellphone.
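The skin-tone tracking step can be sketched in a few lines. This is a minimal illustration, not the Minnesota group's actual model: the normalized-RG thresholds below are made-up values, and real systems use learned color models.

```python
import numpy as np

def skin_mask(rgb):
    """Classify pixels as skin with a simple normalized-RG rule.

    rgb: H x W x 3 float array with values in [0, 1].
    The threshold values are illustrative assumptions only.
    """
    total = rgb.sum(axis=2) + 1e-8
    r = rgb[..., 0] / total
    g = rgb[..., 1] / total
    # Skin tones cluster in a small region of normalized (r, g) space.
    return (r > 0.35) & (r < 0.6) & (g > 0.25) & (g < 0.4)

def region_centroid(mask):
    """Return the (row, col) centroid of the mask, or None if empty."""
    ys, xs = np.nonzero(mask)
    if len(ys) == 0:
        return None
    return ys.mean(), xs.mean()
```

Tracking the centroid of such regions frame-to-frame gives the head/arm/hand trajectories that the classifier then labels as safe or risky.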

In tests, the system has shown promise, although it needs refining. Clothing with skin-coloured patches can fool the software, for instance. Sebastian Enders, a researcher in driver distraction at the German in-car entertainment firm Blaupunkt, says the approach is interesting. "Their challenge will be proving that the system can reliably detect risky behaviour time after time," he says.

From issue 2597 of New Scientist magazine, 31 March 2007, page 28

CMU ML lunch: A unifying view of component analysis (from a computer vision perspective)

Speaker: Fernando De la Torre, CMU
Title: A unifying view of component analysis (from a computer vision perspective)
Date: April 02

Component Analysis (CA) methods (e.g. Kernel Principal Component Analysis, Independent Component Analysis, Tensor factorization) have been used as a feature extraction step for modeling, classification and clustering in numerous visual, graphics and signal processing tasks over the last four decades. CA techniques are especially appealing because many can be formulated as eigen-problems, offering great potential for efficient learning of linear and non-linear representations of the data without local minima.

In the first part of the talk, I will review standard CA techniques (Principal Component Analysis, Canonical Correlation Analysis, Linear Discriminant Analysis, Non-negative Matrix Factorization, Independent Component Analysis) and three standard extensions (Kernel methods, latent variable models and tensors). In the second part of the talk, I will describe a unified framework for energy-based learning in CA methods. I will also propose several extensions of CA methods to learn linear and non-linear representations of data to improve performance, over the current use of CA features, in state-of-the-art algorithms for classification (e.g. support vector machines), clustering (e.g. spectral graph methods) and modeling/visual tracking (e.g. active appearance models) problems.
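As a concrete illustration of the "formulated as an eigen-problem" point (not code from the talk), plain PCA reduces to an eigendecomposition of the sample covariance:

```python
import numpy as np

def pca(X, k):
    """PCA as an eigen-problem: top-k directions of maximal variance.

    X: n_samples x n_features data matrix.
    Returns (components, projections).
    """
    Xc = X - X.mean(axis=0)              # center the data
    C = Xc.T @ Xc / (len(X) - 1)         # sample covariance (symmetric)
    evals, evecs = np.linalg.eigh(C)     # eigh: ascending eigenvalues
    order = np.argsort(evals)[::-1]      # sort by decreasing variance
    W = evecs[:, order[:k]]              # top-k principal directions
    return W, Xc @ W
```

Because the objective is an eigen-problem, the solution is globally optimal: there are no local minima, which is the appeal the abstract mentions.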

(FYI) Position at NASA Ames: Google Earth Content Developer


The Intelligent Robotics Group at the NASA Ames Research Center has an immediate opening for a full-time software engineer. We are looking for a developer with experience in computer vision to help publish NASA's images and 3D models of the Earth, Moon, and Mars to millions of people using Google Earth. The ideal candidate will have knowledge and experience with image processing, image stitching, and/or 3D vision.


Responsibilities:
* Work with research scientists and content specialists to create and publish a new NASA layer on Google Earth.
* Develop efficient, high-performance image processing pipelines.
* Implement rectification, alignment, and stitching methods for very large image datasets, including oblique angle imagery.
* Implement projection and mosaicing methods for very large 3D terrain models (Digital Elevation Models).

Requirements:
* B.S. (or higher) in Computer Science, with at least 2 years experience in team-based software development projects.
* At least 2 years experience with image processing/computer vision toolkits such as OpenCV.
* Significant development experience in C++ and UNIX environment (GNU tools, svn, doxygen).
* Knowledge of stereo vision and/or GIS is preferred.

NOTE: U.S. citizenship or current permanent resident status is REQUIRED.

The NASA Ames Intelligent Robotics Group builds systems to help humans explore and understand extreme environments, remote locations and uncharted worlds. IRG conducts applied research in a wide range of areas including real-time 3D graphics, computer vision, gigapixel panoramic imaging, human-robot interaction, robot software architectures and planetary rovers. NASA Ames is located in the heart of Silicon Valley and is a world leader in information technology for space.

If you are interested in applying for this position, please send the following via email:
- a letter describing your background and motivation
- a detailed resume (text or PDF format)
- contact details for two (or more) references
to Dr. Terry Fong.

Wednesday, March 28, 2007

Lab Meeting 29 March 2007 (Jeff): RFID-Only SLAM Simulator

A brief overview of the RFID SLAM simulation:
1. Motion model
2. RFID sensor model
3. Odometry-only vs. SLAM simulation results
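For reference, a motion model of the kind listed above is commonly sampled as a Gaussian perturbation of the commanded odometry. A minimal sketch (the noise parameters are placeholders, not the simulator's actual values):

```python
import math
import random

def sample_motion(pose, d_trans, d_rot, trans_noise=0.0, rot_noise=0.0):
    """Sample a new pose from a simple odometry motion model.

    pose: (x, y, theta). d_trans / d_rot: commanded translation and
    rotation. Noise standard deviations are illustrative parameters.
    """
    x, y, theta = pose
    t = d_trans + random.gauss(0.0, trans_noise)   # noisy translation
    r = d_rot + random.gauss(0.0, rot_noise)       # noisy rotation
    theta_new = theta + r
    return (x + t * math.cos(theta_new),
            y + t * math.sin(theta_new),
            theta_new)
```

With the noise terms set to zero this reproduces pure odometry; sampling many poses per step gives the particle set a SLAM filter would maintain.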

Friday, March 23, 2007


I hang my head and sink into my chair dejectedly. As I slouch, the computer monitor in front of me tilts forward and drops low to almost touch the desk, mimicking my gloomy posture. When I perk up and straighten my back, the computer spots the change and the monitor cheerfully swings forward and upward.

Meet RoCo, the world's first expressive computer. Inhabiting a back room in MIT Media Lab, the robotic computer has a monitor for a head and a simple LCD screen for a face. It expresses itself using its double-jointed neck, which is equipped with actuators that shift the monitor up and down, tilt it forward and back and swivel it from side to side, rather like Pixar's animated lamp. An attached camera can detect when its user moves, allowing RoCo to adjust its posture accordingly.

RoCo's creators hope that by responding to a user's changes in posture, people might be more likely to build up a "rapport" with the computer that will make sitting at a desk all day a little more enjoyable. The MIT researchers also believe that by tuning into users' moods, the robot might help them get their work done more effectively.

See the full article and video
From issue 2596 of New Scientist magazine, 22 March 2007, pages 30-31

CMU RI seminar: Exposing Digital Forgeries from Inconsistencies in Lighting

Hany Farid
Professor, Computer Science Department
Dartmouth College

With the advent of high-resolution digital cameras, powerful personal computers and sophisticated photo-editing software, the manipulation of digital images is becoming more common. To this end, we have been developing a suite of tools to detect tampering in digital images. I will discuss two related techniques for exposing forgeries from inconsistencies in lighting. In each case we show how to estimate the direction to a light source from only a single image: inconsistencies across the image are then used as evidence of tampering.
[Joint work with Kimo Johnson]
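As a toy analogue of the underlying idea (not Farid and Johnson's actual single-image estimator), light direction under a Lambertian model I = n · L can be recovered by least squares when surface normals are known; inconsistent estimates across regions then suggest compositing:

```python
import numpy as np

def estimate_light_direction(normals, intensities):
    """Least-squares light direction under a Lambertian model I = n . L.

    normals: m x 3 array of surface normals.
    intensities: length-m array of observed intensities.
    Returns a unit light-direction vector. A simplified stand-in for
    the paper's method, which estimates direction from a single image.
    """
    L, *_ = np.linalg.lstsq(normals, intensities, rcond=None)
    return L / np.linalg.norm(L)
```

Running this independently on two objects in the same photo and comparing the recovered directions is the flavor of consistency check the talk describes.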

Thursday, March 22, 2007

Lab Meeting 22 March 2007 (Nelson): Statistical Segment-RANSAC Localization and Moving Object Detection in Crowded Urban Area

  • a new scheme for improving RANSAC ICP
  • detailed algorithms
  • work in progress



The Giraffe Video-Conferencing Robot, a 5-foot-8 mobile video-conferencing system, can roam an office like a surrogate supervisor, according to the firm HeadThere. The device’s 14-inch screen and 2-megapixel camera can look up and down and turn in place to face speakers in a meeting. HeadThere says its robot can be lowered so that its "head" is at sitting height for face-to-face conversation. The system will be available in 2008 at a price between $1,800 and $3,000. To read more, visit: the link.


Software that reads thoughts and emotions through sensors and uses the data to control a computer has been developed by the San Francisco-based firm Emotiv, which says its technology reads both conscious and non-conscious brainwaves. Developer kits include a lightweight headset with multiple sensors and a wireless transmitter, Emotiv says, and its three software products can identify facial expressions, measure users’ emotional states, and allow them to use brainwaves to move or manipulate objects by thinking about an action, such as push, pull, lift, or rotate. Emotiv says it expects to have consumer products for gaming consoles and PC games based on its technology available in 2008. Read more: the link.



A research team at the University of Illinois at Urbana-Champaign has created a sensor that mimics the lateral line that helps some underwater species survive in their environment. This artificial lateral line, made for use with submarines and underwater vehicles, will detect and track moving underwater targets and avoid collisions with moving or stationary objects. It consists of an integrated linear array of microfabricated flow sensors, with the sizes of individual sensors and the spacings between them matching those of their biological counterpart. For more information, visit: Link

CMU FRC talk: Particle RRT for Path Planning on Very Rough Terrain

Particle RRT for Path Planning on Very Rough Terrain

Nik Melchior. Ph.D. Candidate. Robotics Institute

Wednesday, March 28, 2007

Autonomous navigation algorithms operate on a model of the world built using information perceived by the robot. Despite quantifiable uncertainties in perception and modelling techniques, many navigation algorithms use coarsely quantified evaluations of areas of the world. Often, areas are classified as safe/unsafe or low cost/high cost, when much richer information is available to the planner.
This work introduces a path planning technique which explicitly models the uncertainty in the terrain and its properties. The method is an extension to the Rapidly-exploring Random Tree (RRT) algorithm, a well-known approach to path planning with kinodynamic constraints in high-dimensional state spaces.
Our extension, called Particle RRT (pRRT), uses multiple simulations to propagate the estimated uncertainty in perception to the planned path itself. This allows us to plan paths which are significantly more robust, and which can be followed with greater accuracy, even using open-loop control.
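For readers unfamiliar with the base algorithm, a minimal 2D RRT (without pRRT's particle-based uncertainty propagation) can be sketched as follows; the workspace bounds, step size, and goal bias are illustrative choices:

```python
import math
import random

def rrt(start, goal, step=0.5, iters=500, goal_tol=0.5):
    """Minimal 2D RRT in a [0,10]x[0,10] workspace with no obstacles.

    Grows a tree by steering the nearest node toward random samples
    (with a 10% bias toward the goal). Returns (nodes, parent, goal_idx);
    goal_idx is None if the goal was not reached.
    """
    nodes = [start]
    parent = {0: None}
    for _ in range(iters):
        sample = goal if random.random() < 0.1 else \
            (random.uniform(0, 10), random.uniform(0, 10))
        i = min(range(len(nodes)),
                key=lambda k: math.dist(nodes[k], sample))
        nx, ny = nodes[i]
        d = math.dist((nx, ny), sample)
        if d == 0:
            continue
        new = (nx + step * (sample[0] - nx) / d,
               ny + step * (sample[1] - ny) / d)  # step toward sample
        nodes.append(new)
        parent[len(nodes) - 1] = i
        if math.dist(new, goal) < goal_tol:
            return nodes, parent, len(nodes) - 1  # goal reached
    return nodes, parent, None
```

pRRT replaces the single deterministic extension step with multiple noisy simulations, so each tree node carries a distribution of possible outcomes rather than one state.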

Lab Meeting 22 March 2007 (Bright): Continuum Crowds


Authors: Adrien Treuille, Seth Cooper, and Zoran Popović


We present a real-time crowd model based on continuum dynamics. In our model, a dynamic potential field simultaneously integrates global navigation with moving obstacles such as other people, efficiently solving for the motion of large crowds without the need for explicit collision avoidance. Simulations created with our system run at interactive rates, demonstrate smooth flow under a variety of conditions, and naturally exhibit emergent phenomena that have been observed in real crowds.
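As a toy discrete analogue of the potential-field idea (a stand-in, not the paper's continuum formulation), one can compute a distance-to-goal field over a grid and march agents down its gradient, so navigation needs no explicit per-agent path planning:

```python
from collections import deque

def potential_field(grid, goal):
    """Breadth-first distance-to-goal over a grid of 0 (free) / 1 (obstacle).

    Returns a dict mapping each reachable cell (row, col) to its potential.
    """
    pot = {goal: 0}
    q = deque([goal])
    rows, cols = len(grid), len(grid[0])
    while q:
        r, c = q.popleft()
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nr, nc = r + dr, c + dc
            if 0 <= nr < rows and 0 <= nc < cols \
                    and grid[nr][nc] == 0 and (nr, nc) not in pot:
                pot[(nr, nc)] = pot[(r, c)] + 1
                q.append((nr, nc))
    return pot

def step(pos, pot):
    """Move an agent one cell down the potential gradient."""
    r, c = pos
    neighbors = [(r + dr, c + dc)
                 for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1))]
    options = [n for n in neighbors if n in pot] + [pos]
    return min(options, key=lambda n: pot[n])
```

In the paper the field is continuous and recomputed each frame with other people acting as moving obstacles, which is what produces the emergent crowd flow.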


Wednesday, March 21, 2007

Lab Meeting 22 March 2007 (YuChun): Effects of Anticipatory Action on Human-Robot Teamwork

HRI 2007

Guy Hoffman and Cynthia Breazeal

A crucial skill for fluent action meshing in human team activity is a learned and calculated selection of anticipatory actions. We believe that the same holds for robotic teammates, if they are to perform in a similarly fluent manner with their human counterparts.
In this work, we propose an adaptive action selection mechanism for a robotic teammate, making anticipatory decisions based on the confidence of their validity and their relative risk. We predict an improvement in task efficiency and fluency compared to a purely reactive process.
We then present results from a study involving untrained human subjects working with a simulated version of a robot using our system. We show a significant improvement in best-case task efficiency when compared to a group of users working with a reactive agent, as well as a significant difference in the perceived commitment of the robot to the team and its contribution to the team’s fluency and success. By way of explanation, we propose a number of fluency metrics that differ significantly between the two study groups.


Lab Meeting 22 March 2007 (Jim): A visual bag of words method for interactive qualitative localization and mapping

A visual bag of words method for interactive qualitative localization and mapping
David Filliat, ICRA 2007

PDF, homepage

Localization for low-cost humanoid or animal-like personal robots has to rely on cheap sensors and has to be robust to user manipulations of the robot. We present a visual localization and map-learning system that relies on vision only and that is able to incrementally learn to recognize the different rooms of an apartment from any robot position. This system is inspired by the visual categorization algorithms called bag of words methods, which we modified to be fully incremental and to allow user-interactive training. Our system is able to reliably recognize the room in which the robot is located after a short training time and is stable for long-term use. Empirical validation on a real robot and on an image database acquired in real environments is presented.

Cited paper -- Visual Categorization with Bags of Keypoints.
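To make the bag-of-words localization idea concrete, here is a heavily simplified, non-incremental sketch: quantize image features against a small visual vocabulary, build a normalized histogram, and pick the room whose stored histogram is nearest. The vocabulary, distances, and room names are all illustrative assumptions:

```python
import numpy as np

def quantize(features, vocab):
    """Assign each feature vector to its nearest visual word (index)."""
    d = ((features[:, None, :] - vocab[None, :, :]) ** 2).sum(-1)
    return d.argmin(axis=1)

def bow_histogram(features, vocab):
    """Normalized bag-of-words histogram for one image."""
    words = quantize(features, vocab)
    h = np.bincount(words, minlength=len(vocab)).astype(float)
    return h / h.sum()

def recognize_room(features, vocab, room_histograms):
    """Return the room whose stored histogram is closest (L1 distance)."""
    h = bow_histogram(features, vocab)
    return min(room_histograms,
               key=lambda r: np.abs(room_histograms[r] - h).sum())
```

Filliat's contribution is making this pipeline incremental (the vocabulary and room models grow online) and interactive (the user supplies room labels during training); the sketch above only shows the recognition side.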

Tuesday, March 20, 2007

Lab Meeting 22 Mar 2007 (Atwood): Fast, Integrated Person Tracking and Activity Recognition with Plan-View Templates from a Single Stereo Camera

Michael Harville, Hewlett-Packard Laboratories Center for Signal and Image Processing
Dalong Li, Georgia Inst. of Technology

Plan-view projection of real-time depth imagery can improve the statistics of its intrinsic 3D data, and allows for cleaner separation of occluding and closely-interacting people. We build a probabilistic, real-time multi-person tracking system upon a plan-view image substrate that well preserves both shape and size information of foreground objects. The tracking’s robustness derives in part from its “plan-view template” person models, which capture detailed properties of people’s body configurations. We demonstrate that these same person models - obtained with a single compact stereo camera unit - may also be used for fast recognition of body pose and activity. Principal components analysis is used to extract plan-view “eigenposes”, onto which person models, extracted during tracking, are projected to produce a compact representation of human body configuration. We then formulate pose recognition as a classification problem, and use support vector machines (SVMs) to quickly distinguish between, for example, different directions people are facing, and different body poses such as standing, sitting, bending over, crouching, and reaching. The SVM outputs are transformed to probabilities and integrated across time in a probabilistic framework for real-time activity recognition.


Monday, March 19, 2007

[Robot Perception and Learning] Lab Meeting Day

Since Bob won't go to ARTC this Thursday, the lab meeting will be rescheduled back to 11 AM to 2 PM this Thursday.

Sunday, March 18, 2007

CFP: Microsoft Robotics Studio Soccer Challenge at RoboCup 2007 Atlanta

Microsoft Robotics Studio Soccer Challenge at RoboCup 2007 Atlanta

Call for Participants

Deadline: March 19, 2007

Through this call, RoboCup 2007 Atlanta requests a Letter of Intent from those teams interested in participating in a series of competitive soccer matches using the Microsoft Robotics Studio physics-based 3D simulation environment at RoboCup 2007 Atlanta.

Microsoft is creating a robot soccer simulation to demonstrate how Microsoft Robotics Studio can be applied to challenging environments like RoboCup soccer. While this contest will demonstrate simulation of a competitive event, Microsoft is also working with robot manufacturers developing hardware for RoboCup to demonstrate how easily player competition software can be transferred to actual robots.

Information about the event: Participating teams will use a special preview of Microsoft's new soccer competition. The initial version of the competition will feature wheeled robots and is available for free download from the Microsoft Robotics Studio website. Teams can use it to begin initial development of their player software.

By May, one or more advanced simulated models (i.e. legged robots) are expected to be available for the soccer simulation. One of these simulated robots will be selected for the final competition and released to the community at that time. Teams can then migrate their code to the new robot in preparation for the challenge in July.

Recognition: Placing teams will be awarded trophies and will be recognized on the Microsoft and RoboCup 2007 Atlanta websites.

How to respond to this call: Responses should include the following information:

Team Name
Team Leader
University (or company) Name
Short description of your previous participation in RoboCup

Responses should be sent by email to

Friday, March 16, 2007

CMU vision talk: A Logical Theory for Detecting Humans in Surveillance Video (with an excursion into collaborative filtering)

A Logical Theory for Detecting Humans in Surveillance Video (with an excursion into collaborative filtering)

LARRY S. DAVIS, University of Maryland
> March 19, 2007
> 3:30 PM
> Hamerschlag Hall 1112
> Refreshments 3:15 PM

The capacity to robustly detect humans in video is a critical component of automated visual surveillance systems. This talk describes a bilattice based logical reasoning approach that exploits contextual information, and knowledge about interactions between humans, and augments it with the output of low level body part detectors for human detection. Detections from low level parts-based detectors are treated as logical facts and used to reason explicitly about the presence or absence of humans in the scene. Positive and negative information from different sources, as well as uncertainties from detections and logical rules, are integrated within the bilattice framework. This approach also generates proofs or justifications for each hypothesis it proposes. These justifications (or lack thereof) are further employed by the system to explain and validate, or reject potential hypotheses. This allows the system to explicitly reason about complex interactions between humans and handle occlusions. These proofs are also available to the end user as an explanation of why the system thinks a particular hypothesis is actually a human. We employ a boosted cascade of gradient histograms based detector to detect individual body parts. We have applied this framework to analyze the presence of humans in static images from different datasets.

I will also talk about the application of this framework to the problem of collaborative filtering. In these applications, there exist, potentially, a large number of cues that can contribute to making a final recommendation for a given user. The proposed bilattice based logical reasoning approach exploits these multiple, noisy, and potentially contradictory sources of information to predict movie preferences. We report results on the publicly available MovieLens dataset and compare our approach against a number of state-of-the-art ranking algorithms for collaborative filtering.

CMU Intelligence Seminar: Statistical Relational Learning: Entity Resolution and Link Prediction

Statistical Relational Learning: Entity Resolution and Link Prediction

Lise Getoor
University of Maryland - College Park

A key challenge for machine learning is mining richly structured datasets describing objects, their properties, and links among the objects. We'd like to be able to learn models which can capture both the underlying uncertainty and the logical relationships in the domain. Links among the objects may demonstrate certain patterns, which can be helpful for many practical inference tasks and are usually hard to capture with traditional statistical models. Recently there has been a surge of interest in this area, fueled in part by interest in mining social networks, web collections, security and law enforcement data and biological data.

Statistical Relational Learning (SRL) is a newly emerging research area which attempts to represent, reason and learn in domains with complex relational and rich probabilistic structure. In this talk, I'll begin with a short SRL overview. Then, I'll describe some of my group's recent work, focusing on our work on entity resolution and link prediction in relational domains.

Joint work with students: Indrajit Bhattacharya, Mustafa Bilgic, Rezarta Islamaj, Louis Licamele, Galileo Namata, Vivek Sehgal, Prithviraj Sen and Elena Zheleva.

Speaker Bio
Prof. Lise Getoor is an assistant professor in the Computer Science Department at the University of Maryland, College Park. She received her PhD from Stanford University in 2001. Her current work includes research on link mining, statistical relational learning and representing uncertainty in structured and semi-structured data. Her work in these areas has been supported by NSF, NGA, KDD, ARL and DARPA. In June 2006, she co-organized the fourth in a series of successful workshops on statistical relational learning. She has published numerous articles in machine learning, data mining, database and AI forums. She was one of 11 finalists chosen nationally for the 2005 Microsoft New Faculty Award. She is a member of the AAAI Executive Council, is on the editorial board of the Machine Learning Journal, is a JAIR associate editor and has served on numerous program committees including AAAI, ICML, IJCAI, KDD, SIGMOD, UAI, VLDB, and WWW.

Tuesday, March 13, 2007

Lab Meeting 15 Mar 2007 (Any): RFID-Based SLAM

Urban Search And Rescue (USAR) is a time-critical task. Extraordinary circumstances after a real disaster make it very hard to apply common techniques. Firemen at 9/11 reported that they had major difficulties orienting themselves after leaving collapsed buildings. The arbitrary structure of the environment and limited visibility conditions due to smoke, dust, and fire prevent an easy distinction of different places. Therefore, it is proposed to solve the problem of data association by the active distribution and recognition of RFID tags. Furthermore, RFID tags can be utilized for communication-free coordination of a robot team and used by human forces to store additional user data, such as the number of victims located in a room or an indication of a hazardous area.

The work has been extensively tested on two different robot platforms, a 4WD (four wheeled drive) differentially steered robot for the autonomous team exploration of large office-like arenas, and a tracked robot for climbing 3D obstacles.

ICRA'07 Paper: RFID-Based Exploration for Large Robot Teams - [via] Link

Sunday, March 11, 2007

[MIT Thesis] Distributed Method Selection and Dispatching of Contingent, Temporally Flexible Plans

Block, Stephen

Many applications of autonomous agents require groups to work in tight coordination. To be dependable, these groups must plan, carry out and adapt their activities in a way that is robust to failure and to uncertainty. Previous work developed contingent, temporally flexible plans. These plans provide robustness to uncertain activity durations, through flexible timing constraints, and robustness to plan failure, through alternate approaches to achieving a task. Robust execution of contingent, temporally flexible plans consists of two phases. First, in the plan extraction phase, the executive chooses between the functionally redundant methods in the plan to select an execution sequence that satisfies the temporal bounds in the plan. Second, in the plan execution phase, the executive dispatches the plan, using the temporal flexibility to schedule activities dynamically. Previous contingent plan execution systems use a centralized architecture in which a single agent conducts planning for the entire group. This can result in a communication bottleneck at the time when plan activities are passed to the other agents for execution, and state information is returned. Likewise, a computation bottleneck may also occur because a single agent conducts all processing. This thesis introduces a robust, distributed executive for temporally flexible plans, called Distributed-Kirk, or D-Kirk. To execute a plan, D-Kirk first distributes the plan between the participating agents, by creating a hierarchical ad-hoc network and by mapping the plan onto this hierarchy. Second, the plan is reformulated using a distributed, parallel algorithm into a form amenable to fast dispatching. Finally, the plan is dispatched in a distributed fashion. We then extend the D-Kirk distributed executive to handle contingent plans. 
Contingent plans are encoded as Temporal Plan Networks (TPNs), which use a non-deterministic choice operator to compose temporally flexible plan fragments into a nested hierarchy of contingencies. A temporally consistent plan is extracted from the TPN using a distributed, parallel algorithm that exploits the structure of the TPN.

At all stages of D-Kirk, the communication load is spread over all agents, thus eliminating the communication bottleneck. In particular, D-Kirk reduces the peak communication complexity of the plan execution phase by a factor of O(A/e'), where e' is the number of edges per node in the dispatchable plan, determined by the branching factor of the input plan, and A is the number of agents involved in executing the plan.

In addition, the distributed algorithms employed by D-Kirk reduce the computational load on each agent and provide opportunities for parallel processing, thus increasing efficiency. In particular, D-Kirk reduces the average computational complexity of plan dispatching from O(eN^3) in the centralized case, to typical values of O(eN^2) per node and O(eN^3/A) per agent in the distributed case, where N is the number of nodes in the plan and e is the number of edges per node in the input plan.

Both of the above results were confirmed empirically using a C++ implementation of D-Kirk on a set of parameterized input plans. The D-Kirk implementation was also tested in a realistic application where it was used to control a pair of robotic manipulators involved in a cooperative assembly task.


Friday, March 09, 2007

Computer Science and Artificial Intelligence Laboratory Technical Report

Sensitive Manipulation



In this approach, manipulation is mainly guided by tactile feedback as opposed to vision. The traditional approach of a highly precise arm and vision system controlled by a model-based architecture is replaced by one that uses a low mechanical impedance arm with dense tactile sensing and exploration capabilities run by a behavior-based architecture.

The robot OBRERO can come gently into contact with an object, explore it, lift it, and place it in a different location. It can also detect slippage and external forces acting on an object while it is held. This task can be performed with very light objects, without fixtures, and on slippery surfaces.

[Robot Perception and Learning] Paper: Auditory Evidence Grids

Eric Martinson, Alan Schultz
Proc. of the International Conference on Intelligent Robots and Systems (IROS) 2006

Abstract – Sound source localization on a mobile robot can be a difficult task due to a variety of problems inherent to a real environment, including robot ego-noise, echoes, and the transient nature of ambient noise. As a result, source localization data are often very noisy and unreliable. In this work, we overcome some of these problems by combining the localization evidence over a variety of robot poses using an evidence grid. The result is a representation that localizes the pertinent objects well over time, can be used to filter poor localization results, and may also be useful for global re-localization from sound localization results.
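The evidence-grid combination step is typically a per-cell log-odds update, accumulating localization hits and misses across robot poses. A minimal sketch (the increment values are illustrative, not from the paper):

```python
import math

def update_cell(log_odds, detected, l_hit=0.6, l_miss=-0.3):
    """Log-odds update for one evidence-grid cell.

    detected: whether sound localization from the current robot pose
    points at this cell. l_hit / l_miss are illustrative increments.
    """
    return log_odds + (l_hit if detected else l_miss)

def probability(log_odds):
    """Convert log-odds back to an occupancy probability."""
    return 1.0 / (1.0 + math.exp(-log_odds))
```

Because evidence accumulates over many poses, a cell that a real sound source occupies keeps climbing toward probability 1, while transient noise and ego-noise artifacts, which are inconsistent across poses, wash out.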


Thursday, March 08, 2007

Lab Meeting 8 March 2007 (ZhenYu): 3D Scene Reconstruction from Reflection Images in a Spherical Mirror

Title : 3D Scene Reconstruction from Reflection Images in a Spherical Mirror
Authors: M. Kanbara, N. Ukita, M. Kidode, and N. Yokoya
Conference : ICPR2006

This paper proposes a method for reconstructing a 3D scene structure by using the images reflected in a spherical mirror. In our method, the mirror is moved freely within the field of view of a camera in order to observe a surrounding scene virtually from multiple viewpoints. The observation scheme, therefore, allows us to obtain wide-angle multi-viewpoint images of a wide area. In addition, the following characteristics of this observation enable multi-view stereo with simple calibration of the geometric configuration between the mirror and the camera: (1) the distance and direction from the camera to the mirror can be estimated directly from the position and size of the mirror in the captured image, and (2) the directions of detected points from each position of the moving mirror can also be estimated based on reflection on a spherical surface. Some experimental results show the effectiveness of our 3D reconstruction method.
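Characteristic (1) follows from the pinhole model: the mirror's apparent radius in the image shrinks inversely with its distance. A small idealized sketch (ignoring lens distortion; the parameter values in the test are made up):

```python
import math

def mirror_distance(focal_px, mirror_radius, image_radius_px):
    """Distance from camera to a spherical mirror of known radius,
    from its apparent radius in pixels (pinhole approximation):
    image_radius ~= focal * radius / distance, solved for distance.
    """
    return focal_px * mirror_radius / image_radius_px

def mirror_direction(focal_px, cx, cy, u, v):
    """Unit viewing ray toward the mirror center, from its image
    position (u, v) and the principal point (cx, cy)."""
    ray = (u - cx, v - cy, focal_px)
    n = math.sqrt(sum(c * c for c in ray))
    return tuple(c / n for c in ray)
```

Together these give the mirror's 3D position per frame, which is the "simple calibration" the abstract refers to.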


Wednesday, March 07, 2007



Extendable bumpers could be used by convoys of intelligent vehicles to maintain safe distances when normal communications fail, according to researchers. Their simulations suggest that bumpers that expand pneumatically to touch the car ahead could help keep a convoy together safely in an emergency. Prototype automated control systems that allow intelligent vehicles to travel in tight bunches reduce congestion, save fuel and cut down on accidents, but if communications become disrupted, the situation could be too fast-moving and complex for human drivers, with dangerous consequences. A car equipped with sensors in an extendable bumper could determine what the vehicle in front was doing and react quickly, measuring compression of the bumper to track the relative speed and acceleration of the car ahead. Read more: the link.

Lab meeting, March 08, 2007 (Vincent)

My talk will consist of the following components:

1. Fit a face through Active Appearance Model (AAM).
2. Get the frontal view of the given face.
3. Use eigenface or fisherface to recognize faces.

Lab Meeting 8 March 2007 (Stanley)

I will briefly introduce the planning and control methods:
1. The ND method developed by Javier Minguez and Luis Montano, and our modification.
2. The global planning module implemented on the ITRI robot.

Lab Meeting 8 March 2007 (Casey): A Multiview Face Identification Model With No Geometric Constraints

Title: A Multiview Face Identification Model With No Geometric Constraints
Jerry Jun Yokono, Sony Intelligence Dynamics Laboratories, Inc., Japan
Tomaso Poggio, M.I.T.
Source: Proceedings of the 7th International Conference on Automatic Face and Gesture Recognition (FGR06)

Face identification systems relying on local descriptors are increasingly used because of their perceived robustness with respect to occlusions and to global geometrical deformations. Descriptors of this type (based on a set of oriented Gaussian derivative filters) are used in our identification system. In this paper, we explore a pose-invariant multiview face identification system that does not use explicit geometrical information. The basic idea of the approach is to find discriminant features to describe a face across different views. A boosting procedure is used to select features out of a large pool of local features collected from the positive training examples. We describe experiments on well-known, though small, face databases with excellent recognition rates.

Download: [link]
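The boosting-based feature selection mentioned in the abstract can be pictured as repeated rounds in which the feature whose weak classifier has the lowest weighted error is kept. A toy single round with threshold stumps, not the authors' implementation (feature pool, labels, and threshold are illustrative):

```python
def select_feature(X, y, weights):
    """One boosting-style round: return the feature index whose
    threshold-0.5 decision stump has the lowest weighted error
    on labels y in {-1, +1}."""
    n_feat = len(X[0])
    best_j, best_err = None, float("inf")
    for j in range(n_feat):
        err = sum(w for x, lbl, w in zip(X, y, weights)
                  if (1 if x[j] > 0.5 else -1) != lbl)
        if err < best_err:
            best_j, best_err = j, err
    return best_j

# Toy pool of two features; feature 0 separates the classes perfectly
X = [[0.9, 0.1], [0.8, 0.9], [0.1, 0.2], [0.2, 0.8]]
y = [+1, +1, -1, -1]
j = select_feature(X, y, [0.25] * 4)   # selects feature 0
```

In full AdaBoost the sample weights would then be increased on misclassified examples and the selection repeated, yielding a small set of discriminant features.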

Saturday, March 03, 2007

CMU ML talk: Features, kernels, and similarity functions

Speaker: Prof. Avrim Blum, CMU

Title: Features, kernels, and similarity functions

Date: March 05

Abstract: Given a new learning problem, one of the first things you need to do is figure out what features you want to use. Alternatively, there has been substantial work on kernel functions which provide implicit feature spaces, but then how do you pick the right kernel? In this talk I will survey some theoretical results that can provide some help or at least guidance for these tasks. In particular, I will talk about:

* Algorithms designed to handle large feature spaces when it is expected that only a small number of the features will actually be useful (so you can pile a lot on when you don't know much about the domain).

* Kernel functions. Can theory provide some guidance into selecting or designing a kernel function in terms of natural properties of your domain?

* Combining the above. Can we use kernels to generate explicit features?

[In addition to its survey nature, part of this talk will sneak in some joint work with Nina Balcan.]
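One concrete way to "use kernels to generate explicit features", in the spirit of the Balcan-Blum similarity-function work the talk draws on, is to map each point to its kernel values against a few landmark points drawn from the data. A minimal sketch (all names and data are illustrative):

```python
import math
import random

def rbf_kernel(x, y, gamma=1.0):
    """Gaussian RBF kernel between two points given as coordinate lists."""
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(x, y)))

def landmark_features(x, landmarks, kernel=rbf_kernel):
    """Explicit feature vector: kernel similarity of x to each landmark."""
    return [kernel(x, l) for l in landmarks]

# Hypothetical data: sample a few landmarks, then featurize any point
data = [[0.0, 0.0], [1.0, 1.0], [2.0, 0.5], [0.5, 2.0]]
landmarks = random.sample(data, 2)
phi = landmark_features([1.0, 0.0], landmarks)   # two explicit features
```

A linear learner trained on such feature vectors can then approximate what a kernelized learner would do, which is one answer to the talk's third bullet.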

Friday, March 02, 2007

Link for the videos

There are videos showing the results of the "Grasping Novel Objects" work:

Videos -> Detailed videos

Thursday, March 01, 2007

[Robot Perception and Learning] Lab Meeting Fri, 2 March 2007 : Robotic Grasping of Novel Objects

Ashutosh Saxena, Justin Driemeyer, Justin Kearns, Andrew Y.Ng
Computer Science Department
Stanford University, Stanford, CA 94305



We consider the problem of grasping novel objects, specifically ones that are being seen for the first time through vision. We present a learning algorithm that neither requires, nor tries to build, a 3-d model of the object. Instead it predicts, directly as a function of the images, a point at which to grasp the object. Our algorithm is trained via supervised learning, using synthetic images for the training set. We demonstrate on a robotic manipulation platform that this approach successfully grasps a wide variety of objects, such as wine glasses, duct tape, markers, a translucent box, jugs, knife-cutters, cellphones, keys, screwdrivers, staplers, toothbrushes, a thick coil of wire, a strangely shaped power horn, and others, none of which were seen in the training set.

Lab Meeting 2 March 2007 (Chihao): Audio Source detection and tracking - sensor model

The sensor model is an important part of a tracking system.
I will present the microphone array model and the tracking results.

Lab Meeting 2 March 2007 (Jeff): RFID-Based Exploration for Large Robot Teams

Coordinating a team of robots for exploration is a challenging problem, particularly in large areas such as the devastated area after a disaster. This problem can generally be decomposed into task assignment and multi-robot path planning. In this paper, we address both problems jointly. This is possible because we significantly reduce the size of the search space by utilizing RFID tags as coordination points.
The exploration approach consists of two parts: a stand-alone distributed local search and a global monitoring process which can be used to restart the local search in more convenient locations. Our results show that the local exploration works for large robot teams, particularly when computational resources are limited. Experiments with the global approach showed that the number of conflicts can be reduced, and that the global coordination mechanism significantly increases the explored area.
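Using RFID tags as coordination points turns exploration into a discrete assignment problem over tag locations rather than a search over the full continuous space. A hedged sketch of one simple greedy assignment, not the paper's algorithm (robot and tag names and positions are made up):

```python
import math

def assign_to_tags(robots, tags):
    """Greedy task assignment: repeatedly match the closest remaining
    (robot, unexplored-tag) pair, with tags acting as coordination points."""
    free_robots, free_tags = list(robots.items()), list(tags.items())
    plan = {}
    while free_robots and free_tags:
        (rid, rpos), (tid, tpos) = min(
            ((r, t) for r in free_robots for t in free_tags),
            key=lambda pair: math.dist(pair[0][1], pair[1][1]))
        plan[rid] = tid
        free_robots.remove((rid, rpos))
        free_tags.remove((tid, tpos))
    return plan

plan = assign_to_tags({"r1": (0, 0), "r2": (5, 5)},
                      {"tagA": (1, 0), "tagB": (6, 5)})
```

Resolving such assignments locally, with a global monitor to break out of bad configurations, is the division of labour the abstract describes.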


Lab Meeting 2 March 2007 (Eric): Projector and Camera System - 3D Model Construction

I will show the results based on point-to-surface scan matching and discuss how it works.