Monday, February 27, 2006

My talk this week

Probabilistic Cooperative Localization and Mapping in Practice

Authors: Ioannis Rekleitis, Gregory Dudek and Evangelos Milios

From: International Conference on Robotics and Automation, 2003.

In this paper we present a probabilistic framework for the reduction in the uncertainty of a moving robot pose during exploration by using a second robot to assist. A Monte Carlo Simulation technique (specifically, a Particle Filter) is employed in order to model and reduce the accumulated odometric error. Furthermore, we study the requirements to obtain an accurate yet timely pose estimate. A team of two robots is employed to explore an indoor environment in this paper, although several aspects of the approach have been extended to larger groups. The concept behind our exploration strategy has been presented previously and is based on having one robot carry a sensor that acts as a “robot tracker” to estimate the position of the other robot. By suitable use of the tracker as an appropriate motion-control mechanism we can sweep areas of free space between the stationary and the moving robot and generate an accurate graph-based description of the environment. This graph is used to guide the exploration process. Complete exploration without any overlaps is guaranteed as a result of the guidance provided by the dual graph of the spatial decomposition (triangulation) of the environment. We present experimental results from indoor experiments in our laboratory and from more complex simulated experiments.
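To make the particle filter idea concrete, here is a minimal sketch of one way the predict / weight / resample loop could look when a stationary robot tracks a moving one. It is not the authors' implementation: the range-and-bearing tracker model, the noise levels, the odometry bias, and all function names are assumptions chosen for illustration.

```python
import math
import random

def predict(particles, dx, dy, odo_sigma, rng):
    """Shift each particle by the odometry reading plus sampled noise."""
    return [(x + dx + rng.gauss(0.0, odo_sigma),
             y + dy + rng.gauss(0.0, odo_sigma)) for x, y in particles]

def likelihood(error, sigma):
    """Unnormalized Gaussian likelihood of a measurement error."""
    return math.exp(-0.5 * (error / sigma) ** 2)

def weigh(particles, observer, z_range, z_bearing, range_sigma, bearing_sigma):
    """Weight particles by how well they explain the tracker reading."""
    weights = []
    for x, y in particles:
        ex, ey = x - observer[0], y - observer[1]
        d_range = z_range - math.hypot(ex, ey)
        diff = z_bearing - math.atan2(ey, ex)
        d_bearing = math.atan2(math.sin(diff), math.cos(diff))  # wrap to [-pi, pi]
        weights.append(likelihood(d_range, range_sigma) *
                       likelihood(d_bearing, bearing_sigma))
    total = sum(weights)
    if total == 0.0:  # all particles implausible: fall back to uniform weights
        return [1.0 / len(particles)] * len(particles)
    return [w / total for w in weights]

def resample(particles, weights, rng):
    """Draw a new particle set in proportion to the weights."""
    return rng.choices(particles, weights=weights, k=len(particles))

def mean_pose(particles):
    n = len(particles)
    return (sum(p[0] for p in particles) / n, sum(p[1] for p in particles) / n)

def run_demo(seed=0, steps=8, n_particles=1000):
    """Return (filter error, dead-reckoning error) after a short simulated run."""
    rng = random.Random(seed)
    observer = (0.0, -2.0)   # stationary robot carrying the tracker (assumed pose)
    truth = (0.0, 0.0)       # moving robot, start pose assumed known
    dead_reckoning = truth
    particles = [truth] * n_particles
    for _ in range(steps):
        truth = (truth[0] + 0.5, truth[1] + 0.25)
        odo = (0.6, 0.2)     # biased odometry reading (assumed)
        dead_reckoning = (dead_reckoning[0] + odo[0], dead_reckoning[1] + odo[1])
        particles = predict(particles, odo[0], odo[1], 0.2, rng)
        ex, ey = truth[0] - observer[0], truth[1] - observer[1]
        z_range = math.hypot(ex, ey) + rng.gauss(0.0, 0.05)
        z_bearing = math.atan2(ey, ex) + rng.gauss(0.0, 0.01)
        weights = weigh(particles, observer, z_range, z_bearing, 0.05, 0.01)
        particles = resample(particles, weights, rng)
    est = mean_pose(particles)
    filter_err = math.hypot(est[0] - truth[0], est[1] - truth[1])
    dead_err = math.hypot(dead_reckoning[0] - truth[0], dead_reckoning[1] - truth[1])
    return filter_err, dead_err
```

Calling `run_demo()` compares the filtered estimate with plain dead reckoning, whose error grows with every biased odometry step while the tracker keeps the filter estimate bounded.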

Paper: Cooperative Localization and Multi-Robot Exploration

Related Materials: Particle Filter Tutorial for Mobile Robots

Nature: Efficient auditory coding

Evan C. Smith & Michael S. Lewicki, CMU
Nature 439, 978-982 (23 February 2006)

The auditory neural code must serve a wide range of auditory tasks that require great sensitivity in time and frequency and be effective over the diverse array of sounds present in natural acoustic environments. It has been suggested that sensory systems might have evolved highly efficient coding strategies to maximize the information conveyed to the brain while minimizing the required energy and neural resources. Here we show that, for natural sounds, the complete acoustic waveform can be represented efficiently with a nonlinear model based on a population spike code. In this model, idealized spikes encode the precise temporal positions and magnitudes of underlying acoustic features. We find that when the features are optimized for coding either natural sounds or speech, they show striking similarities to time-domain cochlear filter estimates, have a frequency-bandwidth dependence similar to that of auditory nerve fibres, and yield significantly greater coding efficiency than conventional signal representations. These results indicate that the auditory code might approach an information theoretic optimum and that the acoustic structure of speech might be adapted to the coding capacity of the mammalian auditory system.

Sunday, February 26, 2006

MIT talk: Hierarchical Abstractions for Planning & Control of Robotic Swarms

Speaker: Calin Belta, Boston University
Date: Tuesday, February 28 2006
Host: Daniela Rus, MIT

Specifying, planning, and controlling the motion of large groups of mobile agents (swarms) are difficult problems that have received a lot of attention in recent years. I will present some recent results on reducing the dimension and complexity of such problems by defining abstractions. First, I will focus on continuous abstractions, which are obtained by extracting a small set of essential features of a swarm that can be used for planning and control. Second, I will show how discrete abstractions can be used to construct a finite dimensional description of the problem. Third, I will present an example in which the above two types of abstractions are seamlessly linked into a hierarchical abstraction framework, in which high level swarm specifications given as temporal logic formulas over features of interest are automatically converted into provably correct robot control laws.
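As a rough illustration of a continuous abstraction, the sketch below summarizes a planar swarm by its centroid and position covariance, so that a planner could reason about five numbers instead of 2N coordinates. The abstract does not say which features the speaker uses; this choice and the function name `swarm_abstraction` are assumptions made for illustration only.

```python
def swarm_abstraction(positions):
    """Map a list of (x, y) robot positions to (centroid, covariance).

    The centroid says where the swarm is; the 2x2 population covariance
    says how it is spread out and oriented.
    """
    n = len(positions)
    cx = sum(x for x, _ in positions) / n
    cy = sum(y for _, y in positions) / n
    sxx = sum((x - cx) ** 2 for x, _ in positions) / n
    syy = sum((y - cy) ** 2 for _, y in positions) / n
    sxy = sum((x - cx) * (y - cy) for x, y in positions) / n
    return (cx, cy), ((sxx, sxy), (sxy, syy))
```

Translating every robot by the same offset moves the centroid and leaves the covariance unchanged, which is the kind of property that lets a planner treat position and shape separately.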

MIT talk: Medical Image Registration in Healthcare, Biomedical Research and Drug Discovery

Speaker: Daniel Rueckert, Imperial College London
Date: Tuesday, February 28 2006
Contact: Polina Golland, x38005

Imaging technologies are developing at a rapid pace allowing for in-vivo 3D and 4D imaging of the anatomy and physiology in humans and animals. This is opening up unprecedented opportunities for research and clinical applications ranging from imaging for drug discovery and delivery, over imaging for diagnosis and therapy, to imaging for basic research such as brain mapping. In this talk we will focus on how computational techniques based on non-rigid image registration can be used to address the image analysis challenges in healthcare, biomedical research and drug discovery.

Saturday, February 25, 2006

CMU FRC talk: Online and Structured Learning Techniques for Outdoor Robotics

Speaker: Drew Bagnell, Research Scientist, Robotics Institute
Date: Thursday, March 2, 2006

This presentation is based on joint work with Nathan Ratliff, Boris Sofman, Ellie Lin, Nicolas Vandapel, and Anthony Stentz.
Programming behaviors for outdoor mobile robot navigation is hard. Machine learning promises to alleviate this difficulty but existing techniques often fall short. For instance, some features, while potentially powerful for improving navigation, prove difficult to profit from because they generalize poorly to novel situations. Overhead imagery data, for instance, has the potential to greatly enhance autonomous robot navigation in complex outdoor environments. In practice, reliable and effective automated interpretation of imagery from diverse terrain, environmental conditions, and sensor varieties proves challenging. I'll discuss online, probabilistic models to effectively learn to use these scope-limited features by leveraging other features that, while perhaps otherwise more limited, generalize reliably.
I'll also discuss work on mobile robot learning based on demonstrated trajectories. This is a natural and potentially powerful approach to teaching a system. Unfortunately, most existing techniques to learn based on demonstrated trajectories face at least two important difficulties. First, it is very hard to get "negative examples" in this framework; we can't actually drive the robot off a cliff or into a boulder. Secondly, it is very difficult to acquire long-horizon and goal-directed behavior by imitating a trainer. I'll talk about a new approach that addresses both concerns. It learns to map features of the world into costs for a planner in such a way that the resulting optimal plans mimic the trainer's behavior. This approach is powerful, as the behavior that a designer wishes the planner to execute is often clear, while specifying costs that engender this behavior is often much more difficult.

CMU ML talk: Machine Learning in TAC SCM (Trading Agent Competition in Supply Chain Management)

Speaker: Michael Benisch, COS, CMU.
Date: February 27
Supply chains aid the manufacturing of many complex goods. Traditionally, supply chains have been maintained by human negotiators through long-term, static contracts, despite uncertain and dynamic market conditions. However, there has been a recent growing interest, from both industry and academia, in the potential for automating more efficient supply chain processes. The TAC SCM (Trading Agent Competition in Supply Chain Management) scenario is an international competition that provides a research platform facilitating the application of new academic technologies to the problem of managing a dynamic supply chain. Since the inception of TAC SCM, machine learning has emerged as an essential aspect of successful agent design. Many agents, such as Carnegie Mellon's 2005 entry, CMieux, utilize learning techniques to estimate market conditions and model opponent behavior. In this talk, we will discuss some specific learning problems faced by these agents, including the problem of forecasting future demand, the problem of predicting auction closing prices, and the problem of approximating supply availability. We will also discuss various solutions developed by researchers to address them, including a new extension of M5 regression trees used by CMieux, called distribution trees.

CMU thesis proposal: Real-time Planning for Single Agents and Multi-agent Teams in Unknown and Dynamic Environments

David Ferguson, Robotics Institute, Carnegie Mellon University
3 Mar 2006

As autonomous agents make the transition from solving simple, well-behaved problems to being useful entities in the real world, they must deal with the added complexity and uncertainty inherent in real environments. In particular, agents navigating through the real world can be confronted with incomplete or imperfect information (e.g. when prior maps are absent or incomplete), large state spaces (e.g. for robots with several degrees of freedom or teams of robots), and dynamic elements (e.g. when there are humans or other agents in the environment). In this work, we propose to address the problem of path planning and replanning in both static and dynamic environments for which prior information may be incomplete or imperfect. We intend to develop a set of planning algorithms that will enable single agents and multi-agent teams to operate more effectively in a wider range of realistic scenarios.

A copy of the thesis proposal document can be found at

Thursday, February 23, 2006

What's New @ IEEE in Wireless, February 2006

Greater numbers of vehicles equipped for wireless networking present new security challenges due to the short contact times between different mobile nodes and the large size of the networks, according to researchers studying the issue. The German-funded Network on Wheels (NoW) project incorporates security considerations into network development. Researchers say those concerns include continuous system availability (a system is robust even in the presence of malicious or faulty nodes); privacy, including un-traceability of actions to a user and un-linkability of the actions of a node; and secure communication. Current work on NoW includes detecting attacks on the different parts of the system and estimating both their impact and probability, researchers say. Read more:

Wireless systems that locate trapped miners and send them text messages are being tested by the U.S. Mine Safety and Health Administration (MSHA), including one system which pinpoints the location of individual miners, according to researchers. One of the systems uses a transmitter worn by miners that sends out a signal unique to each individual, researchers say, while another device is a personal receiver that allows rescuers to send text messages to the miners. Both technologies operate on a network of wireless radio transmitters installed in the tunnels, and were developed by the Australian firm Mine Site Technologies. Read more:

Four groups funded by England's Wired and Wireless Networked Systems (WINES) program -- which studies the creation of massive-scale ubiquitous and pervasive computing environments -- are examined in this month's issue of IEEE Distributed Systems Online. TIME-EACM, a collaboration between the University of London and Birkbeck College, is studying how wired and wireless systems can improve traffic flow and congestion in urban areas. BiosensorNet, comprised of several teams from Imperial College London, hopes to improve the medical industry with state of the art wireless sensors implanted in the body. Cityware, a project including the University of Bath, Imperial College London, and University College London, is studying how new integrated information systems placed in architecture will affect people's relationships with their environment. Finally, NEMO, comprised of departments at Lancaster University, is looking at embedding sensors in everyday objects -- called smart artifacts -- in order to enable physical entities to capture and share their "experiences." A new round of WINES funding is set to be unleashed next month. Read more: the link

PASCAL Visual Object Classes Recognition Challenge 2006

Subject: PASCAL Visual Object Classes Recognition Challenge 2006
Date: Fri, 17 Feb 2006 20:32:18 GMT
From: Andrew Zisserman

Dear All,

We are running a second PASCAL Visual Object Classes Recognition Challenge. This time there are more classes (ten), more challenging images, and the possibility of confusion between classes with similar visual appearance (cars/bus, bicycle/motorbike).

As before, participants can recognize any or all of the classes, and there is a classification and a detection track.

The development kit (Matlab code for evaluation, and baseline algorithms) and training data is now available at:

where further details are given. The timetable of the challenge is included below.

It would be great if each of you or your groups could participate.

Best wishes,

Andrew Zisserman
Mark Everingham
Chris Williams
Luc Van Gool


* 14 Feb 2006 : Development kit (training and validation data plus evaluation software) made available.

* 31 March 2006: Test set made available

* 21 April 2006: DEADLINE for submission of results

* 7 May 2006: Half-day (afternoon) challenge workshop to be held in conjunction with ECCV06, Graz, Austria.

IEEE Career Alert: Tech Jobs Are Jumping

3. Start Up, Not at the Bottom

The latest trend in entry-level jobs is to avoid them altogether. More and more recent college graduates are heading start-up businesses, writes The Boston Globe. In fact, at high-powered schools like Harvard and Carnegie Mellon, thirty to forty percent of students create their own companies within five years of graduating. For students thinking of leaping to the top of their own corporate ladder, certain skills may come in handy. For one, they may have to learn to market themselves. More advice can be found at:

What's New @ IEEE in Signal Processing, February 2006


The latest issue of "IEEE Signal Processing Magazine" (v. 23, no. 1) includes a feature section on knowledge-based systems for adaptive radars. Topics covered include knowledge-based systems for adaptive radar, cognitive radar, and space-time adaptive processing, as well as several others. The table of contents and abstracts for all articles are available online, where subscribers may also access the full text of all papers:

Also now online, the latest issue of "IEEE Signal Processing Letters" (v. 13, no. 3), covering signal modification for ADPCM based on analysis-by-synthesis framework, a new gradient search interpretation of super-exponential algorithms among other topics:

Many environmentalists and scientists believe the world's fish populations are shrinking, and new developments in signal processing technology seek to arm researchers with techniques that provide more accurate fish population data. Off the coast of Monterey, California, USA, a team of scientists demonstrated a new sonar technique to detect squid egg clusters in the ocean's depths. By towing a sidescan sonar with the California State University Seafloor Mapping Lab's research vessel, the team was able to conduct experiments that tested various ways to tune sound wave frequencies. After signals were drawn out, the sound data was translated into sonar images in the form of seafloor maps which displayed where egg clusters could be found, providing a portrayal of future populations. Meanwhile, researchers at the Massachusetts Institute of Technology have created a remote sensor system that allows scientists to monitor large fish populations over a 10,000-square-kilometer area. While old surveying methods provide a smaller amount of data with high-frequency sonar beams, this new system employs low-frequency sonar beams that can travel farther distances, bringing data back in sharper detail through less intense signals. Read more about these developments:

The Regional Calorimeter Trigger, the world's fastest image processor, can analyze a billion proton collisions per second, according to its developers at the University of Wisconsin-Madison, and will be used in the Large Hadron Collider (LHC) in Geneva, Switzerland, to capture traces of the subatomic Higgs-Boson. The US$6 million device is composed of integrated circuits on 300 parallel processing computer cards, researchers say, creating a massive image processor capable of analyzing one trillion bits of data per second. The Higgs-Boson is one of the particles researchers say is necessary to complete the standard model of physics, the evidence for which has been sought for 20 years. When protons crash in a collider the event lasts no more than two-billionths of a second, according to researchers. Read more:

Wednesday, February 22, 2006

Fast Extrinsic Calibration of a Laser Rangefinder to a Camera

Ranjith Unnikrishnan, Martial Hebert

External calibration of a camera to a laser rangefinder is a common pre-requisite on today’s multi-sensor mobile robot platforms. However, the process of doing so is relatively poorly documented and almost always time-consuming. This document outlines an easy and portable technique for external calibration of a camera to a laser rangefinder. It describes the usage of the Laser-Camera Calibration Toolbox (LCCT), a Matlab®-based graphical user interface that is meant to accompany this document and facilitates the calibration procedure. We also summarize the math behind its development.


CMU VASC talk: Learning to Transform Time Series with a Few Examples

Ali Rahimi, Intel Lab Seattle
Monday, Feb 27, 2006

I describe a semi-supervised regression algorithm that learns to transform one time series into another time series given examples of the transformation. I apply this algorithm to tracking, where one transforms a time series of observations from sensors to a time series describing the pose of a target. Instead of defining and implementing such transformations for each tracking task separately, I suggest learning memoryless transformations of time series from a few example input-output mappings. Our algorithm searches for a smooth function that fits the training examples and, when applied to the input time series, produces a time series that evolves according to assumed dynamics. The learning procedure is fast and lends itself to a closed-form solution. I relate this algorithm and its unsupervised extension to nonlinear system identification and manifold learning techniques. I demonstrate it on the tasks of tracking RFID tags from signal strength measurements, recovering the pose of rigid objects, deformable bodies, and articulated bodies from video sequences, and tracking a target in a completely uncalibrated network of sensors.
For these tasks, this algorithm requires significantly fewer examples compared to fully-supervised regression algorithms or semi-supervised learning algorithms that do not take the dynamics of the output time series into account.

Speaker Bio:
Ali Rahimi is interested in developing machine learning tools for solving difficult sensing problems. His focus is on example-based tracking, and efficient approximation methods for estimation. He received a PhD from the MIT Computer Science and AI Lab in 2005, an MS in Media Arts and Science from the MIT Media Lab, and a BS in Electrical Engineering and Computer Science from UC Berkeley.

Tuesday, February 21, 2006

The Boosting Approach to Machine Learning

The Boosting Approach to Machine Learning
An Overview

Robert E. Schapire
AT&T Labs - Research
Shannon Laboratory

Boosting is a general method for improving the accuracy of any given learning algorithm. Focusing primarily on the AdaBoost algorithm, this chapter overviews some of the recent work on boosting including analyses of AdaBoost’s training error and generalization error; boosting’s connection to game theory and linear programming; the relationship between boosting and logistic regression; extensions of AdaBoost for multiclass classification problems; methods of incorporating human knowledge into boosting; and experimental and applied work using boosting.
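For readers new to AdaBoost, here is a minimal sketch of the algorithm on one-dimensional data with threshold rules (decision stumps) as the weak learner. The toy data set, the stump parameterization, and the function names are assumptions for illustration; the chapter itself covers the general algorithm and its analysis.

```python
import math

def stump_predict(x, threshold, polarity):
    """Weak rule: predict `polarity` at or above the threshold, else its negation."""
    return polarity if x >= threshold else -polarity

def best_stump(xs, ys, weights):
    """Return (weighted error, threshold, polarity) of the best stump."""
    best = None
    for threshold in sorted(set(xs)):
        for polarity in (1, -1):
            err = sum(w for x, y, w in zip(xs, ys, weights)
                      if stump_predict(x, threshold, polarity) != y)
            if best is None or err < best[0]:
                best = (err, threshold, polarity)
    return best

def adaboost(xs, ys, rounds):
    """Train on labels in {-1, +1}; return a list of (alpha, threshold, polarity)."""
    n = len(xs)
    weights = [1.0 / n] * n
    model = []
    for _ in range(rounds):
        err, threshold, polarity = best_stump(xs, ys, weights)
        err = min(max(err, 1e-12), 1.0 - 1e-12)   # keep the log finite
        alpha = 0.5 * math.log((1.0 - err) / err)
        model.append((alpha, threshold, polarity))
        # Raise the weight of misclassified examples, lower the rest.
        weights = [w * math.exp(-alpha * y * stump_predict(x, threshold, polarity))
                   for x, y, w in zip(xs, ys, weights)]
        total = sum(weights)
        weights = [w / total for w in weights]
    return model

def predict(model, x):
    """Sign of the alpha-weighted vote of all stumps."""
    score = sum(alpha * stump_predict(x, threshold, polarity)
                for alpha, threshold, polarity in model)
    return 1 if score >= 0 else -1

# Toy data: no single threshold separates the two classes.
XS = list(range(10))
YS = [1, 1, 1, -1, -1, -1, 1, 1, 1, -1]
```

On this toy set a single stump misclassifies three points, while `adaboost(XS, YS, 3)` combines three stumps into a vote that fits all ten training labels.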

Here is the link

Monday, February 20, 2006

My talk this week (Casey)

My talk has the following parts:
1. Related work: Robust Real-time Object Detection (authors: Viola & Jones)
2. Detection approach of the HandVu system
3. Tracking approach of the HandVu system

Paper information:

Published in the IEEE Intl. Conference on Automatic Face and Gesture Recognition, May 2004.

Robust Hand Detection
Mathias Kölsch and Matthew Turk
Department of Computer Science, University of California, Santa Barbara, CA

Vision-based hand gesture interfaces require fast and extremely robust hand detection. Here, we study view-specific hand posture detection with an object recognition method recently proposed by Viola and Jones. Training with this method is computationally very expensive, prohibiting the evaluation of many hand appearances for their suitability to detection. As one contribution of this paper, we present a frequency analysis-based method for instantaneous estimation of class separability, without the need for any training. We built detectors for the most promising candidates, their receiver operating characteristics confirming the estimates. Next, we found that classification accuracy increases with a more expressive feature type. As a third contribution, we show that further optimization of training parameters yields additional detection rate improvements. In summary, we present a systematic approach to building an extremely robust hand appearance detector, providing an important step towards easily deployable and reliable vision-based hand gesture interfaces.
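The Viola and Jones method mentioned above evaluates rectangle features quickly by way of an integral image, where any rectangle sum costs four lookups. The sketch below shows that building block only; it is not the HandVu detector, and the function names and the particular two-rectangle feature are illustrative assumptions.

```python
def integral_image(img):
    """Return a (h+1) x (w+1) table where ii[r][c] is the sum of img[:r][:c]."""
    h, w = len(img), len(img[0])
    ii = [[0] * (w + 1) for _ in range(h + 1)]
    for r in range(h):
        row_sum = 0
        for c in range(w):
            row_sum += img[r][c]
            ii[r + 1][c + 1] = ii[r][c + 1] + row_sum
    return ii

def rect_sum(ii, top, left, height, width):
    """Sum of the pixels in a rectangle, using four table lookups."""
    bottom, right = top + height, left + width
    return ii[bottom][right] - ii[top][right] - ii[bottom][left] + ii[top][left]

def two_rect_feature(ii, top, left, height, width):
    """Left half minus right half of a rectangle (width should be even)."""
    half = width // 2
    return (rect_sum(ii, top, left, height, half)
            - rect_sum(ii, top, left + half, height, half))
```

For example, on the 3x4 image `[[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12]]` the full-image sum is 78 and the left-minus-right feature over the whole image is 33 - 45 = -12.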

Below is the author's Ph.D. thesis, "Vision Based Hand Gesture Interfaces for Wearable Computing and Virtual Environments".
You can download both papers and get the author's information from this link:

Thursday, February 16, 2006

What's New @ IEEE in Computing, February 2006

Departing from traditional approaches to promoting the development of pattern recognition software, researchers at Ohio State University have created a new method that tests machine vision algorithms to evaluate which algorithm is most successful for a given application. Using two databases, one consisting of objects such as apples and pears and another of faces with various expressions, the researchers found the tasks of sorting objects and identifying expressions to be distinct in such a way that an algorithm could be good at doing one but not the other. The end result allows for a faster, more efficient way to gather data from pattern recognition software, according to Aleix Martinez, assistant professor of electrical and computer engineering at Ohio State. This work may have an effect on research in areas as varied as neuroscience, genetics, and economics. Read more:

The IEEE Pervasive Computing Magazine has announced a call for papers for a special issue on intelligent transportation. Authors are asked to submit articles describing the application of pervasive computing technologies, systems, and applications to vehicles, roads, and other transportation systems. Also encouraged are articles that discuss the security, privacy, social, and human-related issues of intelligent transportation, and case studies of experiences with existing pervasive technologies in use in transportation. Deadline for submission is 31 May 2006. For details, visit:

Tuesday, February 14, 2006

MIT Report : Learning Semantic Scene Models by Trajectory Analysis


Xiaogang Wang, Kinh Tieu, Eric Grimson


In this paper, we describe an unsupervised learning framework to segment a scene into semantic regions and to build semantic scene models from long-term observations of moving objects in the scene. First, we introduce two novel similarity measures for comparing trajectories in far-field visual surveillance. The measures simultaneously compare the spatial distribution of trajectories and other attributes, such as velocity and object size, along the trajectories. They also provide a comparison confidence measure which indicates how well the measured image-based similarity approximates true physical similarity. We also introduce novel clustering algorithms which use both similarity and comparison confidence. Based on the proposed similarity measures and clustering methods, a framework to learn semantic scene models by trajectory analysis is developed. Trajectories are first clustered into vehicles and pedestrians, and then further grouped based on spatial and velocity distributions. Different trajectory clusters represent different activities. The geometric and statistical models of structures in the scene, such as roads, walk paths, sources and sinks, are automatically learned from the trajectory clusters. Abnormal activities are detected using the semantic scene models. The system is robust to low-level tracking errors.


MIT talk: Object Class and Subclass Recognition Using Relational Object Models

Speaker: Aharon Bar-Hillel, Hebrew University
Date: Wednesday, February 15 2006
Host: Prof. Tomaso Poggio, M.I.T., McGovern Institute, BCS & CSAIL

Abstract: In the first part of the talk I will present a new learning method for object class recognition, combining a generative constellation model with a discriminative optimization technique. Specifically we use a 'star'-like Bayesian network model, but learn its parameters using an extended boosting technique which iterates between inference and part learning. Learning complexity is linear in the number of model parts and image features, compared to an exponential learning complexity for similar models in a generative framework. This allows the construction of rich models with many distinctive parts, leading to improved classification accuracy.

In the second part of the talk I will address the problem of sub-ordinate class recognition (like the distinction between cross and sport motorcycles), relying on the above-mentioned learning technique. Our approach to this problem is motivated by observations from cognitive psychology, which identify parts as the defining component of basic level categories, while sub-ordinate categories are more often defined by modified parts. Accordingly, we suggest a two-stage algorithm: First a model of the inclusive class is learned (e.g., motorcycles in general) using the technique introduced earlier, and then subclass classification is made based on the part correspondence implied by the model. The two-stage algorithm typically outperforms a competing one-step algorithm, which builds distinct constellation models for each subclass. This performance advantage critically relies on modeling of the spatial relations between parts, and on having models with a large number of parts.

The talk is based on a joint work with Tomer Hertz and Prof. Daphna Weinshall.

MIT Thesis Defense: Learning a Dictionary of Shape-Components in Visual Cortex: Comparison with Neurons, Humans and Machine

Speaker: Thomas Serre, Dept. of Brain & Cognitive Sciences and McGovern Institute for Brain Research
Date: Wednesday, February 15 2006
Host: Prof. Tomaso Poggio, McGovern Institute for Brain Research
Relevant URL:

In this talk I will describe a quantitative model that accounts for the circuits and computations of the feedforward path of the ventral stream of visual cortex. This model is consistent with a general theory of visual processing that extends the hierarchical model of Hubel & Wiesel from primary to extrastriate visual areas and attempts to explain the first few hundred milliseconds of visual processing. One of the key elements in the approach I will describe is the learning of a generic dictionary of shape-components from V2 to IT, which provides an invariant representation to task-specific categorization circuits in higher brain areas. This vocabulary of shape-tuned units is learned in an unsupervised manner from natural images, and constitutes a large and redundant set of image features with different complexities and invariances. This theory significantly extends an earlier approach by Riesenhuber & Poggio (1999) and builds upon several existing neurobiological models and conceptual proposals.

I will present evidence to show that not only can the model duplicate the tuning properties of neurons in various brain areas when probed with artificial stimuli (like the ones typically used in physiology), but it can also handle the recognition of objects in the real-world, to the extent of competing with the best computer vision systems. Following this, I will present a comparison between the performance of the model and the performance of human observers in a rapid animal vs. non-animal recognition task for which recognition is fast and cortical back-projections are likely to be inactive. Results indicate that the model predicts human performance extremely well when the delay between the stimulus and the mask is about 50 ms. These results suggest that cortical back-projections may not play a significant role when the time interval is in this range, and the model may therefore provide a satisfactory description of the feedforward path.

Taken together, the evidence I will present shows that we may have the skeleton of a successful theory of visual cortex. In addition, this may be the first time that a neurobiological model, faithful to the physiology and the anatomy of visual cortex, not only competes with some of the best computer vision systems thus providing a realistic alternative to engineered artificial vision systems, but also achieves performance close to that of humans in a categorization task involving complex natural images.

CMU VASC talk: Video visualization - Beyond pixels and frames

Yaron Caspi, Tel Aviv University
Monday, Feb 20, 2006

Abstract: Video data is represented by pixels and frames. This restricts the way it is captured, accessed and visualized. On one hand, visual information is distributed across all frames, and therefore, in order to depict the visual information, the entire video sequence must be viewed sequentially, frame by frame. On the other hand, important visual information is lost by the limited frame rate. Similarly in the spatial domain, sensor and optics limit the capturing process, while huge redundancy prevents an efficient visualization of information. In this talk I will show how to exceed both limitations of capturing devices and of visual displays. In particular, I will show how fusion of information from multiple sources makes it possible to exceed temporal and spatial limitations, and how visualization of video data can benefit from importance ranking. I will describe a process that depicts the essence of video or animation, by embedding high dimensional data in low dimensional Euclidean space. I will also show how super-pixels (in contrast to pixels) contribute to the exploitation of temporal redundancy for the task of spatial segmentation of regions with high importance.

Monday, February 13, 2006

Paper: Computer Vision for Music Identification

Y. Ke, D. Hoiem, and R. Sukthankar. In Proceedings of Computer Vision and Pattern Recognition, 2005.


We describe how certain tasks in the audio domain can be effectively addressed using computer vision approaches. This paper focuses on the problem of music identification, where the goal is to reliably identify a song given a few seconds of noisy audio. Our approach treats the spectrogram of each music clip as a 2-D image and transforms music identification into a corrupted sub-image retrieval problem. By employing pairwise boosting on a large set of Viola-Jones features, our system learns compact, discriminative, local descriptors that are amenable to efficient indexing. During the query phase, we retrieve the set of song snippets that locally match the noisy sample and employ geometric verification in conjunction with an EM-based “occlusion” model to identify the song that is most consistent with the observed signal. We have implemented our algorithm in a practical system that can quickly and accurately recognize music from short audio samples in the presence of distortions such as poor recording quality and significant ambient noise. Our experiments demonstrate that this approach significantly outperforms the current state-of-the-art in content-based music identification.
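The starting point of the approach is to view an audio clip as a 2-D time-frequency image. A minimal sketch of that first step is below: it slices a signal into overlapping frames and takes the magnitude of a windowed DFT of each one. The frame length, hop size, Hann window, and the naive DFT are assumptions for illustration, not the settings used in the paper.

```python
import cmath
import math

def spectrogram(signal, frame_len, hop):
    """Return a list of frames, each a list of DFT magnitudes for bins 0..frame_len/2.

    Rows are time steps and columns are frequency bins, so the result can be
    treated as a grayscale image.
    """
    window = [0.5 - 0.5 * math.cos(2.0 * math.pi * n / frame_len)
              for n in range(frame_len)]               # Hann window
    image = []
    for start in range(0, len(signal) - frame_len + 1, hop):
        frame = [signal[start + n] * window[n] for n in range(frame_len)]
        magnitudes = []
        for k in range(frame_len // 2 + 1):
            acc = sum(frame[n] * cmath.exp(-2j * math.pi * k * n / frame_len)
                      for n in range(frame_len))       # naive O(N^2) DFT
            magnitudes.append(abs(acc))
        image.append(magnitudes)
    return image

# A pure tone with 4 cycles per 32-sample frame shows up as a bright column at bin 4.
TONE = [math.sin(2.0 * math.pi * 4 * n / 32) for n in range(128)]
```

A real system would use an FFT rather than this quadratic-time DFT; the naive version is kept only so the example has no dependencies.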

project link

Sunday, February 12, 2006

My talk this week

1. Structure from sound (review and more details)

S. Thrun. Affine Structure From Sound. In Proceedings of the 2005 Conference on Neural Information Processing Systems (NIPS). MIT Press, 2006.

2. Sound Object Localization and Retrieval in Complex Audio Environments

D. Hoiem, Y. Ke, and R. Sukthankar, "SOLAR: Sound Object Localization and Retrieval in Complex Audio Environments", IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2005.

The ability to identify sounds in complex audio environments is highly useful for multimedia retrieval, security, and many mobile robotic applications, but very little work has been done in this area. We present the SOLAR system, a system capable of finding sound objects, such as dog barks or car horns, in complex audio data extracted from movies. SOLAR avoids the need for segmentation by scanning over the audio data in fixed increments and classifying each short audio window separately. SOLAR employs boosted decision tree classifiers to select suitable features for modeling each sound object and to discriminate between the object of interest and all other sounds. We demonstrate the effectiveness of our approach with experiments on thirteen sound object classes trained using only tens of positive examples and tested on hours of audio data extracted from popular movies.

Robot Dream Exposition Taiwan 2006

I went to the robot exposition yesterday and took some photos. You can access these photos at this link.

Saturday, February 11, 2006

CNN: Toy makers hawk robotic playmates

Toy fair to feature robotic pets, 'Let's Dance' Barbie
Friday, February 10, 2006; Posted: 6:38 p.m. EST (23:38 GMT)

NEW YORK (AP) -- If children didn't get their fill of high-tech toys during the 2005 holiday season, they should brace themselves for more wizardry later this year.

With young consumers growing out of toys faster and preferring iPod digital music players and video games, the nation's toy makers are working harder to come up with more high-tech products, particularly robotic playmates.

Such robotic toys, which are even more lifelike than a year ago, are among the thousands of toys to be featured at American International Toy Fair, officially beginning Sunday.

See more.

Friday, February 10, 2006

Robot news & videos...

ROBOTS - our cutting-edge new Special Report
From our homes to the operating theatre, from war zones into space, robots are on the march. Follow their progress, plus our Expert Guide including an Instant Expert, robot video Top Ten and more...

Thursday, February 09, 2006

Cognitive-Developmental Learning for a Humanoid Robot: A Caregiver's Gift

Title: Cognitive-Developmental Learning for a Humanoid Robot: A Caregiver's Gift
Authors: Arsenio, Artur Miguel
Keywords: AI, Humanoid Robots; Developmental Learning; Perception; Human-robot Interactions
Issue Date: 22-Dec-2005
Series no: Massachusetts Institute of Technology Computer Science and Artificial Intelligence Laboratory

The goal of this work is to build a cognitive system for the humanoid robot, Cog, that exploits human caregivers as catalysts to perceive and learn about actions, objects, scenes, people, and the robot itself. This thesis addresses a broad spectrum of machine learning problems across several categorization levels. Actions by embodied agents are used to automatically generate training data for the learning mechanisms, so that the robot develops categorization autonomously. Taking inspiration from the human brain, a framework of algorithms and methodologies was implemented to emulate different cognitive capabilities on the humanoid robot Cog. This framework is effectively applied to a collection of AI, computer vision, and signal processing problems. Cognitive capabilities of the humanoid robot are developmentally created, starting from infant-like abilities for detecting, segmenting, and recognizing percepts over multiple sensing modalities. Human caregivers provide a helping hand for communicating such information to the robot. This is done by actions that create meaningful events (by changing the world in which the robot is situated) thus inducing the "compliant perception" of objects from these human-robot interactions. Self-exploration of the world extends the robot's knowledge concerning object properties. This thesis argues for enculturating humanoid robots using infant development as a metaphor for building a humanoid robot's cognitive abilities. A human caregiver redesigns a humanoid's brain by teaching the humanoid robot as she would teach a child, using children's learning aids such as books, drawing boards, or other cognitive artifacts. Multi-modal object properties are learned using these tools and inserted into several recognition schemes, which are then applied to developmentally acquire new object representations. The humanoid robot therefore sees the world through the caregiver's eyes. Building an artificial humanoid robot's brain, even at an infant's cognitive level, has been a long quest which still lies only in the realm of our imagination. Our efforts towards such a dimly imaginable task are developed according to two alternate and complementary views: cognitive and developmental.

Check here for details.

Rats Smell in Stereo

By Larry O'Hanlon, Discovery News

Feb. 8, 2006— Rats need only one sniff to take their bearings on a tasty morsel, say researchers who have discovered what may be the olfactory equivalent to stereo hearing in the common rodents.
It turns out that rats use their two nostrils with what appears to be far more efficiency than humans do, and may be a lot like some other scent-oriented animals.

Read more.

What's New @ IEEE in Communications, February 2006

A new social computing research project called SmartCampus plans to unite students through compact wireless communication devices, in an effort to facilitate social interaction on campus. A team of experts at the New Jersey Institute of Technology has brought together a group of faculty and students from diverse fields to integrate their resources and spearhead the project. The project identifies places where students are likely to gather through the use of software that allows access to participant profiles taken from mobile communication devices.
Read more:

What's New @ IEEE for Students, February 2006

A spherical robot that contains three curved plates within each other is just one of many new designs that could aid in applications for compact structures that expand into larger structures. Two engineers from opposite sides of the globe have bridged the gap between kinematics and statics, using the mathematics of the two theorems to improve the design process of computer-controlled robots. Gordon R. Pennock, a mechanical engineer at Purdue University, USA, and Offer Shai, a civil engineer at Tel Aviv University in Israel believe the new theorems represent a common language which reflects the connections between kinematics and statics, emphasizing the benefits of creating robots with enhanced stability and motion. Engineers can also use this knowledge for creating structures that are more resistant to damage from motion. The theorems offer the possibility of creating a new class of functional "multiple-platform robots" that retain their structure and stability even after being damaged or reconfigured.
Read more:

Tuesday, February 07, 2006

CMU project: SOLAR

SOLAR: Sound Object Localization and Retrieval in Complex Audio Environments

Our goal is to detect and identify sound objects, such as car horns or dog barks, in audio. Our system, called SOLAR (sound object localization and retrieval) is the first, to our knowledge, that is capable of finding a large variety of sounds in audio data from movies and other complex audio environments. Our approach is to perform a windowed scan over audio data and classify each window using a cascade of boosted decision tree classifiers. See the presentations section for a good overview of our system. This work is performed by Derek Hoiem, Yan Ke, and Rahul Sukthankar and is supported by Intel Research Pittsburgh.
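
The windowed scan with a classifier cascade can be sketched as follows. This is only an illustration of the control flow. The stage functions `mean_energy` and `peak_amplitude`, the thresholds, and the toy signal are hypothetical stand-ins, not SOLAR's boosted decision trees or features.

```python
def scan_windows(samples, window_len, step, cascade):
    """Slide a fixed-length window over samples and return the start
    indices of windows that pass every stage of the cascade.

    cascade is a list of (score_fn, threshold) pairs. A window is
    rejected as soon as one stage scores below its threshold, so cheap
    early stages can discard most windows before later stages run.
    """
    hits = []
    for start in range(0, len(samples) - window_len + 1, step):
        window = samples[start:start + window_len]
        # all() short-circuits on the first failing stage.
        if all(score(window) >= threshold for score, threshold in cascade):
            hits.append(start)
    return hits


def mean_energy(window):
    # Hypothetical stage 1: average squared amplitude.
    return sum(x * x for x in window) / len(window)


def peak_amplitude(window):
    # Hypothetical stage 2: largest absolute sample value.
    return max(abs(x) for x in window)


# Toy signal: silence, a loud burst, then silence again.
audio = [0.0] * 8 + [0.9, -0.8, 0.9, -0.9] + [0.0] * 8
toy_cascade = [(mean_energy, 0.5), (peak_amplitude, 0.85)]
detections = scan_windows(audio, window_len=4, step=2, cascade=toy_cascade)
```

Because every window is classified on its own, no prior segmentation of the audio is needed. The price is that the classifier runs once per window position.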

click this LINK

Confidence weighted classifier combination for multi-modal human identification

Ivanov, Yuri; Serre, Thomas; Bouvrie, Jacob

Issue Date:

Series/Report no.:
Massachusetts Institute of Technology Computer Science and Artificial Intelligence Laboratory

In this paper we describe a technique of classifier combination used in a human identification system. The system integrates all available features from multi-modal sources within a Bayesian framework. The framework allows representing a class of popular classifier combination rules and methods within a single formalism. It relies on a “per-class” measure of confidence derived from performance of each classifier on training data that is shown to improve performance on a synthetic data set. The method is especially relevant in an autonomous surveillance setting where varying time scales and missing features are a common occurrence. We show an application of this technique to a real-world surveillance database of video and audio recordings of people collected over several weeks in an office setting.
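
One simple way to read "per-class confidence weighting" is sketched below. This is an assumption made for illustration and not the paper's exact Bayesian formulation. Each classifier's posterior for a class is scaled by how much that classifier is trusted on that class, and the results are summed and normalized. The function name and all the numbers are hypothetical.

```python
def combine(posteriors, confidences):
    """Combine classifier outputs with per-class confidence weights.

    posteriors[i][c]:  classifier i's probability for class c.
    confidences[i][c]: how much classifier i is trusted on class c
                       (for example, its per-class accuracy on
                       training data).
    Returns combined scores per class, normalized to sum to 1.
    """
    n_classes = len(posteriors[0])
    scores = [
        sum(conf[c] * post[c] for post, conf in zip(posteriors, confidences))
        for c in range(n_classes)
    ]
    total = sum(scores)
    return [s / total for s in scores]


# Hypothetical two-class example with a face and a voice classifier.
face = [0.6, 0.4]
voice = [0.3, 0.7]
trust = [[0.9, 0.5],   # face classifier is more reliable on class 0
         [0.4, 0.8]]   # voice classifier is more reliable on class 1
combined = combine([face, voice], trust)
best = max(range(len(combined)), key=combined.__getitem__)
```

In this sketch a missing modality can be handled by leaving its classifier out of both lists, which fits the surveillance setting where features are often unavailable.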

pdf file can be found at this page

A Unified Information Theoretic Framework for Pair- and Group-wise Registration of Medical Images

Title: A Unified Information Theoretic Framework for Pair- and Group-wise Registration of Medical Images
Authors: Zollei, Lilla
Advisors: Eric Grimson
Other Contributors: Vision
Keywords: population alignment, spatial normalization, congealing
Issue Date: 25-Jan-2006
Series/Report no.: Massachusetts Institute of Technology Computer Science and Artificial Intelligence Laboratory
Abstract: The field of medical image analysis has been rapidly growing for the past two decades. Besides a significant growth in computational power, scanner performance, and storage facilities, this acceleration is partially due to an unprecedented increase in the amount of data sets accessible for researchers. Medical experts traditionally rely on manual comparisons of images, but the abundance of information now available makes this task increasingly difficult. Such a challenge prompts for more automation in processing the images. In order to carry out any sort of comparison among multiple medical images, one frequently needs to identify the proper correspondence between them. This step allows us to follow the changes that happen to anatomy throughout a time interval, to identify differences between individuals, or to acquire complementary information from different data modalities. Registration achieves such a correspondence. In this dissertation we focus on the unified analysis and characterization of statistical registration approaches. We formulate and interpret a select group of pair-wise registration methods in the context of a unified statistical and information theoretic framework. This clarifies the implicit assumptions of each method and yields a better understanding of their relative strengths and weaknesses. This guides us to a new registration algorithm that incorporates the advantages of the previously described methods. Next we extend the unified formulation with analysis of the group-wise registration algorithms that align a population as opposed to pairs of data sets. Finally, we present our group-wise registration framework, stochastic congealing. The algorithm runs in a simultaneous fashion, with every member of the population approaching the central tendency of the collection at the same time. It eliminates the need for selecting a particular reference frame a priori, resulting in a non-biased estimate of a digital template. Our algorithm adopts an information theoretic objective function which is optimized via a gradient-based stochastic approximation process embedded in a multi-resolution setting. We demonstrate the accuracy and performance characteristics of stochastic congealing via experiments on both synthetic and real images.

News: Dino-robot is latest toy from Furby creator

By Dean Takahashi
Mercury News

When Caleb Chung has a big idea, the toy world pays attention. The co-inventor of the Furby doll created a sensation in 1998 that sold 40 million of the talking furry creatures.

Now at a Bay Area start-up, he is launching a new dinosaur robot for kids that he hopes will build upon his dream of creating lifelike, emotionally responsive mechanical animals.

Chung's new brainstorm is called Pleo and it will debut this fall. He is unveiling it today at the Demo conference in Scottsdale, Ariz., and has also taken the wraps off his Emeryville-based company, Ugobe, which is making Pleo.

``People are in love with robots, but the feedback we have is people need to have a more engaging relationship with their products,'' said Bob Christopher, chief executive of Ugobe. ``They want to treat something like a pet. So we need robots that show and feel emotion and that evolve over time.''

See the full article.

CNN News: Google Earth-VW plan navigation system

Monday, February 6, 2006; Posted: 11:45 a.m. EST (16:45 GMT)

(Reuters) -- Volkswagen of America Inc., said Friday it is working on a prototype vehicle which features Google Inc.'s satellite mapping software to give drivers a bird's eye view of the road ahead.

The two companies are working with the graphics chipmaker Nvidia to build an in-car navigation map system and a three-dimensional display so passengers can recognize where they are in relation to the surrounding topography.

See the full article.

Sunday, February 05, 2006

My Talk (2006/02/08)

1. Description of the calibration parameters
2. Paper: Autocalibration of a Projector-Camera System

Paper abstract:
This paper presents a method for calibrating a projector-camera system that consists of multiple projectors (or multiple poses of a single projector), a camera, and a planar screen. We consider the problem of estimating the homography between the screen and the image plane of the camera or the screen-camera homography, in the case where there is no prior knowledge regarding the screen surface that enables the direct computation of the homography. It is assumed that the pose of each projector is unknown while its internal geometry is known. Subsequently, it is shown that the screen-camera homography can be determined from only the images projected by the projectors and then obtained by the camera, up to a transformation with four degrees of freedom. This transformation corresponds to arbitrariness in choosing a two-dimensional coordinate system on the screen surface and when this coordinate system is chosen in some manner, the screen-camera homography as well as the unknown poses of the projectors can be uniquely determined. A noniterative algorithm is presented, which computes the homography from three or more images. Several experimental results on synthetic as well as real images are shown to demonstrate the effectiveness of the method.
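
For readers new to the term, a homography is a 3x3 matrix that maps points on one plane to points on another in homogeneous coordinates. The sketch below only shows how such a matrix is applied to a point. It does not reproduce the paper's calibration algorithm, and the matrix `H` is a made-up example.

```python
def apply_homography(H, point):
    """Map a 2-D point through a 3x3 homography H (list of rows).

    The point is lifted to homogeneous coordinates (x, y, 1),
    multiplied by H, and divided by the third component.
    """
    x, y = point
    u = H[0][0] * x + H[0][1] * y + H[0][2]
    v = H[1][0] * x + H[1][1] * y + H[1][2]
    w = H[2][0] * x + H[2][1] * y + H[2][2]
    return (u / w, v / w)


# Hypothetical screen-to-camera mapping: scale by 2, shift by (1, 3).
H = [[2.0, 0.0, 1.0],
     [0.0, 2.0, 3.0],
     [0.0, 0.0, 1.0]]
mapped = apply_homography(H, (1.0, 1.0))

# A homography is only defined up to scale: multiplying every entry
# by a nonzero constant gives the same mapping.
H_scaled = [[2.0 * value for value in row] for row in H]
mapped_scaled = apply_homography(H_scaled, (1.0, 1.0))
```

The scale ambiguity shown at the end is why a homography has eight degrees of freedom rather than nine.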


Thursday, February 02, 2006

IEEE Tech Alert for 1 Feb 2006

4. Higher-speed Martian communications
When NASA's Mars Reconnaissance Orbiter reaches the Red Planet this month, it will immediately seek out areas where water once flowed, try to identify habitats where ancient life might have thrived, and start mapping the entire planet in unprecedented detail. But the orbiter's arrival at Mars will also set the stage for a new epoch in spacecraft telecommunications. Its onboard Electra UHF relay transceiver will serve as an engineering test bed for new communications and navigation technology that will be required for all future orbiters, landers, and rovers, to provide the faster data rates required for transfer of information from rovers and landers on the Martian surface to orbiters circling above.
See "Mars Gets Broadband Connection," by Barry E. DiGregorio:

5. Solving Sudokus for fun and mathematical profit
Millions of people around the world are tackling one of the hardest problems in computer science -- without even knowing it. The logic game Sudoku is a miniature version of a longstanding mathematical challenge, and it entices both puzzlers, who see it as an enjoyable plaything, and researchers, who see it as a laboratory for algorithm design. This is because the Sudoku is representative of a fundamental mathematical challenge known as P = NP, where, roughly speaking, P stands for tasks that can be solved efficiently, and NP stands for tasks whose solution can be verified efficiently.
See "Sudoku Science," by Lauren Anderson:
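
The "verified efficiently" half of that statement is easy to see in code. The sketch below checks a completed 9x9 grid with a fixed number of set comparisons. The function name and the example grid are illustrative, and the check covers only the row, column, and box rules, not agreement with a particular puzzle's given clues.

```python
def is_valid_solution(grid):
    """Return True if grid is a correctly filled 9x9 Sudoku.

    Every row, column, and 3x3 box must contain the digits 1..9
    exactly once. The work is a fixed 27 set comparisons.
    """
    if len(grid) != 9 or any(len(row) != 9 for row in grid):
        return False
    expected = set(range(1, 10))
    rows = [set(row) for row in grid]
    cols = [set(grid[r][c] for r in range(9)) for c in range(9)]
    boxes = [
        set(grid[r][c] for r in range(br, br + 3) for c in range(bc, bc + 3))
        for br in (0, 3, 6)
        for bc in (0, 3, 6)
    ]
    return all(group == expected for group in rows + cols + boxes)


# A valid grid built from a simple shifting pattern.
solved = [[(3 * (r % 3) + r // 3 + c) % 9 + 1 for c in range(9)]
          for r in range(9)]

# Copy the grid and break it by duplicating a digit in the first row.
broken = [row[:] for row in solved]
broken[0][0] = broken[0][1]
```

Finding a solution is the hard direction. For generalized n² x n² grids no efficient method is known, while a check like the one above stays cheap.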

CMU VASC talk: Photo Quality Assessment: Classifying Between Professional Photos and Amateur Snapshots

Yan Ke, CMU
Monday, Feb 6, 2006

We propose a principled method for designing high level features for photo quality assessment. Our resulting system can classify between high quality professional photos and low quality snapshots. Instead of using the bag of low-level features approach, we first determine the perceptual factors that distinguish between professional photos and snapshots. Then, we design high level semantic features to measure the perceptual differences. We test our features on a large and diverse dataset and our system is able to achieve a classification rate of 72% on this difficult task. Since our system is able to achieve a precision of over 90% in low recall scenarios, we show excellent results in a web image search application.

Yan Ke is a fourth year graduate student in the CMU Computer Science Department. His interests are in computer vision. He spent four months in Beijing, China. When he was not busy touring China and eating good food, he worked on the photo quality assessment project at Microsoft Research Asia.

CMU FRC talk: Geographic Routing in Autonomous Sensor Systems without Location Information

Speaker: Bin Yu, Postdoctoral Fellow, Robotics Institute, Carnegie Mellon University
Date: Thursday, February 2, 2006

Autonomous sensor systems of the near future are envisioned to consist of hundreds of robots and UAVs (unmanned aerial vehicles). These networked autonomous sensors play strong roles in civilian and military operations, such as disaster rescue and battlefield surveillance. One of the important problems in autonomous sensor systems is data fusion, as the raw data from each sensor cannot be used directly for team coordination and needs to be fused with other relevant data in the system. In this talk I will discuss several routing algorithms for distributed data fusion in an autonomous sensor system with group mobility, including a geographic routing algorithm without the use of location information. Moreover, I will provide a detailed analysis of the effectiveness of the routing algorithms for data fusion. The simulation results show that controlled data flows significantly increase the probability of relevant data being fused.

Speaker Bio:
Dr. Bin Yu is a Postdoctoral Fellow in the School of Computer Science at CMU. He received his Ph.D. in Computer Science from North Carolina State University in 2002. His research interests lie in the areas of artificial intelligence and distributed sensor systems, with an emphasis on multiagent and multirobot systems. Dr. Yu has authored more than 20 technical papers in artificial intelligence, peer-to-peer systems, and distributed sensor systems. One of his papers appeared at the Fourth International Conference on Agents and Multiagent Systems (AAMAS-05) and was nominated for the best paper award.

Latest News from IVsource (February 1, 2006)

Continental’s media department has painted an ambitious picture of tomorrow’s cars based on their active distance sensor technology.

The Institution of Electrical Engineers (UK) has scheduled their second annual conference on Automotive Electronics for March 20-21 in London.

Nissan Motor Co. is developing a third-generation Advanced Safety Vehicle (ASV) installed with a Nissan-developed vehicle-to-vehicle communications system which alerts the driver to potential collisions in five common driving scenarios.

CyberCars 2 is all about development and demonstration of co-operative systems for automated vehicles (cyber-cars) to improve transport capacity and safety.

Wednesday, February 01, 2006

MIT Thesis Defense: Hyperglue, An infrastructure for Human-Centered Computing in Distributed Intelligent Environments

Speaker: Stephen Peters, MIT CSAIL
Date: Wednesday, February 1 2006
Contact: Stephen Peters, 617-253-8338,

As intelligent environments (IEs) move from simple kiosks and meeting rooms into the everyday offices, kitchens, and living spaces we use, the need for these spaces to communicate not only with users, but also with each other, will become increasingly important. Users will want to be able to shift their work environment between localities easily, and will also need to communicate with others as they move about. These IEs will thus require knowledge representations which can keep track of people and their relationships to the world; and communication mechanisms that can mediate interactions.

This thesis seeks to define and explore one way of creating this infrastructure, by creating societies of agents that can act on behalf of real-world entities such as users, physical spaces, or informal groups of people. Just as users interact with each other and with objects in their physical location, the agent societies interact with each other along communication channels organized along these same relationships. By organizing the infrastructure through analogies to the real world, we hope to achieve a simpler conceptual model for the users, as well as a communication hierarchy which can be realized efficiently.

CMU Talk: Human System Integration in the DoD: Challenges and Opportunities

Greg Zacharias (the talk link)

The ability of humans to cope with information processing demands has become a limiting factor on system performance, especially as systems have become more complex, layered with automation, and fielded in more demanding dynamic environments, all while the roles and responsibilities of the human operator have evolved in the face of greater computational and communications capabilities. The ability to successfully deal with these challenges has important implications not only for individual operator performance, but also for team performance, safety, organizational staffing requirements, and overall human-system effectiveness of large scale systems. This is especially true in the Department of Defense (DoD). To illustrate, we summarize a recent study conducted for the Air Force to assess the state of the art in applying Human Systems Integration (HSI) practices to modern weapons systems design and acquisition, and to recommend improvements in the overall process. We then provide a brief overview of Charles River Analytics, which has been providing HSI tools and services to the DoD since its inception in the mid-80s, and describe some of our current design projects that attempt to address some of the critical information processing demands facing today’s soldier.

CMU talk: Scalable Approaches to Deploying Swarms of Vehicles and Sensors

Vijay Kumar
Department of Mechanical Engineering and Applied Mechanics
University of Pennsylvania

The talk will address the fundamental problems and practical issues underlying the deployment of large numbers of autonomously functioning vehicles, with insights from field experiments with UAVs and UGVs in urban environments. I will present decentralized controllers and estimators that allow large numbers of robots to maintain a desired shape (formation) while following a desired trajectory. Finally, I will describe our ongoing SWARMS project whose goals are to develop a framework and methodology for the analysis of swarming behavior in biology and the synthesis of bio-inspired swarming behavior for engineered systems.

The link.