Friday, September 29, 2006

Projector-Guided Painting

Project Description:
This paper presents a novel interactive system for guiding artists to paint using traditional media and tools. The enabling technology is a multi-projector display capable of controlling the appearance of an artist.s canvas. Artists are guided by this display-on-canvas to paint according to a process model we designed to solve 3 common problems with novice painters. The artist paints according to a linear process of painting in layers and, within each layer, a set of colors. Each component of our model of the painting process has an associated interaction mode. Preview mode shows the entire layer as the current painting goal. Blank mode reveals the state of the painting. Color selection mode displays where to paint a target color. Color mixing mode shows how to mix it and orientation mode shows how to paint it. These interaction modes enable the novice to focus on painting sub-tasks in order to simplify the painting process while providing technical guidance ranging from high-level composition to detailed brushwork. We present results of a user study that quantify the benefit that our system can provide to a novice painter.
This work will be published and presented at the ACM Symposium on User Interface Software and Technology (UIST) in Montreux, Switzerland, in October 2006.


Wednesday, September 27, 2006

[Robotics Institute Thesis Oral 2 Oct 2006] Holistic Modeling and Tracking of Road Scenes

John Sprouse
Robotics Institute
Carnegie Mellon University

Place and Time
NSH 3305
11:00 AM

This thesis proposal addresses the problem of road scene understanding for driver warning systems in intelligent vehicles, which require a model of cars, pedestrians, the lane structure of the road, and any static obstacles on it in order to accurately predict possible dangerous situations. Previous work on using computer vision in intelligent vehicle applications stops short of holistic modeling of the entire road scene. In particular, no existing lane tracking system detects and tracks multiple lanes or integrates lane tracking with the tracking of cars, pedestrians, and other relevant objects. In this thesis, we focus on the goal of holistic road scene understanding, and we propose contributions in three areas: (1) the low-level detection of road scene elements such as tarmac and painted stripes; (2) modeling and tracking of complex lane structures; and (3) the integration of lane structure tracking with car and pedestrian tracking.

Further Details
A copy of the thesis oral document can be found at

Thesis Committee
Takeo Kanade, Chair
Charles Thorpe
Alexei Efros
Simon Baker, Microsoft Research, Seattle

[Robotics Institute Seminar] Is the human hand dexterous because of, or in spite of, its anatomical complexity?

Faculty Candidate Talk
Francisco Valero-Cuevas
Cornell University

Time and Place
Mauldin Auditorium (NSH 1305)
Refreshments 3:15 pm
Talk 3:30 pm

The human hand is a pinnacle of mechanical versatility unequaled by electromechanical systems. It is clearly a product of brain-body coevolution. However, its anatomical structure shares numerous features with other species. In my work, I explore how the human hand meets the necessary and sufficient mechanical requirements for manipulation. This allows us to begin to distinguish and contrast the complementary contributions of anatomy and the nervous system in order to improve hand rehabilitation and suggest avenues to build better machines.

Speaker Biography
I attended Swarthmore College from 1984-88 where I obtained a BS degree in Engineering. After spending a year in the Indian subcontinent as a Thomas J Watson Fellow, I joined Queen's University in Ontario and worked with Dr. Carolyn Small. The research for my Masters Degree in Mechanical Engineering at Queen's focused on developing non-invasive methods to estimate the kinematic integrity of the wrist joint. In 1991 I joined the doctoral program in the Design Division of the Mechanical Engineering Department at Stanford University. I worked with Dr. Felix Zajac developing a realistic biomechanical model of the human digits. This research, done at the Rehabilitation R & D Center in Palo Alto, focused on predicting optimal coordination patterns of finger musculature during static force production. After completing my doctoral degree in 1997, I joined the core faculty of the Biomechanical Engineering Division at Stanford University as a Research Associate and Lecturer. My research then focused on developing experimental methods to optimize the surgical restoration of hand function following spinal cord injury and peripheral nerve injuries. In 1999 I joined the faculty of the Sibley School of Mechanical and Aerospace Engineering as an Assistant Professor. I also have close ties with the Hospital for Special Surgery in New York City.

Speaker Appointments
For appointments, please contact Jean Harpley (8-3802)

[Thesis Proposal] Policies based on Trajectory Libraries

Martin Stolle
Robotics Institute
Carnegie Mellon University

Place and Time
NSH 3305
10:00 AM

I present a control approach that uses a library of trajectories to establish a global control law or policy. This is an alternative to methods for finding global policies based on value functions using dynamic programming, and also to using plans based on a single desired trajectory. Our method has the advantage of providing reasonable policies much faster than dynamic programming can provide an initial policy. It also has the advantage of providing more robust and global policies than following a single desired trajectory. Trajectory libraries can be created for robots with many more degrees of freedom than dynamic programming can handle, as well as for robots with discontinuities in their dynamic models. Results are shown for the “Labyrinth” marble maze and the Little Dog quadruped robot. The marble maze is a difficult task which requires both fast control and planning ahead. In the Little Dog task, a quadruped robot has to navigate quickly across small-scale rough terrain. In past work, I used global state to represent the knowledge in the trajectory libraries. In order to broaden the use of a library, I propose the use of local state representations, which allow the knowledge represented by a library to be used in novel situations. Three different mechanisms for this transfer are proposed: Information about the goal of a task can be explicitly represented in the local state; libraries using this representation can be transferred directly to new tasks. Alternatively, the local state representation might not include a goal feature; when using such a library, a search over the actions in the library has to be used to pick actions that achieve the goal. Finally, one can cluster the actions in the library to create abstract actions, which simplifies the search process.
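The core idea (a nearest-neighbor lookup into a library of stored state-action pairs) can be sketched as follows. The 2-D states, action labels, and Euclidean metric here are illustrative assumptions, not the thesis's actual representation:

```python
import math

def build_library(trajectories):
    """Flatten a set of trajectories into one library.

    Each trajectory is a list of (state, action) pairs; states here are
    hypothetical 2-D tuples of floats."""
    return [pair for traj in trajectories for pair in traj]

def nearest_action(library, query):
    """Greedy policy: return the action stored with the nearest state."""
    state, action = min(library, key=lambda pair: math.dist(pair[0], query))
    return action

# Two toy trajectories moving a point toward the origin.
lib = build_library([
    [((2.0, 0.0), "left"), ((1.0, 0.0), "left")],
    [((0.0, 2.0), "down"), ((0.0, 1.0), "down")],
])
print(nearest_action(lib, (1.7, 0.2)))  # nearest stored state is (2.0, 0.0)
```

Because the policy is defined wherever the library has a nearby state, adding trajectories extends coverage without recomputing a value function.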

Further Details
A copy of the thesis proposal document can be found at

Thesis Committee
Christopher Atkeson, Chair
James Kuffner
Drew Bagnell
Riger Dillmann, University of Karlsruhe

Tuesday, September 26, 2006

Lab Meeting 29 Sep. 2006 (Chihao): 3D Sound Source Localization System Based on Learning of Binaural Hearing

Title: 3D Sound Source Localization System Based on Learning of Binaural Hearing
Author: Hiromichi Nakashima, Toshiharu Mukai
This paper appears in: IEEE SMC 2005 (IEEE International Conference on Systems, Man, and Cybernetics)
We have thus far developed two types of sound source localization systems, one of which can localize the horizontal direction and the other the vertical direction. These systems acquire their localization ability by self-organization through repeated movement and perception. In this paper, we report a newly built sound source localization system that can detect the direction of a sound source arbitrarily located in front of it. The system is composed of a robot that has two microphones with reflectors corresponding to the human pinnae. To acquire the horizontal direction, the interaural time difference is used as the auditory cue. To acquire the vertical direction, the features on the audio spectrum induced by the reflectors are used as the auditory cue. The robot establishes the relationship between the cues and the sound direction through learning.
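As a rough illustration of the interaural-time-difference cue, the sketch below estimates the delay between two channels by cross-correlation and inverts a far-field geometric model. Note that the paper's system learns this mapping rather than computing it analytically; the microphone spacing and sign convention here are assumptions:

```python
import numpy as np

def estimate_itd(left, right, fs):
    """Estimate the interaural time difference (seconds) as the delay
    of the right channel relative to the left, via cross-correlation."""
    corr = np.correlate(left, right, mode="full")
    lag = np.argmax(corr) - (len(right) - 1)  # lag of left relative to right
    return -lag / fs

def azimuth_from_itd(itd, mic_distance=0.18, c=343.0):
    """Invert the far-field model itd = d*sin(theta)/c; returns degrees.
    The 18 cm microphone spacing is an assumed value."""
    s = np.clip(itd * c / mic_distance, -1.0, 1.0)
    return float(np.degrees(np.arcsin(s)))

# Synthetic check: a 1 kHz tone reaching the right microphone 5 samples late.
fs = 44100
t = np.arange(0, 0.01, 1 / fs)
sig = np.sin(2 * np.pi * 1000 * t)
delay = 5
left = np.pad(sig, (0, delay))
right = np.pad(sig, (delay, 0))
itd = estimate_itd(left, right, fs)
print(azimuth_from_itd(itd))
```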


Lab Meeting 29 Sep. 2006 (Vincent): Active Appearance Models

Title: Active Appearance Models

Author: Timothy F. Cootes, Gareth J. Edwards, and Christopher J. Taylor

Origin :

Abstract:
We describe a new method of matching statistical models of appearance to images. A set of model parameters control modes of shape and gray-level variation learned from a training set. We construct an efficient iterative matching algorithm by learning the relationship between perturbations in the model parameters and the induced image errors.

You can find the full article here.

CMU ML talk: Learning-based Deformable Neuroimage Registration

Leonid Teverovskiy, MLD, CMU.

September 25

Deformable neuroimage registration is an active and challenging research area. It forms a crucial component of many computational and clinical neuroscience applications, including computer-aided diagnosis, statistical quantification of the human brain, and atlas-based neuroimage segmentation.

Maximizing the number of correctly estimated voxel correspondences enhances the accuracy of a deformable registration algorithm. Most existing feature-based deformable registration algorithms use a pre-defined set of image features to estimate correspondences for all voxels. These methods have two main weaknesses. First, the feature vector is constructed by the authors of the algorithms rather than automatically selected to minimize registration error. Second, the same feature vector is used for all the voxels in the whole brain image, without consideration given to the inhomogeneity of the anatomical structures and their corresponding voxels.

We propose a new learning-based deformable registration algorithm that performs feature selection for every voxel. Our algorithm can be trained to accurately register specific anatomical structures as well as the entire neuroimages of specific patient groups. The main novelty of our approach is that it automatically learns feature vectors for distinguished individual image voxels, thus increasing correspondence estimation accuracy. Our method utilizes a decision-theoretic approach to systematically calculate the expected correspondence estimation error for a voxel in many different feature spaces, and then selects the space with the smallest error. Our feasibility study on 2D midsagittal slices shows that learning the feature subspace increases the number of correctly estimated correspondences by 20%.
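The benefit of choosing the feature space by its correspondence error can be illustrated on a toy 1-D matching problem. The two candidate feature maps below (raw intensity versus a 5-voxel patch) are hypothetical stand-ins for the learned feature spaces:

```python
import numpy as np

rng = np.random.default_rng(1)
n, shift = 64, 3
# Toy 1-D "fixed image" (smoothed noise) and a shifted, noisy "moving image".
fixed = np.convolve(rng.normal(size=n), np.ones(5) / 5, mode="same")
moving = np.roll(fixed, shift) + rng.normal(scale=0.05, size=n)

def feat_point(img, i):   # raw intensity only: often ambiguous
    return img[i : i + 1]

def feat_patch(img, i):   # 5-voxel patch: more distinctive
    return img[i - 2 : i + 3]

def correspondence_error(feat, i):
    """Match voxel i of `fixed` against every voxel of `moving` and
    return how far the best match lands from the truth, i + shift."""
    f = feat(fixed, i)
    j = min(range(2, n - 2),
            key=lambda j: float(np.sum((feat(moving, j) - f) ** 2)))
    return abs(j - (i + shift))

# Per-voxel selection would keep, for each voxel, the map with the
# smaller expected error; here we simply compare their mean errors.
errors = {f.__name__: np.mean([correspondence_error(f, i) for i in range(10, 40)])
          for f in (feat_point, feat_patch)}
print(errors)
```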

We will quantitatively evaluate the performance of our deformable registration algorithm and apply it to several medical image analysis problems.

CMU Intelligence Seminar: Semantic Models of Shape

Jovan Popović, CSAIL, MIT
Tuesday 9/26

Abstract: Conventional representations of shape (splines, meshes, etc.) provide general modeling controls without differentiating between real and meaningless outcomes. This burdens human operators and computational techniques with the task of searching through a vast and cluttered design space. Semantic representations clear up the clutter by attaching human understanding to computational representations of shape and motion.

Bio: Jovan Popović is an Associate Professor in the Department of Electrical Engineering and Computer Science, and a member of the Computer Graphics Group in the Computer Science and Artificial Intelligence Laboratory, at the Massachusetts Institute of Technology. Before arriving at MIT in 2001, Jovan Popović received his Ph.D. in Computer Science from Carnegie Mellon University and his B.S. degrees in Mathematics and Computer Science from Oregon State University. His research employs computer science, mathematics and physics to explore the applications of geometric modeling and computer animation to the fields of computer graphics, human-computer interaction, biomechanics, robotics, and computational design.

Monday, September 25, 2006

Call for papers: JFR Special Issue: Safety, Security and Rescue Robots

The Journal of Field Robotics (JFR) announces a special issue on robotic aspects of safety, security, and rescue to examine issues related to the fielding of robots to respond to or prevent emergencies of either natural or man-made origins. Such emergencies provide many challenges for mechanisms, navigation, sensing, networking, collaboration, human/machine interaction, and decision making.

We invite papers that exhibit state-of-the-art theory and methods applied to fielded studies including:

* novel locomotion mechanisms for rough terrain
* robotic sensors and sensing techniques for unstructured or semi-structured terrain
* novel human/robot interaction devices and paradigms for emergency response
* collaborative systems of land/sea/air vehicles for search/rescue/assessment
* lessons learned from robotic land/sea/air deployments in mines, collapsed structures, and wide area disasters

The complete call for papers for this special issue can be found at:

Please note that the deadline for submissions is November 1, 2006.

We look forward to your submission.

-Richard Voyles and Howie Choset

Sunday, September 24, 2006

News: Robot manufacturing to be Taiwan's next booming industry: MOEA

The China Post

The research and development of artificial intelligence robots could help the total output value of Taiwan's machinery industry to top NT$1 trillion (US$30.4 billion) by 2009, with an annual growth rate of no less than 30 percent before 2012, the Industrial Development Bureau (IDB) under the Ministry of Economic Affairs said yesterday.

The IDB also predicted that by 2016, A.I. robot manufacture alone could generate an output value of NT$250 billion and an export value of NT$175 billion, as well as contribute at least 1.35 percent to Taiwan's GDP, adding that 22,000 jobs could be created.

Moreover, the IDB estimated that the global production value of the robot industry could exceed that of the automobile sector worldwide by 2020, reaching some US$1.4 trillion.

...... See the full article.

Saturday, September 23, 2006

Lab Meeting 29 Sep. 2006 (Ashin): Using GPS to learn significant locations and predict movement across multiple users

Authors: Daniel Ashbrook and Thad Starner

From: Personal and Ubiquitous Computing, Volume 7, Number 5, October 2003

Wearable computers have the potential to act as intelligent agents in everyday life and assist the user in a variety of tasks, using context to determine how to act. Location is the most common form of context used by these agents to determine the user's task. However, another potential use of location context is the creation of a predictive model of the user's future movements. We present a system that automatically clusters GPS data taken over an extended period of time into meaningful locations at multiple scales. These locations are then incorporated into a Markov model that can be consulted for use with a variety of applications in both single user and collaborative scenarios.
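A simple sketch of the pipeline: cluster GPS fixes into significant locations, then build a first-order Markov model over the visit sequence. The greedy radius clustering below is a stand-in for the paper's k-means variant, and the coordinates are toy data:

```python
from collections import defaultdict
import math

def cluster_points(points, radius):
    """Greedy radius-based clustering: each fix joins the first cluster
    centre within `radius`, otherwise it starts a new cluster."""
    centres, labels = [], []
    for p in points:
        for k, c in enumerate(centres):
            if math.dist(p, c) <= radius:
                labels.append(k)
                break
        else:
            centres.append(p)
            labels.append(len(centres) - 1)
    return centres, labels

def transition_model(visits):
    """First-order Markov model: counts of location -> next location."""
    counts = defaultdict(lambda: defaultdict(int))
    for a, b in zip(visits, visits[1:]):
        if a != b:
            counts[a][b] += 1
    return counts

def predict_next(counts, here):
    return max(counts[here], key=counts[here].get)

# Toy trace around three places: home (0,0), work (10,10), gym (0,10).
trace = [(0, 0), (0.1, 0.2), (10, 10), (10.1, 9.9), (0, 0), (10, 10),
         (0.2, 0.1), (0, 10)]
centres, labels = cluster_points(trace, radius=1.0)
model = transition_model(labels)
print(predict_next(model, 0))  # cluster 0 is home; the prediction is work
```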


Thursday, September 21, 2006

IEEE news: How to manage an impending deluge of new data?

The Wal-Mart project, which aims to have every item delivered and sold tagged with a radio-frequency identification (RFID) strip, is only the tip of an information iceberg. As the data streaming from computers, sensors, and other real-time devices swells, a new infrastructure will be required to manage it. Software engineers are developing stream-processing engines to handle the flood.

See "Data Torrents and Rivers," by Michael Stonebraker: the link

IEEE news: Sounding out IEEE's fellows

How do some of the IEEE's most distinguished and accomplished members see the future of technology? Spectrum polled 700 of the organization's fellows to find out. A third think we'll eventually have 3D televisions in our homes, but more than two fifths doubt anybody will ever market a quantum computer. Majorities believe that microscale robots will be a reality and that computers will soon be able to process speech and writing with almost perfect accuracy. But there's deep skepticism about "cold fusion" and room-temperature superconductors.

See "Bursting Tech Bubbles Before They Balloon," by Marina Gorbis and David Pescovitz: the link.

Wednesday, September 20, 2006

Lab meeting 22 Sep. 2006 (ZhenYu): Communication Robots for Elementary Schools

Authors: Takayuki Kanda, Hiroshi Ishiguro

From: Proc. AISB'05 Symposium Robot Companions: Hard Problems and Open Challenges in Robot-Human Interaction, pp. 54-63, April 2005.

Abstract: This paper reports our approaches and efforts toward developing communication robots for elementary schools. In particular, we describe the fundamental mechanisms of the interactive humanoid robot Robovie for interacting with multiple persons, maintaining relationships, and estimating social relationships among children. Robovie was applied in two field experiments at elementary schools: the first used it as a peer tutor for foreign-language education, and the second aimed at establishing longitudinal relationships with children. We believe these results demonstrate a positive perspective on the future possibility of realizing a communication robot that works in elementary schools.


Tuesday, September 19, 2006

Lab meeting 22 Sep. 2006 (Eric): Dual Photography

Authors: Pradeep Sen, Billy Chen, Gaurav Garg, Stephen R. Marschner, Mark Horowitz, Marc Levoy, Hendrik P. A. Lensch

From: ACM SIGGRAPH 2005 conference proceedings

Abstract: We present a novel photographic technique called dual photography, which exploits Helmholtz reciprocity to interchange the lights and cameras in a scene. With a video projector providing structured illumination, reciprocity permits us to generate pictures from the viewpoint of the projector, even though no camera was present at that location. The technique is completely image-based, requiring no knowledge of scene geometry or surface properties, and by its nature automatically includes all transport paths, including shadows, inter-reflections and caustics. In its simplest form, the technique can be used to take photographs without a camera; we demonstrate this by capturing a photograph using a projector and a photo-resistor. If the photo-resistor is replaced by a camera, we can produce a 4D dataset that allows for relighting with 2D incident illumination. Using an array of cameras we can produce a 6D slice of the 8D reflectance field that allows for relighting with arbitrary light fields. Since an array of cameras can operate in parallel without interference, whereas an array of light sources cannot, dual photography is fundamentally a more efficient way to capture such a 6D dataset than a system based on multiple projectors and one camera. As an example, we show how dual photography can be used to capture and relight scenes.
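The heart of the technique is that, by Helmholtz reciprocity, the dual image is obtained by transposing the light-transport matrix. A tiny numerical sketch, with a random matrix standing in for a measured transport:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical light-transport matrix T for a tiny scene:
# camera image c (4 px) = T @ projector pattern p (3 px).
T = rng.random((4, 3))

# Primal photograph under full projector illumination.
p = np.ones(3)
c = T @ p

# Helmholtz reciprocity: the dual transport matrix is simply T^T, so the
# "dual camera" at the projector's position sees c_dual = T.T @ light.
c_dual = T.T @ np.ones(4)

# Relighting the dual image with an arbitrary incident pattern needs no
# new capture, only the stored transport matrix:
pattern = np.array([1.0, 0.0, 0.5, 0.0])
print(T.T @ pattern)
```

Measuring T efficiently (rather than one projector pixel at a time) is where the paper's structured-illumination machinery comes in; this sketch only shows the transpose relationship itself.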


Lab meeting 22 Sep. 2006 (Bright): A Novel System for Tracking Pedestrians Using Multiple Single-Row Laser-Range Scanners

Authors: Huijing Zhao and Ryosuke Shibasaki


Abstract: In this research, we propose a novel system for tracking pedestrians in a wide and open area, such as a shopping mall or exhibition hall, using a number of single-row laser-range scanners (LD-As), which have a profiling rate of 10 Hz and a scanning angle of 270°. The LD-As are set directly on the floor, scanning horizontally at an elevation of about 20 cm above the ground, so that horizontal cross sections of the surroundings, containing the moving feet of pedestrians as well as still objects, are obtained in a rectangular coordinate system of real dimension. The data of moving feet are extracted through background subtraction by the client computers that control each LD-A, and sent to a server computer, where they are spatially and temporally integrated into a global coordinate system. A simplified model of a pedestrian's walking pattern, based on the typical appearance of moving feet, is defined, and a tracking method utilizing a Kalman filter is developed to track pedestrians' trajectories. The system is evaluated through both a real experiment and computer simulation. The real experiment was conducted in an exhibition hall, where three LD-As were used to cover an area of about 60 x 60 m². Changes in visitors' flow during the whole exhibition day are analyzed; in the peak hour, about 100 trajectories are extracted simultaneously. A computer simulation is conducted to quantitatively examine system performance with respect to different crowd
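The tracking step described in the abstract can be sketched as a constant-velocity Kalman filter on a foot position. The noise covariances and walking model below are simplifying assumptions rather than the paper's calibrated values:

```python
import numpy as np

dt = 0.1  # 10 Hz profiling rate, as in the paper
# Constant-velocity state [x, y, vx, vy]: a simplified stand-in for the
# paper's walking model.
F = np.array([[1, 0, dt, 0],
              [0, 1, 0, dt],
              [0, 0, 1, 0],
              [0, 0, 0, 1]], dtype=float)
H = np.array([[1, 0, 0, 0],
              [0, 1, 0, 0]], dtype=float)
Q = 0.01 * np.eye(4)   # assumed process noise
R = 0.05 * np.eye(2)   # assumed measurement noise

def kalman_step(x, P, z):
    # Predict.
    x = F @ x
    P = F @ P @ F.T + Q
    # Update with the measured foot position z.
    S = H @ P @ H.T + R
    K = P @ H.T @ np.linalg.inv(S)
    x = x + K @ (z - H @ x)
    P = (np.eye(4) - K @ H) @ P
    return x, P

# Pedestrian walking at 1 m/s along x; detections from the range scanner.
x, P = np.zeros(4), np.eye(4)
for k in range(1, 50):
    z = np.array([k * dt * 1.0, 0.0])
    x, P = kalman_step(x, P, z)
print(x[:2], x[2:])  # position and velocity estimates
```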


CMU thesis proposal: Scale Selection and Invariance in Low Level Vision

Ranjith Unnikrishnan, Robotics Institute
18 Sep 2006

The representation of objects through locally computed features is a concept common to many approaches in 2-D and 3-D computer vision. The use of local information to infer global properties aims to serve several purposes such as robustness to outlying structures, variation in viewing conditions, noise and occlusion. Reliable computation of relevant local attributes is thus an important part of any practical vision system intended to perform higher level reasoning.

The task of making such local observations necessitates making choices of the neighborhood size within which the computation is performed, also referred to as the "scale" of the observation. This in turn poses several unanswered questions relevant to both data representation (e.g. reconstruction and compression) as well as data identification (e.g. object detection and classification). At what scale is it meaningful to compute a local feature? What is the optimal neighborhood size for estimating local geometric properties from data? While many advances have been made in a theory of scale for 2-D luminance images, little attention has been paid to the domains of unorganized point clouds (as would be acquired with a laser range scanner) or to alternate representations of images (such as color or other pixel-wise functions such as optical flow).

This thesis explores the problems of scale selection and invariance in previously unaddressed problem domains, and proposes solutions for several useful vision tasks:

* We propose to extend current application of scale theory for interest region extraction in 2-D images to alternate, potentially more useful representations. As an example, we demonstrate how both scale as well as illuminant invariant keypoint detection may be achieved in the case of color (RGB) images without having to estimate the properties of the illuminant.

* We present methods to robustly compute local differential properties from non-uniform unstructured point clouds. In particular, we show how data-driven adaptation of the neighborhood size in local PCA when computing tangents (normals) from spatial curves (surfaces) can make even this naive estimator more robust than leading fixed-scale alternatives.

* We propose the development of new scale-space representations of 3-D point cloud data that are robust to changes in sampling. By this, we advocate changing the current practice of using a single globally fixed value of scale when computing shape descriptors from 3-D data to that of using a value that is locally data-driven.

* We propose to investigate the application of local intrinsic scale detection for manifold learning. The aims of this analysis are improved statistical properties and robustness of embedding functions and regularizers to sampling variations in the dataset.
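The fixed-scale baseline behind the second item (local PCA for estimating tangents from point samples) can be sketched as follows; the thesis's contribution is choosing the neighbourhood size adaptively from the data, which this sketch deliberately does not do:

```python
import numpy as np

def tangent_from_pca(points):
    """Estimate the tangent direction of a curve at a point from its
    neighbourhood via local PCA: the principal eigenvector of the
    neighbourhood covariance matrix."""
    centred = points - points.mean(axis=0)
    cov = centred.T @ centred / len(points)
    w, v = np.linalg.eigh(cov)
    return v[:, -1]  # eigenvector of the largest eigenvalue

# Noisy samples along the x-axis; the neighbourhood size is the "scale".
rng = np.random.default_rng(0)
pts = np.stack([np.linspace(0, 1, 30),
                rng.normal(scale=0.01, size=30)], axis=1)
t = tangent_from_pca(pts)
print(t)  # close to (1, 0) up to sign
```

With a fixed neighbourhood, too small a scale lets noise dominate the covariance and too large a scale blurs across curvature; that trade-off is exactly what data-driven scale selection addresses.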

Overall, the expected contributions of this thesis are new techniques and tools for the scale selection problem, which is fundamental to local data analysis and learning from real-world measurements.

Further Details: A copy of the thesis proposal document can be found at the link.

CMU vasc talk: Visual Recognition and Tracking for Perceptive Interfaces

Trevor Darrell

Devices should be perceptive, and respond directly to their human user and/or environment. In this talk I'll present new computer vision algorithms for fast recognition, indexing, and tracking that make this possible, enabling multimodal interfaces which respond to users' conversational gesture and body language, robots which recognize common object categories, and mobile devices which can search using visual cues of specific objects of interest. As time permits, I'll describe recent advances in real-time human pose tracking for multimodal interfaces, including new methods which exploit fast computation of approximate likelihood with a pose-sensitive image embedding. I'll also present our linear-time approximate correspondence kernel, the Pyramid Match, and its use for image indexing and object recognition, and discovery of object categories. Throughout the talk, I'll show interface examples including grounded multimodal conversation as well as mobile image-based information retrieval applications based on these techniques.

BIO: Trevor Darrell is an Associate Professor of Electrical Engineering and Computer Science at M.I.T. He leads the Vision Interface Group at the Computer Science and Artificial Intelligence Laboratory. His interests include computer vision, interactive graphics, and machine learning. Prior to joining the faculty of MIT he worked as a Member of the Research Staff at Interval Research in Palo Alto, CA, researching vision-based interface algorithms for consumer applications. He received his PhD and SM from MIT in 1996 and 1991, respectively, while working at the Media Laboratory, and the BSE from the University of Pennsylvania in 1988, where he worked in the GRASP Robotics Laboratory.

CMU ML talks: UAI 2006 conference review

1. On the Number of Samples Needed to Learn the Correct Structure of a Bayesian Network

by Or Zuk, Shiri Margel and Eytan Domany

Bayesian Networks (BNs) are useful tools giving a natural and compact representation of joint probability distributions. In many applications one needs to learn a Bayesian Network (BN) from data. In this context, it is important to understand the number of samples needed in order to guarantee a successful learning. Previous works have studied BNs sample complexity, yet they mainly focused on the requirement that the learned distribution will be close to the original distribution which generated the data. In this work, we study a different aspect of the learning task, namely the number of samples needed in order to learn the correct structure of the network. We give both asymptotic results (lower and upper-bounds) on the probability of learning a wrong structure, valid in the large sample limit, and experimental results, demonstrating the learning behavior for feasible sample sizes.

2. Non-Minimal Triangulations for Mixed Stochastic/Deterministic Graphical Models
by Chris D. Bartels and Jeff A. Bilmes

We observe that certain large-clique graph triangulations can be useful for reducing computational requirements when making queries on mixed stochastic/deterministic graphical models. We demonstrate that many of these large clique triangulations are non-minimal and are thus unattainable via the elimination algorithm. We introduce ancestral pairs as the basis for novel triangulation heuristics and prove that no more than the addition of edges between ancestral pairs need be considered when searching for state space optimal triangulations in such graphs. Empirical results on random and real world graphs are given. We also present an algorithm and correctness proof for determining if a triangulation can be obtained via elimination, and we show that the decision problem associated with finding optimal state space triangulations in this mixed setting is NP-complete.

Monday, September 18, 2006

Lab meeting this Fall

Bright, Eric and Zhen-Yu are the speakers of this week. Please post your talks asap.

When: 10:30 AM - 12:30 PM
Where: CSIE R424/426



Sunday, September 17, 2006

CMU AI talk: An Axiomatic Approach to Ranking Systems

Alon Altman
October 3, 2006

ABSTRACT: This talk will survey our recent work in applying the axiomatic approach to ranking systems. Ranking systems are systems in which agents rank each other to produce a social ranking. In the axiomatic approach we study ranking systems under the light of basic properties, or axioms. In this talk I will present our axiomatization theorem for the PageRank ranking system, prove an impossibility and possibility result for general ranking systems, and discuss the issue of incentives in ranking systems. Finally, I will show initial results regarding personalized ranking systems, where a specialized ranking is generated for each agent.
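For reference, the PageRank system that the axiomatization targets can be computed by a standard power iteration. The damping factor and toy link graph below are illustrative, not part of the talk:

```python
import numpy as np

def pagerank(adj, d=0.85, iters=100):
    """Power-iteration PageRank on an adjacency matrix
    (adj[i][j] = 1 iff page i links to page j)."""
    adj = np.asarray(adj, dtype=float)
    n = len(adj)
    out = adj.sum(axis=1, keepdims=True)
    out[out == 0] = 1  # dangling pages: avoid division by zero
    M = adj / out      # row-stochastic link matrix
    r = np.full(n, 1 / n)
    for _ in range(iters):
        r = (1 - d) / n + d * (r @ M)
    return r

# Three pages: 0 and 1 both link to 2; 2 links back to 0.
r = pagerank([[0, 0, 1],
              [0, 0, 1],
              [1, 0, 0]])
print(r, r.argmax())  # page 2, with two in-links, ranks highest
```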

CMU AI talk: Cost-sensitive Classifier Evaluation Using Cost Curves

Speaker: Robert C. Holte
When: Monday, September 25, 2006 at 3:30p
Where: Newell Simon Hall 1305

The evaluation of classifier performance in a cost-sensitive setting is straightforward if the operating conditions (misclassification costs and class distributions) are fixed and known. When this is not the case, evaluation requires a method of visualizing classifier performance across the full range of possible operating conditions. This talk argues that the classic technique for classifier performance visualization -- the ROC curve -- is inadequate for the needs of researchers and practitioners in several important respects. It then describes a different way of visualizing classifier performance -- the cost curve -- that overcomes these deficiencies. No familiarity with ROC curves or cost curves is necessary, they will be fully explained. Joint work with Chris Drummond (National Research Council, Ottawa)
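The construction is simple to state: in cost space each classifier is a straight line, normalized expected cost = FNR * PC + FPR * (1 - PC), where PC is the probability-cost of the positive class, and the lower envelope over classifiers shows which one to deploy at each operating condition. A sketch with two hypothetical classifiers:

```python
import numpy as np

def cost_line(fpr, fnr, pc):
    """A classifier's cost curve: normalized expected cost as a
    function of the probability-cost pc of the positive class."""
    return fnr * pc + fpr * (1 - pc)

pc = np.linspace(0, 1, 101)
# Two hypothetical classifiers, given as (FPR, FNR) pairs.
a = cost_line(0.10, 0.40, pc)
b = cost_line(0.30, 0.15, pc)

# The lower envelope shows which classifier is cheaper at each
# operating condition, a comparison ROC curves make awkward.
envelope = np.minimum(a, b)
crossover = pc[np.argmax(b < a)]  # first operating point where b wins
print(crossover)
```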

CMU FRC talk: Celestial Navigation for Localization of Planetary Rovers

Deborah Sigel, PhD Candidate
Robotics Institute, Carnegie Mellon University

With new interest in placing rovers on the Moon as a precursor to human re-landing, there is a need to develop modern technology to support landing and operating a semi-autonomous vehicle on the surface with minimal support infrastructure. The challenge of localization and navigation on an atmosphere-less body without GPS infrastructure or relay satellites presents a unique opportunity to explore the benefits of celestial navigation.

Here we propose a new method to localize a vehicle on a planetary body using a standard spacecraft star tracker. This talk will first provide a look into the history of celestial navigation and spacecraft attitude control systems to introduce modern tools available for rover localization. Two different rover celestial localization schemes, StarGrav and a new wide field-of-view star tracker method, will be described and compared. A conceptual hardware design for a flight Lunar celestial localization system based on the wide FOV star tracker will then be presented.

Speaker Bio: Deborah Sigel is a PhD candidate in the FRC, working with David Wettergreen. Her research interests include development of robotic technology and methods to improve space and planetary exploration. She obtained an MS in Aerospace Engineering at University of Maryland, and BS in Aerospace and Mechanical Engineering at Rensselaer Polytechnic Institute. Her technical experience has included designing and building spaceflight astronaut hand tools for NASA's Return to Flight program, while at Swales Aerospace, to assist astronauts in space shuttle on-orbit repairs, used on flights STS-114 and STS-121. She has also worked at NASA JPL to design mechanical hardware for the Mars Exploration Rovers (Spirit and Opportunity) and Mars Science Laboratory rover.

CMU proposal: Volumetric Descriptors for Efficient Video Analysis

Yan Ke

When: Wednesday, September 13, 09:30 a.m.
Where: 3305 Newell-Simon Hall

Abstract: The amount of digital video has grown exponentially in recent years. However, the technology for making intelligent searches on video has failed to keep pace. How to efficiently represent video, optimized for retrieval, remains an open question. We make the key observation that objects in video span both space and time, and therefore 3D spatio-temporal volumetric features are natural representations for them. The goal of this thesis is to propose efficient volumetric representations for video and to evaluate how well these representations perform in a wide range of applications, such as video retrieval and action recognition. Our approach is divided into three main parts: spatio-temporal region extraction, volumetric region representations, and matching/recognition methods in video. We first use unsupervised clustering to extract an over-segmentation of the video volume; the regions loosely correspond to object boundaries in space-time. Next, we construct a volumetric representation for the regions and define a distance metric to match them. Finally, we learn models based on multiple templates of user-specified actions, such as tennis serves, running, and dance moves. We plan to evaluate the proposed method and compare against existing methods on a large video database.

Thesis Summary: the link.

Thursday, September 14, 2006

Lab Meeting 15 Sep., 2006 (Nelson): nScan-Matching: Simultaneous Matching of Multiple Scans and Application to SLAM

LINK (pdf)

Peter Biber and Wolfgang Straßer

University of Tübingen

Abstract: Scan matching is a popular way of recovering a mobile robot's motion and constitutes the basis of many localization and mapping approaches. Consequently, a variety of scan matching algorithms have been proposed in the past. All these algorithms share one common attribute: they match pairs of scans to obtain spatial relations between two robot poses. In this paper we present a method for matching multiple scans simultaneously. We discuss the need for such a method and describe how the result of such a multi-scan matching can be incorporated into relation-based SLAM in the Lu and Milios framework.
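For intuition about what pairwise scan matching computes, here is a minimal sketch of the closed-form least-squares rigid alignment between two 2D point sets. The known-correspondence assumption and the toy points are illustrative; real scan matchers (e.g. ICP variants) must also estimate the correspondences, which is not shown here:

```python
import math

def align_2d(src, dst):
    """Least-squares rigid transform (theta, tx, ty) mapping src onto dst,
    assuming point-to-point correspondences are already known."""
    n = len(src)
    csx = sum(p[0] for p in src) / n
    csy = sum(p[1] for p in src) / n
    cdx = sum(p[0] for p in dst) / n
    cdy = sum(p[1] for p in dst) / n
    s_cross = s_dot = 0.0
    for (sx, sy), (dx, dy) in zip(src, dst):
        ax, ay = sx - csx, sy - csy   # centered source point
        bx, by = dx - cdx, dy - cdy   # centered destination point
        s_dot += ax * bx + ay * by
        s_cross += ax * by - ay * bx
    theta = math.atan2(s_cross, s_dot)
    # Translation maps the rotated source centroid onto the destination centroid.
    tx = cdx - (math.cos(theta) * csx - math.sin(theta) * csy)
    ty = cdy - (math.sin(theta) * csx + math.cos(theta) * csy)
    return theta, tx, ty

src = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]
# dst is src rotated by 90 degrees and shifted by (2, 3).
dst = [(2.0, 3.0), (2.0, 4.0), (1.0, 3.0)]
theta, tx, ty = align_2d(src, dst)
```

The multi-scan matching of the paper generalizes this pairwise idea to estimating many poses jointly rather than one relative pose at a time.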

Wednesday, September 13, 2006

News: Creator of AIBO to launch dancing humanoid robot

Speecys Corp., a venture firm launched by robot engineer Tomoaki Kasuga, the creator of Sony Corp.'s AIBO robot dog, said Tuesday it will start selling a small humanoid robot that can sing, dance, read the news and give English language lessons, among other things.

The mostly white, 33-centimeter-tall, 1.5-kilogram MI RAI-RT robot will be priced at 294,000 yen. The company will start accepting orders via the Internet on Sept. 30.

See the full article.

News: Hitachi Develops Crowd-Navigating Robot

Tokyo, Sept 12, 2006 (JCN) - Hitachi Ltd., in collaboration with Tsukuba University, has developed the Excellent Mobility and Interactive Existence as Workmate (EMIEW) robot, which can walk between people without bumping into them and navigate crowded places.

The robot moves at 0.8 m per second and uses a laser sensor to measure the distance to obstacles. During trials, the robot smoothly passed four people walking at speeds of up to 1.2 m per second.

See the full article.

Monday, September 11, 2006

Lab meeting 15 Sep., 2006 (Stanley): Lessons Learned in Integration for Sensor-Based Robot Navigation Systems

Authors: Luis Montesano, Javier Minguez and Luis Montano

From: International Journal of Advanced Robotic Systems, Volume 3, no. 1, Pages 85-91, 2006

Abstract: This paper presents our integration work over recent years in the context of sensor-based robot navigation systems. In our motion system, as in many others, functionalities such as modeling, planning, and motion control must be integrated within an architecture. This paper addresses this problem. Furthermore, we discuss the lessons learned while: (i) designing, testing, and validating techniques that implement the functionalities of the navigation system; (ii) building the integration architecture; and (iii) using the system on several robots equipped with different sensors in different laboratories.


Sunday, September 10, 2006

IBM: Open Source Robotics Toolkits

Robot simulators can greatly simplify the job of building physical robots. Through simulators, you can test ideas and strategies before putting them into hardware. Luckily, the Linux and open source communities have several options that save you time and money, and can even support direct linkage to hardware platforms. This article introduces you to some of the open source robotics toolkits for Linux, demonstrates their capabilities, and helps you decide which is best for you.

Saturday, September 09, 2006

CMU report: On the Beaten Path: Exploitation of Entities Interactions For Predicting Potential Link

Y. Seo and K. Sycara
tech. report CMU-RI-TR-06-36, Robotics Institute, Carnegie Mellon University, August, 2006.

We propose a new non-parametric link analysis algorithm that predicts a potential link between entities given a set of different relational patterns. The proposed method first represents the different types of relations among entities by constructing a corresponding number of factorized matrices from the original entity-by-relation matrices. A possible link between entities is then predicted by linearly summing the weighted distances in the latent spaces. Logistic regression is used to estimate the regression coefficients of the distances in the latent spaces. In experimental comparisons with various algorithms, our algorithm performs best in precision and second-best in recall. (pdf)
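As a rough illustration of the prediction step described above (logistic regression over distances in latent spaces), here is a hedged sketch in plain Python. The toy distance features, labels, and batch gradient-descent fitter are assumptions for illustration only, not the report's actual pipeline:

```python
import math

def fit_logistic(X, y, lr=0.5, epochs=2000):
    """Logistic regression via batch gradient descent; w[-1] is the bias."""
    w = [0.0] * (len(X[0]) + 1)
    for _ in range(epochs):
        grad = [0.0] * len(w)
        for xi, yi in zip(X, y):
            z = sum(wj * xj for wj, xj in zip(w, xi)) + w[-1]
            p = 1.0 / (1.0 + math.exp(-z))
            err = p - yi
            for j, xj in enumerate(xi):
                grad[j] += err * xj
            grad[-1] += err
        w = [wj - lr * g / len(X) for wj, g in zip(w, grad)]
    return w

def predict(w, xi):
    """Probability that a link exists for feature vector xi."""
    z = sum(wj * xj for wj, xj in zip(w, xi)) + w[-1]
    return 1.0 / (1.0 + math.exp(-z))

# Hypothetical training data: each row holds the distances between an
# entity pair in two latent relation spaces; label 1 = a link exists
# (small distances in latent space suggesting a likely link).
X = [[0.1, 0.2], [0.2, 0.1], [0.9, 0.8], [0.8, 0.9]]
y = [1, 1, 0, 0]
w = fit_logistic(X, y)
```

A nearby pair such as [0.1, 0.1] should then score a high link probability, and a distant pair such as [0.9, 0.9] a low one.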

CMU report: Combining multiple hypotheses for identifying human activities

Y. Seo and K. Sycara
tech. report CMU-RI-TR-06-31, Robotics Institute, Carnegie Mellon University, May, 2006.


Dempster-Shafer theory is one of the predominant methods for combining evidence from different sensors. However, it has been observed that Dempster's rule of combination may yield inaccurate results in some situations. In this paper, we examine the properties and the performance of five different combination rules on a set of real world data. The data was obtained through biometric sensors from a number of human subjects. The problem we study is the prediction of the activity state of a human, given time series readings from the biometric sensors.
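For readers unfamiliar with Dempster's rule of combination, here is a minimal sketch; the two-state activity example and the mass values are hypothetical, not the paper's data:

```python
def dempster_combine(m1, m2):
    """Dempster's rule: combine two mass functions whose focal elements
    are frozensets; conflicting mass is removed and renormalized."""
    combined = {}
    conflict = 0.0
    for a, ma in m1.items():
        for b, mb in m2.items():
            inter = a & b
            if inter:
                combined[inter] = combined.get(inter, 0.0) + ma * mb
            else:
                conflict += ma * mb  # incompatible focal elements
    if conflict >= 1.0:
        raise ValueError("total conflict: sources fully disagree")
    return {s: v / (1.0 - conflict) for s, v in combined.items()}

# Hypothetical example: two biometric sensors giving evidence over the
# activity states {rest, active}; BOTH expresses ignorance.
REST, ACTIVE = frozenset(["rest"]), frozenset(["active"])
BOTH = REST | ACTIVE
m1 = {REST: 0.6, ACTIVE: 0.1, BOTH: 0.3}
m2 = {REST: 0.5, ACTIVE: 0.2, BOTH: 0.3}
m = dempster_combine(m1, m2)
```

The renormalization by 1 - conflict is exactly the step that can yield counter-intuitive results under high conflict, which motivates the alternative combination rules compared in the paper.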

CMU ML talks: the ICML 2006 Conference Review Session

CMU ML lunch talks

1. Support Vector Decomposition Machine
by F Pereira and G Gordon

In machine learning problems with tens of thousands of features and only dozens or hundreds of independent training examples, dimensionality reduction is essential for good learning performance. In previous work, many researchers have treated the learning problem in two separate phases: first use an algorithm such as singular value decomposition to reduce the dimensionality of the data set, and then use a classification algorithm such as naive Bayes or support vector machines to learn a classifier. We demonstrate that it is possible to combine the two goals of dimensionality reduction and classification into a single learning objective, and present a novel and efficient algorithm which optimizes this objective directly. We present experimental results in fMRI analysis which show that we can achieve better learning performance and lower-dimensional representations than two-phase approaches can.
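The two-phase baseline that the abstract contrasts against can be sketched as follows: first reduce dimensionality (here, the top singular direction found by power iteration), then fit a separate classifier on the projections (here, a simple midpoint threshold standing in for naive Bayes or an SVM). The toy data are illustrative assumptions:

```python
def top_direction(X, iters=100):
    """Dominant right-singular vector of X via power iteration on X^T X."""
    d = len(X[0])
    v = [1.0 / d] * d
    for _ in range(iters):
        # u = X v, then v = X^T u, then normalize.
        u = [sum(xi[j] * v[j] for j in range(d)) for xi in X]
        v = [sum(X[i][j] * u[i] for i in range(len(X))) for j in range(d)]
        norm = sum(c * c for c in v) ** 0.5
        v = [c / norm for c in v]
    return v

# Phase 1: dimensionality reduction on toy 2-D data with two classes.
X = [[2.0, 2.1], [1.9, 2.0], [-2.0, -1.9], [-2.1, -2.0]]
y = [1, 1, 0, 0]
v = top_direction(X)
proj = [sum(a * b for a, b in zip(xi, v)) for xi in X]

# Phase 2: a separate (here trivial) classifier on the 1-D projections.
mid = (max(p for p, yi in zip(proj, y) if yi == 0) +
       min(p for p, yi in zip(proj, y) if yi == 1)) / 2
pred = [1 if p > mid else 0 for p in proj]
```

The paper's point is that optimizing these two phases independently can be suboptimal; their SVDM optimizes a single joint objective instead.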

2. Inference with the Universum
by Jason Weston, Ronan Collobert, Fabian Sinz, Leon Bottou and Vladimir Vapnik

In this paper we study a new framework introduced by (Vapnik 1998) that is an alternative capacity concept to the large margin approach. In the particular case of binary classification, we are given a set of labeled examples, and a collection of "non-examples" that do not belong to either class of interest. This collection, called the Universum, allows one to encode prior knowledge by representing meaningful concepts in the same domain as the problem at hand. We describe an algorithm to leverage the Universum by maximizing the number of observed contradictions, and show experimentally that this approach delivers accuracy improvements over using labeled data alone.

3. Bayesian Multi-Population Haplotype Inference via a Hierarchical Dirichlet Process Mixture
by E.P. Xing, K. Sohn, M.I. Jordan and Y.W. Teh

Uncovering the haplotypes of single nucleotide polymorphisms and their population demography is essential for many biological and medical applications. Methods for haplotype inference developed thus far, including methods based on coalescence, finite and infinite mixtures, and maximal parsimony, ignore the underlying population structure in the genotype data. As noted by Pritchard (2001), different populations can share a certain portion of their genetic ancestors, as well as have their own genetic components through migration and diversification. In this paper, we address the problem of multi-population haplotype inference. We capture cross-population structure using a nonparametric Bayesian prior known as the hierarchical Dirichlet process (HDP) (Teh et al., 2006), conjoining this prior with a recently developed Bayesian methodology for haplotype phasing known as DP-Haplotyper (Xing et al., 2004). We also develop an efficient sampling algorithm for the HDP based on a two-level nested Pólya urn scheme. We show that our model outperforms extant algorithms on both simulated and real biological data.


News: Bringing Robot Transportation to Europe

A new European Union-funded project will see the introduction of driverless taxis at Heathrow, "cyber cars" in Rome and an automatic bus in Castellón, Spain. And that's only the beginning.

Transportation planners have long dreamed of an age of driverless taxis that could help alleviate traffic in congested areas, and that vision of driverless urban areas could soon become reality. Under the auspices of the European Union's "Citymobil" project, which was launched on August 28, companies and research institutes representing 10 countries have come together to develop small automatic transportation systems. Currently, three model projects are planned with funding of about €40 million.

See the full article.

News: Robot breakthrough brings fingertip feeling

Lewis Smith
September 09, 2006

A TOUCH sensor developed to match the sensitivity of the human finger is set to herald the age of the robotic doctor. Until now robots have been severely handicapped by their inability to feel with anything like the accuracy of their human creators.

The very best are unable to beat the dexterity of a six-year-old at knotting shoelaces or building a house of cards.

But all that could change with the development by nanotechnologists of a device that can "feel" the shape of a coin, down to the detail of the letters stamped on it.

The ability to feel with at least the same degree of sensitivity as a finger is crucial to the development of robots that can take on complicated tasks such as open heart surgery.

See the full article: the link

Tuesday, September 05, 2006

Atwood's Talk at the Lab Meeting (Sep. 7): Integrated Person Tracking Using Stereo, Color and Pattern Detection

Title: Integrated Person Tracking Using Stereo, Color and Pattern Detection

Abstract: We present an approach to real-time person tracking in crowded and/or unknown environments using integration of multiple visual modalities. We combine stereo, color, and face detection modules into a single robust system, and show an initial application in an interactive, face-responsive display. Dense, real-time stereo processing is used to isolate users from other objects and people in the background. Skin-hue classification identifies and tracks likely body parts within the silhouette of a user. Face pattern detection discriminates and localizes the face within the identified body parts. Faces and bodies of users are tracked over several temporal scales: short-term (user stays within the field of view), medium-term (user exits/reenters within minutes), and long-term (user returns after hours or days). Short-term tracking is performed using simple region position and size correspondences, while medium- and long-term tracking are based on statistics of user appearance. We discuss the failure modes of each individual module, describe our integration method, and report results with the complete system in trials with thousands of users.
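To give a rough idea of what a skin-hue classification module does, here is a commonly cited rule-of-thumb RGB skin test; this crude heuristic is only a stand-in, and the paper's actual learned color model may differ substantially:

```python
def is_skin(r, g, b):
    """A simple RGB skin heuristic: skin pixels tend to be bright,
    red-dominant, and not gray (large channel spread)."""
    return (r > 95 and g > 40 and b > 20 and
            max(r, g, b) - min(r, g, b) > 15 and
            abs(r - g) > 15 and r > g and r > b)

# Three example pixels: a skin-like tone, sky blue, and a pale skin tone.
pixels = [(220, 160, 130), (60, 120, 200), (200, 170, 160)]
mask = [is_skin(*p) for p in pixels]
```

In a full system, such per-pixel decisions would be restricted to the stereo-derived user silhouette and smoothed over regions before tracking.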

Source: International Journal of Computer Vision 37(2), 175–185, 2000 or
International Conference on Computer Vision, 1998

(meoscar) My Talk, Sep 7, 2006: Visual-Hull Reconstruction from Uncalibrated and Unsynchronized Video Streams

Title: Visual-Hull Reconstruction from Uncalibrated and Unsynchronized Video Streams.
Proceedings of the Second International Symposium on 3D Data Processing, Visualization and Transmission (3DPVT'04).
Authors: Sudipta N. Sinha and Marc Pollefeys, Department of Computer Science, University of North Carolina at Chapel Hill, USA.
Abstract: We present an approach for automatic reconstruction of a dynamic event using multiple video cameras recording from different viewpoints. Those cameras do not need to be calibrated or even synchronized. Our approach recovers all the necessary information by analyzing the motion of the silhouettes in the multiple video streams. The first step consists of computing the calibration and synchronization for pairs of cameras. We compute the temporal offset and epipolar geometry using an efficient RANSAC-based algorithm to search for the epipoles as well as for robustness. In the next stage, the calibration and synchronization for the complete camera network is recovered and then refined through maximum likelihood estimation. Finally, a visual-hull algorithm is used to recover the dynamic shape of the observed object. For unsynchronized video streams, silhouettes are interpolated to deal with subframe temporal offsets. We demonstrate the validity of our approach by obtaining the calibration, synchronization and 3D reconstruction of a moving person from a set of 4-minute videos recorded from 4 widely separated video cameras.
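The paper's epipole search is RANSAC-based; for intuition, here is a minimal generic RANSAC loop applied to the much simpler problem of fitting a line to points with outliers. The line model and toy data are illustrative only, not the paper's epipolar estimation:

```python
import random

def ransac_line(points, iters=200, thresh=0.1, seed=1):
    """Generic RANSAC: repeatedly fit a line y = a*x + b to two random
    points and keep the model with the most inliers."""
    rng = random.Random(seed)
    best, best_inliers = None, -1
    for _ in range(iters):
        (x1, y1), (x2, y2) = rng.sample(points, 2)
        if x1 == x2:
            continue  # skip vertical sample pairs in this sketch
        a = (y2 - y1) / (x2 - x1)
        b = y1 - a * x1
        inliers = sum(1 for x, y in points if abs(y - (a * x + b)) < thresh)
        if inliers > best_inliers:
            best, best_inliers = (a, b), inliers
    return best, best_inliers

# 8 points on y = 2x + 1 plus two gross outliers.
pts = [(float(x), float(2 * x + 1)) for x in range(8)] + [(1.5, 9.0), (5.5, -4.0)]
(a, b), n_in = ransac_line(pts)
```

The same sample-score-keep loop underlies the paper's search, with a line model replaced by epipolar geometry and a temporal offset.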
PDF file: the link
About the author's other research: the link

Monday, September 04, 2006

Lab meeting 7 Sep., 2006 (Vincent): Robust face detection with multi-class boosting

Title : Robust face detection with multi-class boosting
Authors: Yen-Yu Lin and Tyng-Luh Liu

This paper is from CVPR 2005

Abstract :

With the aim to design a general learning framework for detecting faces of various poses or under different lighting conditions, we are motivated to formulate the task as a classification problem over data of multiple classes. Specifically, our approach focuses on a new multi-class boosting algorithm, called MBHboost, and its integration with a cascade structure for effectively performing face detection. There are three main advantages of using MBHboost: 1) each MBH weak learner is derived by sharing a good projection direction such that each class of data has its own decision boundary; 2) the proposed boosting algorithm is established based on an optimal criterion for multi-class classification; and 3) since MBHboost is flexible with respect to the number of classes, it turns out that it is possible to use only one single boosted cascade for the multi-class detection. All these properties give rise to a robust system to detect faces efficiently and accurately.
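MBHboost itself is multi-class, but the underlying boosting loop can be illustrated with plain binary AdaBoost over threshold stumps. This much-simplified sketch (toy 1-D data, stump weak learners) only conveys the general reweighting idea, not the paper's algorithm:

```python
import math

def adaboost_stumps(X, y, rounds=10):
    """Binary AdaBoost with threshold stumps on 1-D data; labels in {-1,+1}."""
    n = len(X)
    w = [1.0 / n] * n
    ensemble = []
    for _ in range(rounds):
        best = None
        # Exhaustively pick the stump (threshold t, sign s) with least
        # weighted error on the current example weights.
        for t in sorted(set(X)):
            for s in (1, -1):
                preds = [s if x >= t else -s for x in X]
                err = sum(wi for wi, p, yi in zip(w, preds, y) if p != yi)
                if best is None or err < best[0]:
                    best = (err, t, s, preds)
        err, t, s, preds = best
        err = max(err, 1e-12)  # guard the log for perfect stumps
        alpha = 0.5 * math.log((1 - err) / err)
        ensemble.append((alpha, t, s))
        # Reweight: misclassified examples gain weight for the next round.
        w = [wi * math.exp(-alpha * yi * p) for wi, p, yi in zip(w, preds, y)]
        z = sum(w)
        w = [wi / z for wi in w]
    return ensemble

def classify(ensemble, x):
    score = sum(a * (s if x >= t else -s) for a, t, s in ensemble)
    return 1 if score >= 0 else -1

X = [0.1, 0.2, 0.3, 0.7, 0.8, 0.9]
y = [-1, -1, -1, 1, 1, 1]
H = adaboost_stumps(X, y)
```

In the paper, the weak learners additionally share a projection direction across classes and are arranged in a rejection cascade for detection speed.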

Here is the link of the paper.

Sunday, September 03, 2006

News: Robot Can Taste Wine and Cheeses

Saturday, September 2, 2006 3:28 PM

A new robot that can taste wine and identify cheeses.

Ever get bored of sniffing cheeses and tasting wine? Why not have a robot do it for you!

Researchers at NEC and Mie University in Japan have designed a robot that can taste and identify dozens of different wines, cheeses, and hors d'oeuvres.

A green and white prototype robot with eyes, a head and a mouth was unveiled last month. The robot's left arm is equipped with an infrared spectrometer that fires a beam of infrared light when objects are placed against the sensor. The reflected light is analyzed in real time to determine the object's chemical composition and even flag possible health concerns (e.g., salty or fatty foods).

See the full article.

CNN news: Rolling robot takes the tunes with you

POSTED: 12:36 p.m. EDT, August 31, 2006

TOKYO, Japan (AP) -- The new Japanese robot Miuro turns an iPod music player into a dancing boombox-on-wheels.

The 14-inch-long machine from ZMP Inc. blares music as it rolls and twists from room to room. The robot, which looks like a ball popping out of an egg, has a speaker system from Kenwood Corp.

See the full article.

MIT talk: Simulating human behaviors: Instructions, models, and parameterized actions

Norman I. Badler , University of Pennsylvania-School of Engineering and Applied Science
Date: Thursday, September 7 2006
Host: Jovan Popovic, MIT - CSAIL - Computer Graphics Group

Recently there has been considerable maturation in understanding how to use computer graphics technology to portray 3D virtual human agents. Unlike the off-line, animator-intensive methods used in the special effects industry, such real-time agents are expected to exist and interact with us "live." They can represent other people or function in a virtual environment as autonomous helpers, teammates, or adversaries, enabling novel interactive educational and training applications. Real people and virtual humans should be able to interact and communicate non-verbally, intentionally or not, through facial expressions, eye gaze, and gesture. We study such issues, including consistent parameterizations for gesture and facial actions using movement observation principles, and visual attention and perception models. We developed a Parameterized Action Representation (PAR) that embodies certain semantics of human action and allows an agent to act, plan, and reason about its actions or the actions of others. PAR is also designed for instructing future behaviors of autonomous agents and aggregates, and for controlling animation parameters that can individualize embodied agents. Group behaviors are additionally conditioned on agent roles and interpersonal communications. We also design instruction presentation and execution systems to facilitate virtual task training. We have just started new projects to author instructions by direct performance.
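To make the PAR idea concrete, here is a hypothetical sketch of such a representation as a small data structure; the field names (agent, preconditions, subactions) and the door example are illustrative assumptions, not the published schema:

```python
from dataclasses import dataclass, field

@dataclass
class PAR:
    """Illustrative sketch of a Parameterized Action Representation entry:
    an action parameterized by its agent, objects, applicability
    conditions, and decomposition into sub-actions."""
    name: str
    agent: str
    objects: list = field(default_factory=list)
    preconditions: list = field(default_factory=list)  # predicates to check
    subactions: list = field(default_factory=list)     # ordered child PARs

def executable(par, world):
    """An action can start once all its preconditions hold in the world state
    (here, the world is just a set of true predicate strings)."""
    return all(p in world for p in par.preconditions)

open_door = PAR("open", "agent1", objects=["door"],
                preconditions=["at(agent1, door)"])
ok = executable(open_door, {"at(agent1, door)"})
```

A planner or agent can reason over such records (checking preconditions, expanding subactions), which is the kind of act/plan/reason capability the talk describes.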