Friday, June 30, 2006

A Chinese Blog on Robots

Lab and advisee meetings rescheduled to next THURSDAY

Because of two important meetings this coming Wednesday, we have to reschedule our lab and advisee meetings to July 6 (Thursday).

Jim will talk about visual SLAM, and Bright will demonstrate multiple hypothesis tracking (MHT) and the joint probabilistic data association (JPDA) filter.

Best Regards,


Thursday, June 29, 2006

News: Device records smells to play back later

01 July 2006 news service
Paul Marks

IMAGINE being able to record a smell and play it back later, just as you can with sounds or images.

Engineers at the Tokyo Institute of Technology in Japan are building an odour recorder capable of doing just that. Simply point the gadget at a freshly baked cookie, for example, and it will analyse its odour and reproduce it for you using a host of non-toxic chemicals.

The device could be used to improve online shopping by allowing you to sniff foods or fragrances before you buy, to add an extra dimension to virtual reality environments and even to assist military doctors treating soldiers remotely by recreating bile, blood or urine odours that might help a diagnosis.

See the full article.

RSS 2006: Workshop on Socially Assistive Robotics

Research into Human-Robot Interaction (HRI) for socially assistive applications is still in its infancy. Various systems have been built for different user groups. For the elderly, robot-pet companions aiming to reduce stress and depression have been developed; for people with physical impairments, assistive devices such as wheelchairs and robot manipulators have been designed; for people in rehabilitation therapy, therapist robots that assist, encourage and socially interact with patients have been tested; for people with cognitive disorders, many applications have focused on robots that can therapeutically interact with children with autism; and for students, tutoring applications have been implemented. An ideal assistive robot should feature sufficiently complex cognitive and social skills permitting it to understand and interact with its environment, to exhibit social behaviors, and to focus its attention and communicate with people toward helping them achieve their goals.

The objectives of this workshop are to present the grand challenge of socially assistive robotics, the current state-of-the-art, and recent progress on key problems. Speakers at the workshop will address a variety of multidisciplinary topics, including social behavior and interaction, human-robot communication, task learning, psychological implications, and others. The workshop will also cover a variety of assistive applications, based on hands-off and hands-on therapies for helping people in need of assistance as part of convalescence, rehabilitation, education, training, and ageing. The proposed workshop is aimed at providing a general overview of the critical issues and key points in building effective, acceptable and reliable human-robot interaction systems for socially assistive applications and providing indications for further directions and developments in the field, based on the diverse expertise of the participants.

Yu-Chun & Zhen-Yu, check this out. -Bob

Wednesday, June 28, 2006


The essential qualities of a laser can be mimicked by classical mechanics -- as opposed to quantum mechanics -- using sound instead of light, according to researchers at the University of Illinois at Urbana-Champaign and the University of Missouri at Rolla, who have built an ultrasound analogue of the laser. The device, called a uaser (pronounced WAY-zer), for "ultrasound amplification by stimulated emission of radiation," produces ultrasonic waves that are coherent and of one frequency, and could be used to study laser dynamics and detect subtle changes, such as phase changes, in modern materials, researchers say. Read more: the link


Web services, acting independently of mobile devices' operating systems, may enable users to access desktop applications via mobile devices, researchers say in "IEEE Internet Computing" (v. 10, no. 3), eliminating cross-platform integration problems through wireless portal networks, wireless extended Internet, or peer-to-peer networks. Mobile agents, autonomous programs that gather information or accomplish tasks without human interaction, are deployed in handheld devices in one of two ways, according to researchers: on platforms that allow mobile agents to run on them directly; or on devices that can access and use remote mobile agents running on wired networks. The former method allows local execution, useful for high-end devices, especially when the network connection is unreliable, researchers say, while the latter method is beneficial for devices with limited processing power and memory. Read more: the link


New technology based on Wi-Fi networks allows eye doctors to interview and examine patients in five remote clinics via high-quality video conferencing, according to the technology's developers at University of California, Berkeley, and Intel Corporation. The low-cost connectivity links rural clinics with doctors at Aravind Eye Hospital in southern India, researchers say, using high-speed links to screen patients. Standard Wi-Fi range is limited to about 200 feet, according to the system's developers, who created software to overcome range limitations, combined with directional antennas and routers to send, receive and relay signals at network speeds of up to six Megabytes per second at distances up to 40 miles (100 times faster than dial-up speeds, and 100 times farther than regular Wi-Fi). Read more:

Monday, June 26, 2006

Lab meeting 28 June, 2006 (Casey): Incremental learning of object detectors using a visual shape alphabet

Incremental learning of object detectors using a visual shape alphabet, by Opelt, Pinz, and Zisserman.

Here is the abstract from the paper:
We address the problem of multiclass object detection. Our aims are to enable models for new categories to benefit from the detectors built previously for other categories, and for the complexity of the multiclass system to grow sub-linearly with the number of categories. To this end we introduce a visual alphabet representation which can be learnt incrementally, and explicitly shares boundary fragments (contours) and spatial configurations (relation to centroid) across object categories. We develop a learning algorithm with the following novel contributions: (i) AdaBoost is adapted to learn jointly, based on shape features; (ii) a new learning schedule enables incremental additions of new categories; and (iii) the algorithm learns to detect objects (instead of categorizing images). Furthermore, we show that category similarities can be predicted from the alphabet. We obtain excellent experimental results on a variety of complex categories over several visual aspects. We show that the sharing of shape features not only reduces the number of features required per category, but also often improves recognition performance, as compared to individual detectors which are trained on a per-class basis.

Lab meeting 28 June, 2006 (Any): Simultaneous Localization and Mapping with Environmental Structure Prediction

Author: H. Jacky Chang, C. S. George Lee, Yung-Hsiang Lu and Y. Charlie Hu
From: ICRA 2006
Local Copy:

Traditionally, the SLAM problem solves the localization and mapping problem in explored and sensed regions. This paper presents a prediction-based SLAM algorithm (called P-SLAM), which has an environmental structure predictor to predict the structure inside an unexplored region (i.e., lookahead mapping). The prediction process is based on the observation of the surroundings of an unexplored region and comparing it with the built map of explored regions. If a similar structure is matched in the map of explored regions, a hypothesis is generated to indicate that a similar structure has been explored before. If the environment has repeated structures, the mobile robot can utilize the predicted structure as a virtual mapping, and decide whether or not to explore the unexplored region to save exploration time. If the mobile robot decides to explore the unexplored region, a correct prediction can be utilized to localize the robot and speed up the SLAM process. We also derive the Bayesian formulation of P-SLAM to show its compact recursive form for real-time operation. We have experimentally implemented the proposed P-SLAM in a Pioneer 3-DX mobile robot using a Rao-Blackwellized particle filter in real-time. Computer simulations and experimental results validated the performance of the proposed P-SLAM and its effectiveness in an indoor environment.

Lab meeting 28 June, 2006 (Bob): Putting Objects in Perspective

In addition to the talks presented by Casey and Any, I will present the CVPR 2006 best paper, Putting Objects in Perspective by Derek Hoiem, Alex Efros and Martial Hebert, this Wednesday.

Friday, June 23, 2006

Yukuan's Blog on a book: 'On Intelligence'

link to the origin


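Interestingly, the author is Jeff Hawkins, the inventor of the PalmPilot. Even more surprising, the book is recommended by Watson and Kandel, two Nobel laureates in Physiology or Medicine.

Published in Chinese translation as 《創智慧》 and originally titled "On Intelligence", the book presents Hawkins's theory of how the neocortex works:

Hawkins greatly admires Vernon Mountcastle's 1978 paper "An Organizing Principle for Cerebral Function" and praises it as the Rosetta Stone of neuroscience. Mountcastle's paper concludes that the entire cortex shares a common function and a common computational rule, and that these are carried out by every region of the cortex. "Vision is no different from hearing, which is no different from motor output." Our genes determine how the regions of the brain are connected, which is highly specific to function and species, but the cortical tissue itself is doing the same thing everywhere. [page 81]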

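The "one hundred-step rule" [page 99] tells us that in the less than half a second it takes a person to recognize a cat in a photograph, the information can only have passed along a chain about one hundred neurons long (each neuron takes about 5 ms). A modern digital computer needs billions of steps to do the same thing. The main reason for this huge difference is that "the brain doesn't 'compute' the answers to problems; it retrieves the answers from memory."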
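
The arithmetic behind the rule is easy to check. The snippet below is only a back-of-the-envelope sketch: the two constants are the round figures quoted above, and the function name is made up for illustration.

```python
# Back-of-the-envelope check of the "one hundred-step rule".
# Both inputs are the round figures quoted in the post, not measurements.
RECOGNITION_TIME_MS = 500   # "less than half a second" to recognize a cat
NEURON_STEP_MS = 5          # time the post attributes to each neuron


def max_serial_steps(total_ms, step_ms):
    """Upper bound on how many neurons can fire one after another."""
    return int(total_ms // step_ms)


print(max_serial_steps(RECOGNITION_TIME_MS, NEURON_STEP_MS))  # prints 100
```

Because the half second is itself an upper bound, one hundred is a ceiling on the length of the serial chain, which is why the book contrasts it with the billions of steps a digital computer needs.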

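Hawkins stresses that the brain is not a computer but a memory system for storing experience. These experiences reflect the true structure of the world: the brain remembers sequences of events and the intricate relationships among them, and then uses this "memory" to make "predictions".

This is the "memory-prediction system". It is the foundation of intelligence, perception, creativity, and even consciousness.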


The 2006 World Cup: A Security Nightmare?

You bet it is, and -- not surprisingly -- Germany is leaving little to chance. It's the first World Cup tournament to use RFID technology to identify ticket holders, and it's not likely to be the last. From a closed-circuit television system so powerful that security personnel can zoom in and read the game program in a spectator's hand to tap-proof digital terrestrial trunked radio phones, GPS tracking, and chemical sprays for detecting whether rail tracks have been tampered with, it's a whole new game. Read IEEE Spectrum Online's exclusive report at

A Half-Century's Progress and a Look Ahead

The phrase artificial intelligence first surfaced 50 years ago, around the time of Dartmouth College's Summer Workshop on Artificial Intelligence, in Hanover, N.H. To celebrate the anniversary, the May/June edition of the IEEE Computer Society's Intelligent Systems magazine is a special issue devoted to that first workshop, the field's development since then, and AI researchers' visions for the next 50 years. Find out more at

Thursday, June 22, 2006


“Outdoor SLAM using Visual Appearance and Laser Ranging” by Paul Newman, David Cole, and Kin Ho, University of Oxford.

* “A Rao-Blackwellized Particle Filter for Topological Mapping” by Ananth Ranganathan and Frank Dellaert, Georgia Institute of Technology;
* “Development of a New Humanoid Robot WABIAN-2” by Yu Ogura, Hideki Kondo, Akitoshi Morishima, Hun-ok Lim, and Atsuo Takanishi, Waseda University
* “Blades: A New Class of Geometric Primitives for Feeding 3D Parts on Vibratory Tracks” by Onno Goemans (Utrecht University), Ken Goldberg (University of California at Berkeley), and A. Frank van der Stappen (Utrecht University)

ICRA BEST VISION PAPER AWARD (Sponsored by Ben Wegbreit)
“Depth Perception in an Anthropomorphic Robot that Replicates Human Eye Movements” by Fabrizio Santini and Michele Rucci, Boston University

* “CMOS+FPGA Vision System for Visual Feedback of Mechanical Systems” by Kazuhiro Shimizu and Shinichi Hirai, Ritsumeikan University
“Attenuating Pixel-Locking in Stereo Vision via Affine Window Adaptation” by Andrew Stein (Robotics Institute, Carnegie Mellon University), Andres Huertas (Jet Propulsion Laboratory), and Larry Matthies (Jet Propulsion Laboratory)

“Motion Planning for Robotic Manipulation of Deformable Linear Objects” by Mitul Saha (Stanford University) and Pekka Isto (University of Vaasa, Finland)

“Scalable Shape Sculpting via Hole Motion: Motion Planning in Lattice-Constrained Modular Robots” by Michael De Rosa, Seth Goldstein, and Peter Lee (Carnegie Mellon University), and Jason Campbell and Padmanabhan Pillai (Intel Research Pittsburgh)
“Programmable Central Pattern Generators: an Application to Biped Locomotion Control” by Ludovic Righetti and Auke Ijspeert, EPFL, Switzerland

News: Robotic amphibian

The robot is more than just lovable. With six rotating flippers, three on
each side of its boxy metal carapace, this machine is amphibious,
capable of both walking and swimming, an attribute that is unique in the
robot world. It's designed to explore aquatic environments, and it may
help save endangered coral reefs, too.

See "Gone Swimmin'," by Michelle Theberge and Gregory Dudek:


Mars rovers will soon get software upgrades that allow them to scan through data they have collected and send only the most significant data back to Earth, maximizing the scientific return from the missions, NASA says. New algorithms will give the robots' computers the ability to search through their images to find those that feature specific, usually transient, phenomena such as clouds and dust devils, NASA says, a capability that the agency plans to add to future robotic craft, since almost all instruments can collect more information than can be beamed back to Earthbound scientists. NASA's Mars Odyssey orbiter, which has been mapping the Red Planet since 2001, will get new autonomous flight software later this year, the agency says, giving the satellite the ability to react to sudden changes on the Martian surface. Read more:

Wednesday, June 21, 2006

News: Microsoft & Robotics

“Microsoft views robotics as an exciting new market poised for growth,” said Tandy Trower, general manager at Microsoft Corp. “As a premier business development event in intelligent systems and mobile robotics, the RoboBusiness Conference and Exposition is a great venue for Microsoft to talk about the emerging robotics market.”

Microsoft Robotics Initiative: A Technical Introduction
Joseph Fernando, Architect & Program Manager, Microsoft
Tuesday June 20th 2-4pm
This session, featuring presentations from Microsoft and early-adopter third-party companies, will provide a technical introduction to Microsoft’s technologies and how they can be used to develop robotics applications. It is geared to both programmers and non-programmers who wish to gain insight into how Microsoft can make developing robotic applications easier.


PITTSBURGH—Researchers at Carnegie Mellon University's Robotics Institute are creating the new Center for Innovative Robotics, a resource that will help make robotics accessible to a broader range of individuals and businesses.

"One of the goals of the center will be to promote interoperability between many types of robots and a variety of software, including use of the Internet for controlling robots," said Illah Nourbakhsh, associate professor of robotics and director of the new center.
The center, established with financial support from the Microsoft Robotics Group, will operate a Web site where academics, students, commercial inventors and enthusiasts can share the ideas, technologies and software that are critical to robot development. It will utilize Microsoft's new Robotics Studio, a set of software tools designed to easily create robotics applications across a wide variety of hardware and scenarios. For more information on Microsoft Robotics Studio, see

"Microsoft is proud to help Carnegie Mellon establish this new center and online community," said Tandy Trower, general manager of the Microsoft Robotics Group. "Carnegie Mellon's new Center for Innovative Robotics, together with the launch of our new Robotics Studio development environment, will help broaden the reach of robotics for hobbyists, students and professors, as well as commercial developers, across a wide variety of hardware and scenarios."

Tuesday, June 20, 2006

Demo of Frontal Face Alignment

This is a cool demo. You guys should check out their CVPR2006 paper.


Monday, June 19, 2006

PAL lab meeting 21 June, 2006 (Vincent): Multiclass Object Recognition with Sparse, Localized Features.

Authors: Jim Mutch and David G. Lowe
This paper appears in CVPR'06.

Abstract :

We apply a biologically inspired model of visual object
recognition to the multiclass object categorization problem.
Our model modifies that of Serre, Wolf, and Poggio. As in
that work, we first apply Gabor filters at all positions and
scales; feature complexity and position/scale invariance are
then built up by alternating template matching and max
pooling operations. We refine the approach in several biologically
plausible ways, using simple versions of sparsification
and lateral inhibition. We demonstrate the value of
retaining some position and scale information above the intermediate
feature level. Using feature selection we arrive
at a model that performs better with fewer features. Our
final model is tested on the Caltech 101 object categories
and the UIUC car localization task, in both cases achieving
state-of-the-art performance. The results strengthen the
case for using this class of model in computer vision.
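
To make the "max pooling" step a little more concrete, here is a minimal sketch in plain Python. It is not the authors' implementation: the function name, the 2x2 pooling window and the toy response maps are all assumptions chosen for illustration. It only shows why taking a local maximum gives some tolerance to small shifts in position.

```python
def max_pool_2d(response, size):
    """Take the maximum over non-overlapping size x size blocks of a 2-D list."""
    rows, cols = len(response), len(response[0])
    pooled = []
    for r in range(0, rows, size):
        pooled_row = []
        for c in range(0, cols, size):
            block = [response[i][j]
                     for i in range(r, min(r + size, rows))
                     for j in range(c, min(c + size, cols))]
            pooled_row.append(max(block))
        pooled.append(pooled_row)
    return pooled


# Hypothetical filter response map with one strong response...
a = [[0, 0, 0, 0],
     [0, 9, 0, 0],
     [0, 0, 0, 0],
     [0, 0, 0, 1]]
# ...and the same response shifted by one pixel inside the same 2x2 block.
b = [[9, 0, 0, 0],
     [0, 0, 0, 0],
     [0, 0, 0, 0],
     [0, 0, 0, 1]]

print(max_pool_2d(a, 2))  # [[9, 0], [0, 1]]
print(max_pool_2d(b, 2))  # [[9, 0], [0, 1]]
```

Both maps pool to the same output, so a later stage sees the same feature regardless of the one-pixel shift. The model in the paper also pools over scale, which this sketch leaves out.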

Here you can find the file.

PAL lab meeting 21 June, 2006 (Stanley): A Greedy Strategy for Tracking a Locally Predictable Target among Obstacles

Author: Tirthankar Bandyopadhyay, Yuanping Li, Marcelo H. Ang Jr., and David Hsu

From: Proceedings of the 2006 IEEE International Conference on Robotics and Automation

Abstract: Target tracking among obstacles is an interesting class of motion planning problems that combine the usual motion constraints with robot sensors’ visibility constraints. In this paper, we introduce the notion of vantage time and use it to formulate a risk function that evaluates the robot’s advantage in maintaining the visibility constraint against the target. Local minimization of the risk function leads to a greedy tracking strategy. We also use simple velocity prediction on the target to further improve tracking performance. We compared our new strategy with earlier work in extensive simulation experiments and obtained much improved results.


Friday, June 16, 2006

Lab meeting June 21 (Wed)

Date: June 21
Time: 10:30 AM ~ 12:30 PM
Place: CSIE R524
Speakers: Stanley & Vincent

Thursday, June 15, 2006

News: (Report) Upside-Down Sensors Doomed Craft

By Irene Klotz, Discovery News

June 14, 2006 — Gravity sensors were installed upside-down aboard a NASA spacecraft, dooming the probe to a 193-mph crash landing in the Utah desert, accident investigators said in a report released Tuesday.

"In the wrong orientation, it was impossible for the sensors to detect atmospheric entry," investigators wrote in their report.

The spacecraft was returning samples of solar wind when it smashed to the ground on Sept. 8, 2004.

the full article

DARPA Urban Challenge.

DARPA announces Urban Challenge. Teams will compete to build an autonomous vehicle able to complete a 60-mile urban course safely in less than 6 hours.

DARPA will award prizes for the top three autonomous ground vehicles that compete in a final event where they must safely complete a 60-mile urban area course in fewer than six hours. First prize is $2 million, second prize is $500,000 and third prize is $250,000. To succeed, vehicles must autonomously obey traffic laws while merging into moving traffic, navigating traffic circles, negotiating busy intersections and avoiding obstacles.

More information:

Robots on Mars: Special Issue of "IEEE Robotics & Automation"

The June issue of the "IEEE Robotics & Automation Magazine" (v. 13, no. 2) features nine articles on the recent Mars Exploration Rover (MER) mission and its vehicles, "Spirit" and "Opportunity," which successfully landed on the red planet in early 2004. "I truly believe that this June issue is a blast!" writes Stefano Stramigioli, editor of the magazine. The issue, guest edited by Ashitey Trebi-Ollennu of the NASA-Jet Propulsion Laboratory, contains articles such as "Mars Exploration Rover Mobility Development" and "Working the Martian Night Shift." The table of contents and abstracts for all papers are available through the IEEE Xplore digital library, where subscribers may access the full text of the issue:

Sunday, June 11, 2006

PAL lab meeting 14 June, 2006 (Chihao): Speaker Attention System for Mobile Robots Using Microphone Array and Face Tracking

From 2006 IEEE International Conference on Robotics and Automation

Author: Kai-Tai Song, Jwu-Sheng Hu, Chi-Yi Tsai, Chung-Min Chou, Chieh-Cheng Cheng, Wei-Han Liu, and Chia-Hsing Yang

This paper presents a real-time human-robot interface system (HRIS), which processes both speech and vision information to improve the quality of communication between human and an autonomous mobile robot. The HRIS contains a real-time speech attention system and a real-time face tracking system. In the speech attention system, a microphone-array voice acquisition system has been developed to estimate the direction of speaker and purify the speaker’s speech signal in a noisy environment. The developed face tracking system aims to track the speaker’s face under illumination variation and react to the face motion. The proposed HRIS can provide a robot with the abilities of finding a speaker’s direction, tracking the speaker’s face, moving its body to the speaker, focusing its attention to the speaker who is talking to it, and purifying the speaker’s speech. The experimental results show that the HRIS not only purifies speech signal with a significant performance, but also tracks a face under illumination variation in real-time.

Shao-Wen (Any) Yang has sent the account and password to our lab (subject: "Access ICRA'06 papers via WWW").

PAL lab meeting 14 June, 2006 (Eric): An approach to visual servoing based on coded light

Jordi Pages , Christophe Collewet , Francois Chaumette , Joaquim Salvi

Paper from Proceedings of the 2006 IEEE International Conference on
Robotics and Automation

Positioning a robot with respect to objects by using data provided by a camera is a well known technique called visual servoing. In order to perform a task, the object must exhibit visual features which can be extracted from different points of view. Then, visual servoing is object-dependent as it depends on the object appearance. Therefore, performing the positioning task is not possible in presence of non-textured objects or objects for which extracting visual features is too complex or too costly. This paper proposes a solution to tackle this limitation inherent to the current visual servoing techniques. Our proposal is based on the coded structured light approach as a reliable and fast way to solve the correspondence problem. In this case, a coded light pattern is projected providing robust visual features independently of the object appearance.

(please check out your e-mail about "Access ICRA'06 papers via WWW" to get 'username' and 'password')

Saturday, June 10, 2006

Lab meeting this week

Date : June 14, Wednesday
Time: 10:30 am ~12:30 pm
Place: CSIE R524
Speakers: Eric & ChiHao
(Nelson will briefly explain incremental RANSAC)

CVPR 2006 Program

IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR 2006)
New York, NY: June 17-22, 2006

A draft program booklet is available.

Dear all,

Take a look at the program and let's discuss some interesting papers.


Wednesday, June 07, 2006

VASC seminar : Removing Camera Shake from a Single Photograph

Speaker : Rob Fergus from MIT


Camera shake during exposure leads to objectionable image blur and ruins
many photographs. Conventional blind deconvolution methods typically
assume frequency domain constraints on images, or overly simplified
parametric forms for the motion path during camera shake. Real camera
motions can follow convoluted paths, and a spatial domain prior can better
maintain visually salient image characteristics. We introduce a method to
remove the effects of camera shake from seriously blurred images. The
method assumes a uniform camera blur over the image, negligible in-plane
camera rotation, and no blur due to moving objects in the scene. The user
must specify an image region without saturation effects. I'll discuss
issues in this blind deconvolution problem, and show results for a variety
of digital photographs.

Invitation: I invite audience members to submit a few examples of
motion-blurred photographs to me a few days before the talk. I'll show
the examples and our algorithm's output on these examples during the
talk. Make sure that the images have blur due to camera motion, rather
than just being out-of-focus. If you have a favorite blind deconvolution
algorithm, you can also send me that algorithm's result and I'll show that

Joint work with Aaron Hertzmann, Bill Freeman, Sam Roweis, and
Barun Singh.

Short Bio:

Dr. Rob Fergus is currently a post-doc with Prof. William Freeman at MIT
in the Computer Science and Artificial Intelligence Lab (CSAIL). He
recently graduated from Prof. Andrew Zisserman's group at Oxford, where
he collaborated closely with Prof. Pietro Perona at Caltech. Rob's
research is within the field of Computer Vision; more specifically his
interests include: probabilistic models for object category recognition;
methods for learning from noisy data and efficient computational methods
within vision.

PAL lab meeting 8 June, 2006 (Nelson) : Incremental RANSAC for Online Relocation in Large Dynamic Environments

Incremental RANSAC for Online Relocation in Large Dynamic Environments

Kanji Tanaka and Eiji Kondo
Graduate School of Engineering
Kyushu University


Vehicle relocation is the problem in which a mobile
robot has to estimate the self-position with respect to an a
priori map of landmarks using the perception and the motion
measurements without using any knowledge of the initial self-position.
Recently, RANdom SAmple Consensus (RANSAC), a
robust multi-hypothesis estimator, has been successfully applied
to offline relocation in static environments. On the other hand,
online relocation in dynamic environments is still a difficult
problem, for available computation time is always limited, and
for measurement include many outliers. To realize real time
algorithm for such an online process, we have developed an
incremental version of RANSAC algorithm by extending an
efficient preemption RANSAC scheme. This novel scheme named
incremental RANSAC is able to find inlier hypotheses of self-positions
out of large number of outlier hypotheses contaminated
by outlier measurements.
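
For anyone new to RANSAC, the basic hypothesize-and-verify loop can be sketched as below. This is plain RANSAC on a toy line-fitting problem, not the incremental or preemptive scheme from the paper. The function names, the tolerance, the iteration count and the data are all assumptions for illustration.

```python
import random


def fit_line(p, q):
    """Return (slope, intercept) of the line through two points, or None if vertical."""
    (x1, y1), (x2, y2) = p, q
    if x1 == x2:
        return None
    slope = (y2 - y1) / (x2 - x1)
    return slope, y1 - slope * x1


def ransac_line(points, iterations=200, tolerance=0.5, seed=0):
    """Plain RANSAC: keep the two-point hypothesis that the most points agree with."""
    rng = random.Random(seed)
    best_model, best_inliers = None, []
    for _ in range(iterations):
        model = fit_line(*rng.sample(points, 2))  # minimal sample -> hypothesis
        if model is None:
            continue
        slope, intercept = model
        inliers = [(x, y) for x, y in points
                   if abs(y - (slope * x + intercept)) <= tolerance]
        if len(inliers) > len(best_inliers):      # score by consensus size
            best_model, best_inliers = model, inliers
    return best_model, best_inliers


# Hypothetical data: ten points on y = 2x + 1 plus three gross outliers.
points = [(float(x), 2.0 * x + 1.0) for x in range(10)]
points += [(2.0, 40.0), (5.0, -30.0), (8.0, 90.0)]

model, inliers = ransac_line(points)
print(model, len(inliers))
```

On this toy data the loop should recover slope 2 and intercept 1 with ten inliers, ignoring the three outliers. In the relocation setting the hypotheses would be candidate robot poses instead of lines, and the abstract's point is that scoring every hypothesis against every measurement, as done here, is too slow online.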

Link for downloading the pdf file.

MIT CSAIL talk : Head-pose and Illumination Invariant 3-D Audio-Visual Speech Recognition

Speaker: Dimitri Bitouk , Johns Hopkins University
Date: Tuesday, June 6 2006
Time: 1:00PM to 2:00PM
Location: 32-346
Host: Karen Livescu, CSAIL
Contact: Karen Livescu, 617-253-5953


Speech perception is bimodal, employing not only the acoustic signal, but also visual cues. Audio-visual speech recognition aims to improve the performance of conventional automated speech recognition by incorporating visual information. Due to a fundamentally limited two-dimensional representation employed, current approaches for visual feature extraction lack invariance to speaker's pose and illumination in the environment. The research presented in this thesis aims to develop three-dimensional methods for visual feature extraction that alleviate the above-mentioned limitation. Following the concepts of Grenander's General Pattern Theory, the prior knowledge of speaker's face is described by a prototype, which consists of a 3-D surface and a texture. The variability in observed video images of a speaker associated with pose, articulatory facial motion, and illumination is represented by transformations acting on the prototype and forming the group of geometric and photometric variability. Facial motion is described as smooth deformations of the prototype surface and is learned from motion capture data. The effects of illumination are accommodated by analytically constructing surface scalar fields that express relative changes in the face surface irradiance. We derive a multi-resolution tracking algorithm for estimation of speaker's pose, articulatory facial motion and illumination from uncalibrated monocular video sequences. The inferred facial motion parameters are utilized as visual features in audio-visual speech recognition. An application of our approach to large-vocabulary audio-visual speech recognition is presented. Speaker-independent speech recognition combines audio and visual models trained at the utterance level. We demonstrate that the visual features derived using our 3-D approach significantly improve speech recognition performance across a wide range of acoustic noise signal-to-noise ratios.

Tuesday, June 06, 2006

PAL lab meeting 8 June, 2006 (Bright): People Tracking and Following with Mobile Robot Using an Omnidirectional Camera and a Laser

Paper from Proceedings of the 2006 IEEE International Conference on Robotics and Automation

The paper presents two different methods for mobile robot tracking and following of a fast-moving person in outdoor unstructured and possibly dynamic environment. The robot is equipped with laser range-finder and omnidirectional camera. The first method is based on visual tracking only and while it works well at slow speeds and controlled conditions, its performance quickly degrades as conditions become more difficult. The second method which uses the laser and the camera in conjunction for tracking performs well in dynamic and cluttered outdoor environments as long as the target occlusions and losses are temporary. Experimental results and analysis are presented for the second approach.


DSpace at MIT: Infrastructure for Engineered Emergence on Sensor/Actuator Networks

Authors: Beal, Jacob; Bachrach, Jonathan
Advisors: Gerald Sussman
Other Contributors: Mathematics and Computation
Keywords: amorphous computing distributed sensor networks space-time programming
Issue Date: 1-Jun-2006
Citation: IEEE Intelligent Systems, (Vol. 21, No. 2) pp. 10-19, March/April 2006.
Series/Report no.: Massachusetts Institute of Technology Computer Science and Artificial Intelligence Laboratory
Abstract: The ability to control emergent phenomena depends on decomposing them into aspects susceptible to independent engineering. For spatial self-managing systems, the amorphous-medium abstraction lets you separate the system’s specification from its implementation.

Appears in Collections: CSAIL Technical Reports (July 1, 2003 - present)

Saturday, June 03, 2006

VASC Seminar - Visual surveillance -- movement modeling and event recognition

A very special VASC Seminar this Thursday, 1pm, 3305 NSH.

VASC Seminar Series

Larry Davis
U Maryland

Visual surveillance -- movement modeling and event recognition

This talk will cover two topics central to surveillance under
investigation at the University of Maryland:
1. Movement modeling of simple human hand and leg movements, and
2. Event modeling and recognition of human interactions.

We first consider the problem of recognizing human actions commonly observed during surveillance. The term actions refers to movements in which the subject has a reason and a viable execution plan. The focus is on actions with substantial limb movements and fairly simple structure - examples include reaching out, striking, etc. Computer vision research on recognizing movements has focused on appearance or position-based approaches. Commonly observed movements like reaching, striking, waving, etc., have highly variable target locations - a person can reach above his/her head, for something on the floor, etc. Predictably, change in target location leads to change in the trajectory followed by the hands and other body parts during the movement. This puts appearance-based techniques at a disadvantage. Either a large number of training examples are needed or specialized models must be trained. There are factors common to reach movements that are independent of the target's location. Psychological studies indicate that one of these factors is the manner in which forces are applied to the hands during these movements. This manifests itself in the velocity profiles of the hands during the movements. Our approach exploits constraints on the velocity profile to recognize common human movements like reach, strike, etc.

We then consider the problem of representing and recognizing interactions between people, places and objects that can occur over indeterminate periods of time. We describe an approach based on multi-valued default logic that augments purely appearance-based analysis of people, vehicles and objects with common sense knowledge about possession, knowledge and closed worlds. Examples from multi-camera surveillance will be provided.

VASC seminar: 2D and 3D Face Alignment

Author: Leon Gu

Leon Gu is a fourth-year Ph.D. student in the computer science department. He received his M.S. from Peking University and his B.S. from Xi'an Jiaotong University, both in computer science. He worked at Microsoft Research Asia as an associate researcher between 1999 and 2002. He has been advised by Takeo Kanade since 2002. His research interests include visual object localization and recognition.


A good algorithm for recognizing the parts of objects is characterized by three features: it is automatic, efficient, and accurate on noisy, unseen instances. Typical works in this field are good on one of these features, but fall short on the others. This talk will focus on face alignment, i.e., deforming a model to identify the detailed facial features. I will describe our recent work on 2D and 3D face alignment, which has been used in a number of projects within and outside of CMU. I will describe the major differences between our method and previous methods, and explain the intuition behind them.

Here is the link for details.

Friday, June 02, 2006

PAL Lab Meeting Schedule Change

Because of a schedule conflict, we will meet at 10:30 AM, June 8 (Thursday).

Thursday, June 01, 2006

PAL lab meeting 2 June, 2006 (Any): RRT-blossom: RRT with a local flood-fill behavior

Maciej Kalisiak, Dept. of Computer Science, University of Toronto
Michiel van de Panne, Dept. of Computer Science, University of British Columbia

Paper from Proceedings of the 2006 IEEE International Conference on Robotics and Automation

This paper proposes a new variation of the RRT planner which demonstrates good performance on both loosely-constrained and highly-constrained environments. The key to the planner is an implicit flood-fill-like mechanism, a technique that is well suited to escaping local minima in highly constrained problems. We show sample results for a variety of problems and environments, and discuss future improvements.
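
For readers who have not seen the baseline planner, here is a minimal sketch of a plain RRT in a 2D box. It is not the RRT-blossom variant from the paper, and the flood-fill-like mechanism is not reproduced. The function name `rrt`, its parameters, the 10% goal bias, and the point-only collision check are all assumptions made for illustration.

```python
import math
import random


def rrt(start, goal, is_free, bounds, step=0.5, goal_tol=0.5,
        max_iters=5000, seed=0):
    """Plain RRT in a 2D box. Returns a list of points from start to a
    point within goal_tol of goal, or None if no path was found.

    is_free(point) -> bool is a user-supplied collision check (assumption:
    only the new point is checked, not the whole edge leading to it).
    """
    rng = random.Random(seed)
    (xmin, xmax), (ymin, ymax) = bounds
    nodes = [start]          # tree vertices
    parent = {0: None}       # index of each vertex's parent

    for _ in range(max_iters):
        # Sample a random target, occasionally biased toward the goal.
        if rng.random() < 0.1:
            target = goal
        else:
            target = (rng.uniform(xmin, xmax), rng.uniform(ymin, ymax))

        # Find the tree vertex nearest to the sampled target.
        i_near = min(range(len(nodes)),
                     key=lambda i: math.dist(nodes[i], target))
        near = nodes[i_near]
        d = math.dist(near, target)
        if d == 0.0:
            continue

        # Extend from the nearest vertex toward the target by at most `step`.
        scale = min(step, d) / d
        new = (near[0] + scale * (target[0] - near[0]),
               near[1] + scale * (target[1] - near[1]))
        if not is_free(new):
            continue

        nodes.append(new)
        parent[len(nodes) - 1] = i_near

        # Stop once the tree reaches the goal region; walk back to the root.
        if math.dist(new, goal) <= goal_tol:
            path = []
            i = len(nodes) - 1
            while i is not None:
                path.append(nodes[i])
                i = parent[i]
            return path[::-1]
    return None
```

A call such as `rrt((1.0, 1.0), (9.0, 9.0), is_free, ((0.0, 10.0), (0.0, 10.0)))` with an `is_free` that rejects points inside an obstacle returns a list of waypoints. A plain RRT like this tends to struggle in tightly constrained spaces, which is the situation the paper's variant is aimed at.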

Check out our FTP server for a local copy of the paper.

PAL lab meeting 2 June, 2006 (Jim): A Unified Framework for Nearby and Distant Landmarks in Bearing-Only SLAM

Authors: Nikolas Trawny and Stergios I. Roumeliotis

Abstract: Bearing-Only SLAM describes the process of simultaneously localizing a mobile robot while building a map of the unknown surroundings, using bearing measurements to landmarks as the only available exteroceptive sensor information. Commonly, the position of map features is estimated along with the robot pose. However, consistent initialization of these positions is a difficult problem in Bearing-Only SLAM, in particular for distant landmarks. In previous approaches, measurements to remote landmarks often had to be discarded, thus losing valuable orientation information. In this paper, we present for the first time a unifying framework allowing for non-delayed initialization of both nearby and distant features. This is made possible by a four-element landmark parametrization, combined with a constraint-based inferred measurement.
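
To make the difficulty with distant landmarks concrete, here is a small sketch of the standard planar bearing measurement model. It does not implement the paper's four-element parametrization or its inferred measurement; the function names `wrap_angle`, `bearing`, and `parallax` are assumptions chosen for illustration.

```python
import math


def wrap_angle(a):
    """Wrap an angle in radians to the interval [-pi, pi)."""
    return (a + math.pi) % (2.0 * math.pi) - math.pi


def bearing(robot_pose, landmark):
    """Bearing from the robot to a landmark, in the robot frame.

    robot_pose is (x, y, theta) and landmark is (lx, ly), both in the
    world frame. No range information is produced.
    """
    x, y, theta = robot_pose
    lx, ly = landmark
    return wrap_angle(math.atan2(ly - y, lx - x) - theta)


def parallax(pose_a, pose_b, landmark):
    """Absolute change in world-frame bearing between two robot poses.

    A small parallax means the two measurements constrain the landmark's
    depth only weakly.
    """
    a = bearing(pose_a, landmark) + pose_a[2]
    b = bearing(pose_b, landmark) + pose_b[2]
    return abs(wrap_angle(a - b))
```

For a robot that moves one unit sideways, `parallax` is large for a landmark two units away and nearly zero for one a thousand units away. That is why a distant landmark's position is hard to initialize from bearings alone, even though its bearing still carries useful orientation information.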

To download the paper, get 1159.pdf from the ICRA 2006 CD.

IEEE Tech Alert: Mobile wireless record

The launch of third-generation cellphone systems nearly five years ago promised to transform the speech-and-message handset into an exciting multimedia tool. It is a promise largely unfulfilled. But network operators are looking ahead -- admittedly rather far ahead -- to get things right with next-generation (4G) technology: it will be an all-packet service that integrates voice and data transmitted at high speeds and capacities. With an eye to 4G, Japan's largest mobile phone company, NTT DoCoMo Inc., has successfully transmitted 2.5 Gb/s of packet data in a downlink to a vehicle moving at 20 kilometers per hour. Though the equipment in the vehicle is about the size of a refrigerator, at least the principle is demonstrated.

See "Sneak Peek at Cellphone Future," by John Boyd:

CFR Seminar: Integrated Planning and Control for Convex-bodied

Center for the Foundations of Robotics Seminar, May 31, 2006

Title: Integrated Planning and Control for Convex-bodied Nonholonomic Systems Using Local Feedback Control Policies

David C. Conner

Time and Place: 5:00pm NSH 1507

We present a technique for controlling a wheeled mobile robot as it moves in a cluttered environment. The method defines a hybrid control policy that simultaneously addresses the navigation and control problem for a convex-bodied wheeled mobile robot navigating amongst obstacles. The technique uses parameterized continuous local feedback control policies that ensure safe operation over local regions of the free configuration space; each local policy is designed to respect nonholonomic constraints, bounds on velocities (inputs), and obstacles in the environment. The hybrid control policy makes use of a collection of these local control policies in concert with discrete planning tools. This approach allows the system to plan, and re-plan in the face of changing conditions, while preserving the safety and convergence guarantees of the underlying control policies.

The first half of the presentation describes the development of the local control policies for constrained systems. First, we define policy requirements that all local control policies must satisfy. Next, generic policies that meet the policy requirements are developed. These policies offer guaranteed performance and are amenable to composition within the hybrid control framework.

The second half of the presentation deals with discrete planning within the space of deployed control policies. The continuous closed-loop dynamics induced by the local policies may be represented as a graph of discrete transitions. Two methods for planning on this graph will be discussed. Traditional AI planning (A* and D*) can order the graph to generate a switching strategy that solves a single navigation problem over the union of policy domains. Experimental results using this first technique will be shown. The second planning strategy uses model checking techniques to specify sequences of policies that address higher level planning problems with temporal specifications. This second approach is validated in simulation.

The presentation will give an overview of these two approaches, and discuss the relative strengths and weaknesses of the approaches. The presentation concludes with suggestions for possible lines of research that combine the strengths of the two approaches, thereby making it feasible to do robust high-level behavioral planning for the systems subject to complex interacting constraints.
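
The idea of ordering a graph of local policies into a switching strategy can be sketched as follows. This is an illustration only, not the speaker's implementation: it uses Dijkstra's algorithm searched backward from the goal policy (A* with a zero heuristic) rather than A* or D* proper, and the graph encoding, policy names, edge costs, and function names are all hypothetical.

```python
import heapq


def order_policies(edges, goal):
    """Compute a cost-to-go and a successor for every policy that can
    reach the goal policy.

    edges maps each policy to a list of (next_policy, cost) pairs, where
    an edge means the first policy can hand off to the next one
    (assumption: costs are non-negative).
    """
    # Reverse the graph so the search can expand outward from the goal.
    reverse = {}
    for u, outs in edges.items():
        for v, c in outs:
            reverse.setdefault(v, []).append((u, c))

    cost = {goal: 0.0}
    succ = {goal: None}
    heap = [(0.0, goal)]
    while heap:
        d, v = heapq.heappop(heap)
        if d > cost.get(v, float("inf")):
            continue  # stale heap entry
        for u, c in reverse.get(v, []):
            nd = d + c
            if nd < cost.get(u, float("inf")):
                cost[u] = nd
                succ[u] = v
                heapq.heappush(heap, (nd, u))
    return cost, succ


def switching_sequence(succ, start):
    """Follow successors from start to the goal; None if unreachable."""
    if start not in succ:
        return None
    seq = []
    p = start
    while p is not None:
        seq.append(p)
        p = succ[p]
    return seq
```

Because every reachable policy gets a successor, the result is a strategy over the whole union of policy domains rather than a single path, so the robot can resume from whichever policy domain it finds itself in after a disturbance.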