Robot Perception and Learning: 2006

Saturday, December 30, 2006

News: Robot, My Slippers Please

Home sensors, long-distance health monitors and other gadgets help seniors remain independent.

May 2006

RI-MAN isn’t your average caregiver. The pale-green, 220-pound robot is a mass of wiring, metal and computer chips. It was created in Japan as an eventual high-tech alternative to costly home-health services and nursing-home care.

Although you can’t order your own RI-MAN or other home-care robot yet, you can buy many other assistive-technology devices that enable older adults with various ailments to continue to live in their own homes. Such devices include home sensors that monitor a person’s day-to-day activities and special goggles that help the visually impaired to see. These products are part of tech companies’ response to the new demographics: a rising number of seniors, families scattered around the globe and grown children with full-time careers who care for elderly parents. Here are some examples of what’s available now.

See the full article.

Thursday, December 28, 2006

Lab meeting 28 Dec, 2006 (Jim): Unified Inverse Depth Parametrization for Monocular SLAM

Unified Inverse Depth Parametrization for Monocular SLAM
Montiel et al., RSS 2006

PDF
A.J.Davison's website

Abstract:
Recent work has shown that the probabilistic SLAM approach of explicit uncertainty propagation can succeed in permitting repeatable 3D real-time localization and mapping even in the ‘pure vision’ domain of a single agile camera with no extra sensing. An issue which has caused difficulty in monocular SLAM however is the initialization of features, since information from multiple images acquired during motion must be combined to achieve accurate depth estimates. This has led algorithms to deviate from the desirable Gaussian uncertainty representation of the EKF and related probabilistic filters during special initialization steps.
In this paper we present a new unified parametrization for point features within monocular SLAM which permits efficient and accurate representation of uncertainty during undelayed initialisation and beyond, all within the standard EKF (Extended Kalman Filter). The key concept is direct parametrization of inverse depth, where there is a high degree of linearity. Importantly, our parametrization can cope with features which are so far from the camera that they present little parallax during motion, maintaining sufficient representative uncertainty that these points retain the opportunity to ‘come in’ from infinity if the camera makes larger movements. We demonstrate the parametrization using real image sequences of large-scale indoor and outdoor scenes.
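To make the inverse-depth idea concrete, here is a minimal sketch in Python. It assumes a six-parameter feature encoding (an anchor camera position, an azimuth/elevation pair, and the inverse depth rho) and one particular axis convention for the ray direction. Neither detail is stated in the abstract above, and all function names are illustrative only.

```python
import math

def ray_direction(theta, phi):
    """Unit vector for azimuth theta and elevation phi (assumed axis convention)."""
    return (math.cos(phi) * math.sin(theta),
            -math.sin(phi),
            math.cos(phi) * math.cos(theta))

def inverse_depth_to_point(anchor, theta, phi, rho):
    """Euclidean point = anchor + (1 / rho) * ray direction."""
    if rho <= 0.0:
        raise ValueError("rho must be positive to yield a finite point")
    m = ray_direction(theta, phi)
    return tuple(a + mi / rho for a, mi in zip(anchor, m))

def parallax(point, cam_a, cam_b):
    """Angle (radians) between the rays from two camera centres to the point."""
    ra = [p - c for p, c in zip(point, cam_a)]
    rb = [p - c for p, c in zip(point, cam_b)]
    dot = sum(x * y for x, y in zip(ra, rb))
    norm = math.hypot(*ra) * math.hypot(*rb)
    return math.acos(max(-1.0, min(1.0, dot / norm)))

origin = (0.0, 0.0, 0.0)
baseline = (1.0, 0.0, 0.0)  # hypothetical second camera position
near = inverse_depth_to_point(origin, 0.0, 0.0, 0.5)    # depth 2
far = inverse_depth_to_point(origin, 0.0, 0.0, 0.001)   # depth 1000
near_parallax = parallax(near, origin, baseline)
far_parallax = parallax(far, origin, baseline)
print(near_parallax, far_parallax)
```

The distant feature shows almost no parallax over a one-unit baseline, which is the low-parallax regime the abstract refers to: depth is poorly constrained there, yet rho remains a small, finite number.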

Wednesday, December 27, 2006

Lab meeting 28 Dec, 2006 (Any): Sonar Sensor Interpretation

Title: Sonar Interpretation Learned from Laser Data
Authors: S. Enderle, G. Kraetzschmar, S. Sablatnog and G. Palm
From: Third European Workshop on Advanced Mobile Robots (Eurobot '99), 1999
Links: [Paper 1][Paper 2][Paper 3]
Abstract:
Sensor interpretation in mobile robots often involves an inverse sensor model, which generates hypotheses on specific aspects of the robot's environment based on current sensor data. Building inverse sensor models for sonar sensor assemblies is a particularly difficult problem that has received much attention in past years. A common solution is to train neural networks using supervised learning. However, large amounts of training data are typically needed, consisting e.g. of scans of recorded sonar data which are labeled with manually constructed teacher maps. Obtaining these training data is an error-prone and time-consuming process. We suggest that it can be avoided if an additional sensor like a laser scanner is also available which can act as the teaching signal. We have successfully trained inverse sensor models for sonar interpretation using laser scan data. In this paper, we describe the procedure we used and the results we obtained.

Lab meeting 28 Dec, 2006 (Leo): Square Root SAM

Square Root SAM: Simultaneous Localization and Mapping via Square Root Information Smoothing

Frank Dellaert

Robotics: Science and Systems, 2005

Abstract— Solving the SLAM problem is one way to enable
a robot to explore, map, and navigate in a previously unknown
environment. We investigate smoothing approaches as a viable
alternative to extended Kalman filter-based solutions to the
problem. In particular, we look at approaches that factorize either
the associated information matrix or the measurement matrix
into square root form. Such techniques have several significant
advantages over the EKF: they are faster yet exact, they can be
used in either batch or incremental mode, are better equipped
to deal with non-linear process and measurement models, and
yield the entire robot trajectory, at lower cost. In addition,
in an indirect but dramatic way, column ordering heuristics
automatically exploit the locality inherent in the geographic
nature of the SLAM problem.
In this paper we present the theory underlying these methods,
an interpretation of factorization in terms of the graphical model
associated with the SLAM problem, and simulation results that
underscore the potential of these methods for use in practice.
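As a rough illustration of the "square root" idea, the sketch below solves a tiny linear least-squares SLAM problem by factorizing the information matrix (A^T A) with a Cholesky decomposition and then doing two triangular solves. The one-dimensional toy problem, the dense pure-Python matrices, and the function names are all assumptions made for illustration; the sparse factorization and column-ordering heuristics discussed in the paper are not reproduced here.

```python
import math

def cholesky_upper(M):
    """Return upper-triangular R with R^T R = M (M symmetric positive definite)."""
    n = len(M)
    R = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i, n):
            s = M[i][j] - sum(R[k][i] * R[k][j] for k in range(i))
            if i == j:
                R[i][i] = math.sqrt(s)
            else:
                R[i][j] = s / R[i][i]
    return R

def solve_sqrt_information(A, b):
    """Minimise |A x - b|^2 via the square root R of the information matrix."""
    m, n = len(A), len(A[0])
    info = [[sum(A[r][i] * A[r][j] for r in range(m)) for j in range(n)]
            for i in range(n)]
    rhs = [sum(A[r][i] * b[r] for r in range(m)) for i in range(n)]
    R = cholesky_upper(info)
    # Forward substitution: R^T y = A^T b
    y = [0.0] * n
    for i in range(n):
        y[i] = (rhs[i] - sum(R[k][i] * y[k] for k in range(i))) / R[i][i]
    # Back substitution: R x = y
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (y[i] - sum(R[i][k] * x[k] for k in range(i + 1, n))) / R[i][i]
    return x

# Unknowns: poses x0, x1, x2 and one landmark l, all on a line.
A = [[1, 0, 0, 0],    # prior:     x0      = 0
     [-1, 1, 0, 0],   # odometry:  x1 - x0 = 1
     [0, -1, 1, 0],   # odometry:  x2 - x1 = 1
     [-1, 0, 0, 1],   # landmark:  l - x0  = 3
     [0, 0, -1, 1]]   # landmark:  l - x2  = 1
b = [0.0, 1.0, 1.0, 3.0, 1.0]
solution = solve_sqrt_information(A, b)
print(solution)
```

Because the whole trajectory (x0, x1, x2) stays in the state vector, the solve returns every pose together with the landmark, which mirrors the smoothing formulation described in the abstract.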

[Link]

Thursday, December 14, 2006

Lab meeting 15 Dec, 2006 (Casey): Estimating 3D Hand Pose from a Cluttered Image

Title: Estimating 3D Hand Pose from a Cluttered Image
Authors: Vassilis Athitsos and Stan Sclaroff
(CVPR 2003)

Abstract:
A method is proposed that can generate a ranked list of
plausible three-dimensional hand configurations that best
match an input image. Hand pose estimation is formulated
as an image database indexing problem, where the closest
matches for an input hand image are retrieved from a large
database of synthetic hand images. In contrast to previous
approaches, the system can function in the presence of
clutter, thanks to two novel clutter-tolerant indexing methods.
First, a computationally efficient approximation of
the image-to-model chamfer distance is obtained by embedding
binary edge images into a high-dimensional Euclidean
space. Second, a general-purpose, probabilistic line matching
method identifies those line segment correspondences
between model and input images that are the least likely to
have occurred by chance. The performance of this clutter-tolerant
approach is demonstrated in quantitative experiments
with hundreds of real hand images.
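For readers unfamiliar with the chamfer distance mentioned above, here is a minimal brute-force sketch of the directed version between two sets of edge-pixel coordinates. It is the exact quadratic-time definition, not the embedding-based approximation the paper proposes, and the point sets and function name are made up for illustration.

```python
import math

def directed_chamfer(src, dst):
    """Mean distance from each point in src to its nearest neighbour in dst."""
    if not src or not dst:
        raise ValueError("both point sets must be non-empty")
    return sum(min(math.dist(p, q) for q in dst) for p in src) / len(src)

model_edges = [(0, 0), (1, 0), (2, 0)]      # hypothetical model edge pixels
image_edges = model_edges + [(10, 10)]      # same edges plus one clutter pixel

model_to_image = directed_chamfer(model_edges, image_edges)
image_to_model = directed_chamfer(image_edges, model_edges)
print(model_to_image, image_to_model)
```

In this toy case the clutter pixel leaves the model-to-image distance untouched but inflates the image-to-model distance, which gives a feel for why clutter makes matching in that direction hard.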

Paper download: [Link]

Wednesday, December 13, 2006

Lab meeting 15 Dec, 2006 (YuChun): Modeling Affect in Socially Interactive Robots

Authors:
Rachel Gockley, Reid Simmons, and Jodi Forlizzi

Proc. of the 15th IEEE International Symposium on Robot and Human Interactive Communication (RO-MAN06), September, 2006.

Abstract:
Humans use expressions of emotion in a very social manner, to convey messages such as “I'm happy to see you” or “I want to be comforted,” and people's long-term relationships depend heavily on shared emotional experiences. We believe that for robots to interact naturally with humans in social situations they should also be able to express emotions in both short-term and long-term relationships. To this end, we have developed an affective model for social robots. This generative model attempts to create natural, human-like affect and includes distinctions between immediate emotional responses, the overall mood of the robot, and long-term attitudes toward each visitor to the robot. This paper presents the general affect model as well as particular details of our implementation of the model on one robot, the Roboceptionist.

[Link]

Friday, December 08, 2006

[Thesis Oral] A Market-Based Framework for Tightly-Coupled Planned Coordination in Multirobot Teams

Author:
Nidhi Kalra
Robotics Institute
Carnegie Mellon University

Abstract:
This thesis explores the coordination challenges posed by real-world multirobot domains that require planned tight coordination between teammates throughout execution. These domains involve solving a multi-agent planning problem in which the actions of robots are tightly coupled. Because of uncertainty in the environment and the team, they also require persistent tight coordination between teammates throughout execution.

This thesis proposes an approach to these problems in which the complexity and strength of the coordination adapt to the difficulty of the problem. Our approach, called Hoplites, is a market-based framework that selectively injects pockets of complex coordination into a primarily distributed system by enabling robots to purchase each other's participation in tightly-coupled plans over the market. We discuss how it is widely applicable to real-world problems because it is general, computationally feasible, scalable, operates under uncertainty, and improves solutions with new information. Experiments show that our approach significantly outperforms existing coordination methods.

Tuesday, December 05, 2006

Lab meeting 8 Dec, 2006 (Chihao): Particle filtering algorithms for tracking an acoustic source in a reverberant environment

Authors:
D.B. Ward, E.A. Lehmann, R.C. Williamson
Dept. of Electr. & Electron. Eng., Imperial Coll. London, UK

From: IEEE Transactions on Speech and Audio Processing

Abstract:

Traditional acoustic source localization algorithms attempt to find the current location of the acoustic source using data collected at an array of sensors at the current time only. In the presence of strong multipath, these traditional algorithms often erroneously locate a multipath reflection rather than the true source location. A recently proposed approach that appears promising in overcoming this drawback of traditional algorithms, is a state-space approach using particle filtering. In this paper we formulate a general framework for tracking an acoustic source using particle filters. We discuss four specific algorithms that fit within this framework, and demonstrate their performance using both simulated reverberant data and data recorded in a moderately reverberant office room (with a measured reverberation time of 0.39 s). The results indicate that the proposed family of algorithms are able to accurately track a moving source in a moderately reverberant room.
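The state-space idea can be illustrated with a generic bootstrap particle filter. The sketch below is not one of the four algorithms from the paper: it assumes a one-dimensional source position, random-walk dynamics, and a Gaussian measurement likelihood, and every parameter value and function name is a placeholder chosen for the example.

```python
import math
import random

def particle_filter_step(particles, observation, rng,
                         process_std=0.1, meas_std=0.5):
    """One predict / weight / resample cycle of a bootstrap particle filter."""
    # Predict: diffuse each particle with the assumed random-walk motion model.
    moved = [p + rng.gauss(0.0, process_std) for p in particles]
    # Weight: assumed Gaussian likelihood of the observation given each particle.
    weights = [math.exp(-0.5 * ((observation - p) / meas_std) ** 2)
               for p in moved]
    if sum(weights) == 0.0:
        weights = [1.0] * len(moved)  # all weights underflowed; use uniform
    # Resample: draw a new particle set in proportion to the weights.
    return rng.choices(moved, weights=weights, k=len(moved))

def track(observations, n_particles=500, seed=0):
    """Return the posterior-mean position estimate after each observation."""
    rng = random.Random(seed)
    particles = [rng.uniform(-5.0, 5.0) for _ in range(n_particles)]
    estimates = []
    for z in observations:
        particles = particle_filter_step(particles, z, rng)
        estimates.append(sum(particles) / len(particles))
    return estimates

estimates = track([2.0] * 20)  # a stationary source observed at position 2.0
print(estimates[-1])
```

Because the filter carries a distribution over positions from one time step to the next, a single spurious measurement only reweights the particles instead of replacing the estimate outright, which is the property that motivates the approach in reverberant rooms.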

[Link]

Monday, December 04, 2006

Lab meeting 8 Dec, 2006 (AShin): Learning and Inferring Transportation Routines

Authors: L. Liao, D. Fox, and H. Kautz

Proc. of the National Conference on Artificial Intelligence (AAAI-04)
Outstanding Paper Award

Abstract
This paper introduces a hierarchical Markov model that can learn and infer a user's daily movements through the community. The model uses multiple levels of abstraction in order to bridge the gap between raw GPS sensor measurements and high level information such as a user's mode of transportation or her goal. We apply Rao-Blackwellised particle filters for efficient inference both at the low level and at the higher levels of the hierarchy. Significant locations such as goals or locations where the user frequently changes mode of transportation are learned from GPS data logs without requiring any manual labeling. We show how to detect abnormal behaviors (e.g., taking a wrong bus) by concurrently tracking his activities with a trained and a prior model. Experiments show that our model is able to accurately predict the goals of a person and to recognize situations in which the user performs unknown activities.

[Link]

Saturday, December 02, 2006

No Pilot, No Problem?

[original link]

The promise is fantastic: new generations of remote-controlled aircraft could soon be flying in civilian airspace, performing all sorts of useful tasks. The reality is that a lack of radio frequencies to control the planes and serious concerns over their safety are going to keep them grounded for years to come.
Surprisingly, given the commercial hopes it has for civil unmanned aerial vehicles (UAVs), the aviation industry has failed to obtain the radio frequencies it needs to control them - and it will be 2011 before it can even begin to lobby for space on the radio spectrum. What's more, none of the world's aviation authorities will allow civil UAVs to fly in their airspace without a reliable system for avoiding other aircraft - and the industry has not yet even begun developing such a system. Experts say this could take up to seven years.
Dedicated frequencies are handed out at the International Telecommunications Union's World Radiocommunications Conference. But no one in the UAV industry had applied for any new frequencies. If UAVs are to mingle safely with other civilian aircraft, the industry needs to develop a safe, standardised collision avoidance system. This is complicated because aviation regulators demand that if UAVs are to have access to civil airspace, they must be "equivalent" in every way to regular planes. The problem for now is that aviation regulators have yet to define precisely what they mean by "equivalent", so UAV makers are not yet willing to commit themselves to developing collision-avoidance technology. "A crewless aircraft on a collision course must behave as if it had a pilot on board."
On the brighter side, last week the UN's International Civil Aviation Organization said its navigation experts would meet in early 2007 to consider regulations for UAVs in civil airspace.
However, it will be meaningless unless the industry can obtain the necessary frequencies to control the planes and feed images and other sensor data back to base, says Bowker. "The lack of robust, secure radio spectrum is a show-stopper."

Thursday, November 30, 2006

Context Aware Computing, Understanding and Responding to Human Intention

[Original Link]

Ted Selker

Abstract

This talk will demonstrate that artificial intelligence can competently improve human interaction with systems and even with each other in a myriad of natural scenarios. Humans work to understand and react to each other's intentions. The Context Aware Computing group at the MIT Media Lab has demonstrated that across most aspects of our lives, computers can do this too. The group's demonstrations range from the car to the office, the kitchen, and even the bed. The goal is to show that human intentions can be recognized, considered, and responded to appropriately by computer systems. Understanding and acting appropriately on intentions requires more than good sensors; it requires understanding the value of the input. The context-aware demonstrations therefore rely completely on models of what the system can do, what tasks can be performed, and what is known about the user. These models of system, task, and user form a central basis for deciding when and how to respond in a specific situation.

Dr. Ted Selker is an Associate Professor at the MIT Media Lab, the Director of the Context Aware Computing Lab, and the MIT director of The Voting Technology Project and of the Counter Intelligence/Design Intelligence special interest group on domestic and product design of the future. Ted's work strives to demonstrate that people's intentions can be recognized and respected by the things we design. Context-aware computing creates a world in which people's desires and intentions cause computers to help them. The group is recognized for creating environments that use sensors and artificial intelligence to form so-called "virtual sensors": adaptive models of users that enable keyboardless computer scenarios. Ted's Design Intelligence work has used technology-rich platforms such as kitchens to examine intention-based design. Ted's work is also applied to developing and testing user-experience technology and security architectures for recording voter intentions securely and accurately.

Prior to joining MIT faculty in November 1999, Ted was an IBM fellow and directed the User Systems Ergonomics Research lab. He has served as a consulting professor at Stanford University, taught at Hampshire, University of Massachusetts at Amherst and Brown Universities and worked at Xerox PARC and Atari Research Labs.

Ted's research has contributed to products ranging from notebook computers to operating systems. His work takes the form of prototype concept products supported by cognitive science research. He is known for the design of the TrackPoint in-keyboard pointing device found in many notebook computers, and many other innovations at IBM. Ted's technologies are often featured in national and international news media.

Ted's work has resulted in award-winning products, numerous patents, and papers, and is often featured by the press. He was co-recipient of the computer science policy leader award for the Scientific American 50 in 2004 and of the American Association for People with Disabilities Thomas Paine Award for his work on voting technology.

Wednesday, November 29, 2006

Lab meeting 1 Dec, 2006 (ZhenYu): Detecting Social Interaction of Elderly in a Nursing Home Environment

Authors:
Datong Chen, Jie Yang, Robert Malkin, and Howard D. Wactlar

Abstract:
Social interaction plays an important role in our daily lives. It is one of the most important indicators of physical or mental changes in aging patients. In this paper, we investigate the problem of detecting social interaction patterns of patients in a skilled nursing facility. Our studies consist of both a “wizard of Oz” study and an experimental study of various sensors and detection models for detecting and summarizing social interactions among aging patients and caregivers. We first simulate plausible sensors using human labeling on top of audio and visual data collected from a skilled nursing facility. The most useful sensors and robust detection models are determined using the simulated sensors. We then present the implementation of some real sensors based on video and audio analysis techniques and evaluate the performance of these implementations in detecting interaction. We conclude the paper with discussions and future work.

Download: [link]

Lab meeting 1 Dec, 2006 (Nelson): Random Sample Consensus: A Paradigm for Model Fitting with Applications to Image Analysis and Automated Cartography

Martin A. Fischler and Robert C. Bolles
SRI International

Communications of the ACM
June 1981 Volume 24 Number 6

LINK

Abstract:

A new paradigm, Random Sample Consensus
(RANSAC), for fitting a model to experimental data is
introduced. RANSAC is capable of interpreting/
smoothing data containing a significant percentage of
gross errors, and is thus ideally suited for applications
in automated image analysis where interpretation is
based on the data provided by error-prone feature
detectors. A major portion of this paper describes the
application of RANSAC to the Location Determination
Problem (LDP): Given an image depicting a set of
landmarks with known locations, determine that point
in space from which the image was obtained. In
response to a RANSAC requirement, new results are
derived on the minimum number of landmarks needed
to obtain a solution, and algorithms are presented for
computing these minimum-landmark solutions in closed
form. These results provide the basis for an automatic
system that can solve the LDP under difficult viewing
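The RANSAC loop described in the abstract above can be sketched on a much simpler problem than the LDP: fitting a 2D line to points contaminated by gross errors. The data, the distance threshold, the iteration count, and the function names below are all assumptions chosen for illustration, and the final refit assumes the consensus line is not vertical.

```python
import math
import random

def line_through(p, q):
    """Normalised line a*x + b*y + c = 0 through two distinct points."""
    a, b = q[1] - p[1], p[0] - q[0]
    norm = math.hypot(a, b)
    a, b = a / norm, b / norm
    return a, b, -(a * p[0] + b * p[1])

def ransac_line(points, n_iters=200, threshold=0.5, seed=0):
    """Return the largest consensus set found over random minimal samples."""
    rng = random.Random(seed)
    best = []
    for _ in range(n_iters):
        p, q = rng.sample(points, 2)        # minimal sample: two points
        if p == q:
            continue                        # degenerate sample, skip it
        a, b, c = line_through(p, q)
        consensus = [pt for pt in points
                     if abs(a * pt[0] + b * pt[1] + c) <= threshold]
        if len(consensus) > len(best):
            best = consensus
    return best

def least_squares_line(points):
    """Ordinary least-squares fit y = slope * x + intercept."""
    n = len(points)
    mx = sum(x for x, _ in points) / n
    my = sum(y for _, y in points) / n
    sxx = sum((x - mx) ** 2 for x, _ in points)
    sxy = sum((x - mx) * (y - my) for x, y in points)
    slope = sxy / sxx
    return slope, my - slope * mx

inliers = [(x, 2 * x + 1) for x in range(20)]                  # on y = 2x + 1
outliers = [(1, 30), (5, -20), (9, 50), (13, -7), (17, 90)]    # gross errors
consensus = ransac_line(inliers + outliers)
slope, intercept = least_squares_line(consensus)
print(len(consensus), slope, intercept)
```

The key design choice is that each hypothesis is built from the smallest possible sample (two points for a line), so a single uncontaminated sample is enough to recover the model; the least-squares refit is applied only after the outliers have been excluded.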

Monday, November 27, 2006

News: Scientists Try to Make Robots More Human

November 22, 2006

George the robot is playing hide-and-seek with scientist Alan Schultz. What's so impressive about robots playing children's games? For a robot to actually find a place to hide, and then hunt for its human playmate is a new level of human interaction. The machine must take cues from people and behave accordingly.

This is the beginning of a real robot revolution: giving robots some humanity.

"Robots in the human environment, to me that's the final frontier," said Cynthia Breazeal, robotic life group director at the Massachusetts Institute of Technology. "The human environment is as complex as it gets; it pushes the envelope."

Robotics is moving from software and gears operating remotely - Mars, the bottom of the ocean or assembly lines - to finally working with, beside and even on people.

"Robots have to understand people as people," Breazeal said. "Right now, the average robot understands people like a chair: It's something to go around."

See the full article.

Wednesday, November 22, 2006

News: Pleo, the sensitive robot

Behold the majesty of Pleo. It's a robot, coming out in the second quarter of 2007, that exhibits emotional reactions to its surroundings. So far, companion robots have been a big flop in the market, but Pleo maker Ugobe hopes to succeed by pricing it for around $250, lower than other companion robots, and giving consumers ways to program it.

Don't be fooled by the eyes; the vision system is in its nose.

When it's in a good mood, the Pleo wags its tail back and forth. It also makes a sort of mooing sound, which is appropriate given that the robot is modeled after the Camarasaurus, a cow-like dinosaur that roamed the Americas in the Jurassic period. A paleontologist helped Ugobe come up with the design.

There's a lot going on under the Pleo's rubbery skin. The robot contains six microprocessors and more than 150 gears. It also has a memory card slot where the sun don't shine.

Credit: Michael Kanellos/CNET News.com

See the related link & video.

Sunday, November 19, 2006

News: Robot Senses Damage, Learns to Walk Again

November 17, 2006

It may look like a metallic starfish, but scientists say this robot might have more in common with a newborn human.

The four-legged machine is a prototype "resilient robot" with the ability to detect damage to itself and alter its walking style in response.

Josh Bongard, an assistant professor of computer science at the University of Vermont in Burlington, and his colleagues created the robot as part of a NASA pilot project working on technology for the next generation of planetary rovers.

While people and animals can easily compensate for injuries, even a small amount of damage can ground NASA machinery entirely.

See the full article.

Friday, November 17, 2006

CMU ML talk: Machine Learning and Human Learning

Speaker: Prof. Tom Mitchell, CMU
http://www.cs.cmu.edu/~tom
Date: November 20
Time: 12:00 noon

For schedules, links to papers, etc., please see the web page:
http://www.cs.cmu.edu/~learning/

Abstract:

For the past 30 years, researchers studying machine learning and researchers studying human learning have proceeded pretty much independently. We now know enough about both fields that it is time to re-ask the question: "How can studies of human learning and studies of machine learning inform one another?" This talk will address this question by briefly covering some of the key facts we now understand about both machine learning and human learning, then examining in some detail several specific types of machine learning which may provide surprisingly helpful models for understanding aspects of human learning (e.g., reinforcement learning, cotraining).

Lab meeting 17 Nov, 2006 (Jim): Real-Time Simultaneous Localisation and Mapping with a Single Camera

Real-Time Simultaneous Localisation and Mapping with a Single Camera
Andrew J. Davison, ICCV 2003

PDF, homepage

Abstract:
Ego-motion estimation for an agile single camera moving through general, unknown scenes becomes a much more challenging problem when real-time performance is required rather than under the off-line processing conditions under which most successful structure from motion work has been achieved. This task of estimating camera motion from measurements of a continuously expanding set of self-mapped visual features is one of a class of problems known as Simultaneous Localisation and Mapping (SLAM) in the robotics community, and we argue that such real-time mapping research, despite rarely being camera-based, is more relevant here than off-line structure from motion methods due to the more fundamental emphasis placed on propagation of uncertainty.
We present a top-down Bayesian framework for single-camera localisation via mapping of a sparse set of natural features using motion modelling and an information-guided active measurement strategy, in particular addressing the difficult issue of real-time feature initialisation via a factored sampling approach. Real-time handling of uncertainty permits robust localisation via the creation and active measurement of a sparse map of landmarks such that regions can be re-visited after periods of neglect and localisation can continue through periods when few features are visible. Results are presented of real-time localisation for a hand-waved camera with very sparse prior scene knowledge and all processing carried out on a desktop PC.

Jaron Lanier forecasts the future

* 16 November 2006
* NewScientist.com news service
* Jaron Lanier

In the next 50 years, computer science needs to achieve a new unification between the inside of the computer and the outside. The inside is still governed by the mid-20th century approach – that is, every program must have defined inputs and outputs in order to function.

The outside, however, encounters the real world and must analyse data statistically. Robots must assess a terrain in order to navigate it. Language translation programs must make guesses in order to function. Because the interface to the outside world involves approximation, it is also capable of adjusting itself to improve the quality of approximation. But the inside of a computer must adhere to protocols to function at all, and therefore cannot evolve automatically.

See the full article.

Eric Horvitz forecasts the future

* 8 November 2006
* NewScientist.com news service
* Eric Horvitz

Computation is the fire in our modern-day caves. By 2056, the computational revolution will be recognised as a transformation as significant as the industrial revolution. The evolution and widespread diffusion of computation and its analytical fruits will have major impacts on socioeconomics, science and culture.

Within 50 years, lives will be significantly enhanced by automated reasoning systems that people will perceive as "intelligent". Although many of these systems will be deployed behind the scenes, others will be in the foreground, serving in an elegant, often collaborative manner to help people do their jobs, to learn and teach, to reflect and remember, to plan and decide, and to create. Translation and interpretation systems will catalyse unprecedented understanding and cooperation between people. At death, people will often leave behind rich computational artefacts that include memories, reflections and life histories, accessible for all time.

Robotic scientists will serve as companions in discovery by formulating theories and pursuing their confirmation.

See the full article.

Rodney Brooks forecasts the future

* 18 November 2006
* NewScientist.com news service
* Rodney Brooks

Show a two-year-old child a key, a shoe, a cup, a book or any of hundreds of other objects, and they can reliably name its class - even when they have never before seen something that looks exactly like that particular key, shoe, cup or book. Our computers and robots still cannot do this task with any reliability. We have been working on this problem for a while. Forty years ago the Artificial Intelligence Laboratory at the Massachusetts Institute of Technology appointed an undergraduate to solve it over the summer. He failed, and I failed on the same problem in my 1981 PhD.

In the next 50 years we can solve the generic object recognition problem. We are no longer limited by lack of computer power, but we are limited by a natural risk aversion to a problem on which many people have foundered in the past few decades.

See the full article.

Thursday, November 16, 2006

Lab meeting 17 Nov., 2006 (Any): World modeling for an autonomous mobile robot using heterogenous sensor information

Title: World modeling for an autonomous mobile robot using heterogenous sensor information

Local Copy: [Here]
Related Link: [Here]

Author: Klaus-Werner Jorg
From: Robotics and Autonomous Systems, 1995

Abstract:
An Autonomous Mobile Robot (AMR) has to show both goal-oriented behavior and reflexive behavior in order to be considered fully autonomous. In a classical, hierarchical control architecture these behaviors are realized by using several abstraction levels while their individual informational needs are satisfied by associated world models. The focus of this paper is to describe an approach which utilizes heterogenous information provided by a laser-radar and a set of sonar sensors in order to achieve reliable and complete world models for both real-time collision avoidance and local path planning. The approach was tested using MOBOT-IV, which serves as a test-platform within the scope of a research project on autonomous mobile robots for indoor applications. Thus, the experimental results presented here are based on real data.

Lab meeting 17 Nov., 2006 (Stanley): Policies Based on Trajectory Libraries

Author:
Martin Stolle

Abstract:
We present a control approach that uses a library of trajectories to establish a global control law or policy. This is an alternative to methods for finding global policies based on value functions using dynamic programming and also to using plans based on a single desired trajectory. Our method has the advantage of providing reasonable policies much faster than dynamic programming can provide an initial policy. It also has the advantage of providing more robust and global policies than following a single desired trajectory. Trajectory libraries can be created for robots with many more degrees of freedom than what dynamic programming can be applied to as well as for robots with dynamic model discontinuities. Results are shown for the “Labyrinth” marble maze, both in simulation as well as a real world version. The marble maze is a difficult task which requires both fast control as well as planning ahead.
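One simple way to turn a trajectory library into a global policy is nearest-neighbour lookup: for any query state, return the action stored with the closest state in any library trajectory. The sketch below assumes that lookup rule, a Euclidean state distance, and a made-up two-trajectory library; the abstract does not specify these details, so treat all of them as illustrative assumptions.

```python
import math

def library_policy(library, state):
    """Return the action attached to the stored state nearest to `state`.

    `library` is a list of trajectories; each trajectory is a list of
    (state, action) pairs, with states given as coordinate tuples.
    """
    best_action, best_dist = None, math.inf
    for trajectory in library:
        for stored_state, action in trajectory:
            d = math.dist(stored_state, state)
            if d < best_dist:
                best_dist, best_action = d, action
    if best_action is None:
        raise ValueError("the trajectory library is empty")
    return best_action

# Hypothetical library: two short trajectories on a 2D board.
library = [
    [((0.0, 0.0), "right"), ((1.0, 0.0), "up")],
    [((0.0, 2.0), "down")],
]
action_a = library_policy(library, (0.9, 0.1))
action_b = library_policy(library, (0.1, 1.8))
print(action_a, action_b)
```

Adding another trajectory to the list immediately changes the policy in the region around it, with no global recomputation, which gives some intuition for how such a policy can be available sooner than a value function computed by dynamic programming.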

Link:
ICRA 2006
Thesis proposal

Sunday, November 12, 2006

News: Google and Microsoft aim to give you a 3-D world

By Brad Stone
Newsweek

Nov. 20, 2006 issue - The sky over San Francisco is cerulean blue as you begin your descent into the city from 2,000 feet. As you pass over the southern hills, the skyline of the Financial District rises into view. On the descent into downtown, familiar skyscrapers form an urban canyon around you; you can even see the trolley tracks running down the valley formed by Market Street. But then a little pop-up box next to the Bay Bridge explains that an accident has just occurred on the western span, and a thick red line indicates the resulting traffic jam along the highway. A banner ad for Emeryville, Calif., firm ZipRealty hangs incongruously in the air over the Transamerica Pyramid. You are actually staring at your PC screen, not out an airplane window.

Virtual Earth 3D, the online service unveiled last week by Microsoft, is both incomplete (only 15 cities are depicted in 3-D) and imperfect (some of the buildings are shrouded in shadow, and you need a powerful PC running Windows XP or the new Vista to use it). But it is also the start of something potentially big: the 3-D Web. Traditional Web pages give us text, photos and video, unattached to real-world context. Now interactive mapping programs like Google Earth let us zoom around the globe on our PCs and peer down at the topography captured by satellites and aerial photographers. Both Google Earth and Microsoft's Virtual Earth are hugely popular and have been downloaded more than 100 million times each.

See the full article.

[Folks, we can build 4D maps of the real world in a much faster and more reliable way, right? -Bob]

Thursday, November 09, 2006

Lab Meeting 10 Nov talk (Casey): Object Class Recognition Using Multiple Layer Boosting with Heterogeneous Features

Title: Object Class Recognition Using Multiple Layer Boosting with Heterogeneous Features
Authors: Wei Zhang, Bing Yu, Gregory J. Zelinsky, Dimitris Samaras
(CVPR 2005)

Abstract:
We combine local texture features (PCA-SIFT), global features (shape context), and spatial features within a single
multi-layer AdaBoost model of object class recognition.
The first layer selects PCA-SIFT and shape context features and combines the two feature types to form a strong classifier.
Although previous approaches have used either feature type to train an AdaBoost model, our approach is the first to combine these complementary sources of information into a single feature pool and to use AdaBoost to select
those features most important for class recognition.
The second layer adds to these local and global descriptions information about the spatial relationships between features.
Through comparisons to the training sample, we first find the most prominent local features in Layer 1, then capture
the spatial relationships between these features in Layer 2.
Rather than discarding this spatial information, we therefore use it to improve the strength of our classifier. We compared our method to [4, 12, 13] and in all cases our approach outperformed these previous methods using a popular
benchmark for object class recognition [4].
ROC equal error rates approached 99%. We also tested our method using a dataset of images that better equates the complexity
between object and non-object images, and again found that our approach outperforms previous methods.
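
As a rough illustration of the first layer described in the abstract above (one pooled feature set, with boosting choosing which features matter), here is a minimal AdaBoost sketch with threshold stumps. It is not the authors' implementation: the two toy feature columns merely stand in for a local-texture response and a global-shape response, the data are invented, and the spatial second layer is not modeled.

```python
import math

def stump(x, j, thr, pol):
    """Weak classifier: threshold one feature of the pooled vector x."""
    return pol if x[j] >= thr else -pol

def train_adaboost(samples, labels, rounds):
    """Return a list of (alpha, feature_index, threshold, polarity) tuples."""
    n = len(samples)
    weights = [1.0 / n] * n
    ensemble = []
    for _ in range(rounds):
        best = None
        # Search the whole pool: every feature, threshold and polarity.
        for j in range(len(samples[0])):
            for thr in sorted({x[j] for x in samples}):
                for pol in (1, -1):
                    err = sum(w for x, y, w in zip(samples, labels, weights)
                              if stump(x, j, thr, pol) != y)
                    if best is None or err < best[0]:
                        best = (err, j, thr, pol)
        err, j, thr, pol = best
        err = min(max(err, 1e-10), 1.0 - 1e-10)  # avoid log(0)
        alpha = 0.5 * math.log((1.0 - err) / err)
        ensemble.append((alpha, j, thr, pol))
        # Re-weight: misclassified samples gain weight, correct ones lose it.
        weights = [w * math.exp(-alpha * y * stump(x, j, thr, pol))
                   for x, y, w in zip(samples, labels, weights)]
        total = sum(weights)
        weights = [w / total for w in weights]
    return ensemble

def predict(ensemble, x):
    score = sum(a * stump(x, j, t, p) for a, j, t, p in ensemble)
    return 1 if score >= 0 else -1

# Hypothetical pooled features: column 0 ~ "texture", column 1 ~ "shape".
X = [(0.9, 0.2), (0.8, 0.9), (0.2, 0.8), (0.1, 0.1), (0.3, 0.2), (0.2, 0.3)]
y = [1, 1, 1, -1, -1, -1]
model = train_adaboost(X, y, rounds=3)
print([predict(model, x) for x in X])
print(sorted({j for _, j, _, _ in model}))
```

In this toy set neither column separates the classes alone, so the selected stumps draw on both columns, which is the point of pooling heterogeneous features.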

Download: [link]

Robot learns to grasp everyday chores- BRIAN D. LEE

Original link

From left, graduate students Ashutosh Saxena and Morgan Quigley and Assistant Professor Andrew Ng were part of a large effort to develop a robot to see an unfamiliar object and ascertain the best spot to grasp it.

Stanford scientists plan to make a robot capable of performing everyday tasks, such as unloading the dishwasher. By programming the robot with "intelligent" software that enables it to pick up objects it has never seen before, the scientists are one step closer to creating a real life Rosie, the robot maid from The Jetsons cartoon show.

"Within a decade we hope to develop the technology that will make it useful to put a robot in every home and office," said Andrew Ng, an assistant professor of computer science who is leading the wireless Stanford Artificial Intelligence Robot (STAIR) project.

"Imagine you are having a dinner party at home and having your robot come in and tidy up your living room, finding the cups that your guests left behind your couch, picking up and putting away your trash and loading the dishwasher," Ng said.

Cleaning up a living room after a party is just one of four challenges the project has set out to have a robot tackle. The other three include fetching a person or object from an office upon verbal request, showing guests around a dynamic environment and assembling an IKEA bookshelf using multiple tools.

Developing a single robot that can solve all these problems takes a small army of about 30 students and 10 computer science professors—Gary Bradski, Dan Jurafsky, Oussama Khatib, Daphne Koller, Jean-Claude Latombe, Chris Manning, Ng, Nils Nilsson, Kenneth Salisbury and Sebastian Thrun.

From Shakey to Stanley and beyond

Stanford has a history of leading the field of artificial intelligence. In 1966, scientists at the Stanford Research Institute built Shakey, the first robot to combine problem solving, movement and perception. Flakey, a robot that could wander independently, followed. In 2005, Stanford engineers won the Defense Advanced Research Projects Agency (DARPA) Grand Challenge with Stanley, a robot Volkswagen that autonomously drove 132 miles through a desert course.

The ultimate aim for artificial intelligence is to build a robot that can create and execute plans to achieve a goal. "The last serious attempt to do something like this was in 1966 with the Shakey project led by Nils Nilsson," Ng said. "This is a project in Shakey's tradition, done with 2006 technology instead of 1966 AI technology."

To succeed, the scientists will need to unite fragmented research areas of artificial intelligence including speech processing, navigation, manipulation, planning, reasoning, machine learning and vision. "There are these disparate AI technologies and we'll bring them all together in one project," Ng said.

The true problem remains in making a robot independent. Industrial robots can follow precise scripts to the point of balancing a spinning top on a blade, he said, but the problem comes when a robot is requested to perform a new task. "Balancing a spinning top on the edge of a sword is a solved problem, but picking up an unfamiliar cup is an unsolved problem," Ng explained.

His team recently designed an algorithm that allowed STAIR to recognize familiar features in different objects and select the right grasp to pick them up. The robot was trained in a computer-generated environment to pick up five items—a cup, pencil, brick, book and martini glass. The algorithm locates the best place for the robot to grasp an object, such as a cup's handle or a pencil's midpoint. "The robot takes a few pictures, reasons about the 3-D shape of the object, based upon computing the location, and reaches out and grasps the object," Ng said.

In tests with real objects, the robotic arm picked up items similar to those with which it trained, such as cups and books, as well as unfamiliar objects including keys, screwdrivers and rolls of duct tape. To grasp a roll of duct tape, the robot employs an algorithm that evaluates the image against all prior strategies. "The roll of duct tape looks a little like a cup handle and also a little bit like a book," Ng said. The program formulates the best location to clutch based on a combination of all the robot's prior experiences and tells the arm where to go. "It would be a hybrid, or a combination of all the different grasping strategies that it has learned before," Ng said.

The word "robot" originates from a Slavic word meaning "toil," and robots may soon reduce the amount of drudgery in our daily lives. "I think if we can have a robot intelligent enough to do these things, that will free up vast amounts of human time and enable us to go to higher goals," Ng said.

Funding for the project has come from the National Science Foundation, DARPA and industrial technology companies Intel, Honda, Ricoh and Google.

Brian D. Lee is a science writing intern with the Stanford News Service.

[FRC seminar] Learning Robot Control Policies Using the Critique of Teacher

Speaker:
Brenna Argall
Ph.D. Candidate
Robotics Institute

Abstract:
Motion control policies are a necessary component of task execution on mobile robots. Their development, however, is often a tedious and exacting procedure for a human programmer. An alternative is to teach by demonstration; to have the robot extract its policy from the example executions of a teacher. With such an approach, however, most of the learning burden is typically placed with the robot. In this talk we present an algorithm in which the teacher augments the robot's learning with a performance critique, thus shouldering some of the learning burden. The teacher interacts with the system in two phases: first by providing demonstrations for training, and second by offering a critique on learner performance. We present an application of this algorithm in simulation, along with preliminary implementation on a real robot system. Our results show improved performance with teacher critiquing, where performance is measured by both execution success and efficiency.

Speaker Bio:
Brenna is currently a third year Ph.D. candidate in the Robotics Institute at Carnegie Mellon University, affiliated with the CORAL Research Group. Her research interests lie with robot autonomy and heterogeneous team coordination, and how machine learning may be used to build control policies which accomplish these tasks. Prior to joining the Robotics Institute, Brenna investigated functional MRI brain imaging in the Laboratory of Brain and Cognition at the National Institutes of Health. She received her B.S. in Mathematics from Carnegie Mellon in 2002, along with minors in Music and Biology.

Monday, November 06, 2006

NEWS: Robot guide makes appearance at Fukushima hospital

A robot with the ability to recognize speech and show hospital visitors the way to consultation rooms and wards has made its debut at Aizu Central Hospital in Aizuwakamatsu.

When the robot is asked to pinpoint a location, it projects a three-dimensional image showing the route to the destination through a projector on its head, and prints out a map from a built-in printer, which it hands to the visitor.

AIZUWAKAMATSU, Fukushima
November 5, 2006

See the full article

[CMU VASC Seminar]Real Time 3D Surface Imaging and Tracking for Radiation Therapy

VASC Seminar Series

Speaker: Maud Poissonnier, Vision RT

Time: Thursday, 11/9

Title: Real Time 3D Surface Imaging and Tracking for Radiation Therapy

Abstract:

Radiation Therapy involves the precise delivery of high energy X-rays to
tumour tissue in order to treat cancer. The current challenge is to ensure
that the radiation is delivered to the correct target location, thus
reducing the volume of normal tissue irradiated and potentially enabling
the escalation of dose. This may be achieved through the exploitation of a
combination of imaging technologies.

Vision RT has developed a 3D surface imaging technology which can image
the entire 3D surface of a patient quickly and accurately. This relies on
close range stereo photogrammetric techniques using pairs of stereo
cameras. Registration algorithms are employed to match surface data
acquired from the same patient in different positions. High speed tracking
techniques have also been developed to allow tracking of regions at speeds
of approximately 20 fps.

AlignRT(r) is Vision RT's patient setup and surveillance system which is
now in use at a variety of clinics. The system acquires 3D surface data
during simulation or imports reference contours from diagnostic Computed
Tomography (CT). It then images the patient prior to treatment and
computes any 3D movement required to correct the patient's position. The
system is also able to monitor any patient movement during treatment. We
will finish by presenting work-in-progress systems which allow real time
tracking of breathing motion to facilitate 4D CT reconstruction and
respiratory gated radiotherapy.

Bio:

Dr. Maud Poissonnier was educated in France until she decided to
explore Great Britain in 1995. She first graduated at Heriot-Watt
University in Edinburgh with an MSc in Reservoir Evaluation and Management
(Petroleum Engineering). She then joined the Medical Vision Lab (Robotics
Research Group) at the University of Oxford. She obtained her EPSRC-funded
DPhil in 2003 under the supervision of Sir Prof. Mike Brady in the area of
x-ray mammography image processing using physics-based modelling. After
post-doctoral research on a multimodality, Grid-enabled platform for
tele-mammography, she moved very slightly East (to London) and joined
Vision RT Ltd in April 2005 where she is presently Senior Software
Engineer.

Sunday, November 05, 2006

[NEWS]Walking Partner Robot helps old ladies cross the street

Nomura Unison Co., Ltd. has developed a walking-assistance robot that perceives and responds to its environment. The machine, called Walking Partner Robot, was developed with the cooperation of researchers from Tohoku University. It will be unveiled to the general public at the 2006 Suwa Area Industrial Messe on October 19 in Suwa, Nagano prefecture.
The robot is equipped with a system of sensors that detect the presence of obstacles, stairs, etc. while monitoring the motion and behavior of the user. Three sensors monitor the status of the user while detecting and measuring the distance to potential obstacles, and two angle sensors measure the slope of the path in front of the machine. The robot responds to these measurements with voice warnings and by automatically applying brakes when necessary.
Walking Partner Robot is essentially a high-tech walker designed to support users as they walk upright, preventing them from falling over. The user grasps a set of handles while pushing the unmotorized 4-wheeled robot, which measures 110 (H) x 70 (W) x 80 (D) cm and weighs 70 kilograms (154 lbs).
Walking Partner Robot is the second creation from the team responsible for the Partner Ballroom Dance Robot, which includes Tohoku University robotics researchers Kazuhiro Kosuge and Yasuhisa Hirata. The goal was to apply the Partner Ballroom Dance Robot technology, which perceives the intended movement and force of human footsteps, to a robot that can play a role in the realm of daily life. The result is a machine that can perceive its surroundings and provide walking assistance to the elderly and physically disabled.
The developers, who also see potential medical rehabilitation applications, aim to develop indoor and outdoor models of the robot. The company hopes to make the robot commercially available soon at a price of less than 500,000 yen (US$4,200) per unit.

A Braille Writing Tutor to Combat Illiteracy in Developing Communities

Title: A Braille Writing Tutor to Combat Illiteracy in Developing Communities
Speaker: Nidhi Kalra, Tom Lauwers
Date/Time/Location: Tuesday, November 7th, 2006, 11am, NSH 3305
Abstract:
We present the Braille Writing Tutor project, an initiative sponsored by TechBridgeWorld to combat the high rates of illiteracy among the blind in developing communities using an intelligent tutoring system. Developed in collaboration with the Mathru Educational Trust for the Blind in Bangalore, India, the tutor uses a novel input device to capture students' activity on a slate and stylus and uses a range of scaffolding techniques and Artificial Intelligence to teach writing skills to both beginner and advanced students. We conducted our first field study from August to September 2006 at the Mathru School to evaluate its feasibility and impact in a real educational setting. The tutor was met with great enthusiasm by both the teachers and the students and has already demonstrated a concrete impact on the students' writing abilities. Our study also highlights a number of important areas for future research and development which we invite the community to explore and investigate with us.

For more information and videos:
http://www.cs.cmu.edu/~nidhi/brailletutor.html

Speaker Bio:
Nidhi Kalra is a fifth year Ph.D. student at the Robotics Institute. She is keenly interested in applying technology to sustainable development and in understanding related public policy issues. Nidhi hopes to start a career in this field after completing her Ph.D. Her thesis area of research is in multirobot coordination and she is advised by Dr. Anthony Stentz. She has an MS in Robotics from the RI and received her BS in computer science from Cornell University in 2002. Nidhi is a native of India.

Tom Lauwers is a fourth year Ph.D. student at the Robotics Institute. He has a long-standing interest in educational robotics, as both a participant in programs like FIRST and later as a designer of a robotics course and education technology. He is currently studying curriculum development and evaluation and hopes that his study of the educational sciences will help him design better and more useful educational technologies. Tom received a BS in Electrical Engineering and a BS in Public Policy from CMU.

Saturday, November 04, 2006

[The ML Lunch talk ]Boosting Structured Prediction for Imitation Learning

Speaker: Nathan Ratliff, CMU
http://www.cs.cmu.edu/~ndr

Title: Boosting Structured Prediction for Imitation Learning

Venue: NSH 1507

Date: November 06

Time: 12:00 noon

Abstract:

The Maximum Margin Planning (MMP) algorithm solves imitation learning
problems by learning linear mappings from features to cost functions in
a planning domain. The learned policy is the result of minimum-cost
planning using these cost functions. These mappings are chosen so that
example policies (or trajectories) given by a teacher appear to be lower
cost (with a loss-scaled margin) than any other policy for a given
planning domain. We provide a novel approach, MMPBoost, based on the
functional gradient descent view of boosting that extends MMP by
"boosting" in new features. This approach uses simple binary
classification or regression to improve performance of MMP imitation
learning, and naturally extends to the class of structured maximum
margin prediction problems. Our technique is applied to navigation and
planning problems for outdoor mobile robots and robotic legged
locomotion.

In this talk, I will first provide an overview of the MMP approach to
imitation learning, followed by an introduction to our boosting
technique for learning nonlinear cost functions within this framework. I
will finish with a number of experimental results and a sketch of how
structured boosting algorithms of this sort can be derived.

Friday, November 03, 2006

[FRC Seminar] Learning-enhanced Market-based Task Allocation for Disaster Response

Speaker:
E. Gil Jones
Ph.D. Candidate
Robotics Institute

Abstract:
This talk will introduce a learning-enhanced market-based task allocation system for disaster response domains. I model the disaster response domain as a team of robots cooperating to extinguish a series of fires that arise due to a disaster. Each fire is associated with a time-decreasing reward for successful mitigation, with the value of the initial reward corresponding to task importance, and the speed of decay of the reward determining the urgency of the task. Deadlines are also associated with each fire, and penalties are assessed if fires are not extinguished by their deadlines. The team of robots aims to maximize summed reward over all emergency tasks, resulting in the lowest overall damage from the series of fires.

In this talk I will first describe my implementation of a baseline market-based approach to task allocation for disaster response. In the baseline approach the allocation respects fire importance and urgency, but agents do a poor job of anticipating future emergencies and are assessed a high number of penalties. I will then describe two regression-based approaches to learning-enhanced task allocation. The first approach, task-based learning, seeks to improve agents' valuations for individual tasks. The second method, schedule-based learning, tries to quantify the tradeoff between performing a given task or not performing the task and retaining the flexibility to better perform future tasks. Finally, I will compare the performance of the two learning methods and the baseline approach over a variety of parameterizations of the disaster response domain.
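
To make the reward structure in the abstract concrete, here is a small sketch of a time-decreasing reward with a deadline penalty, plus a one-shot auction in which each robot bids the payoff it expects to earn. The linear decay, the 1-D positions, the sealed-bid rule and all names (`Fire`, `payoff`, `auction`) are assumptions made for illustration; the talk does not specify these details.

```python
from dataclasses import dataclass

@dataclass
class Fire:
    name: str
    initial_reward: float  # importance of the task
    decay_rate: float      # urgency: reward lost per time unit (assumed linear)
    deadline: float
    penalty: float         # charged if the fire is put out after the deadline
    location: float        # position on a line, to keep travel time trivial

def payoff(fire, finish_time):
    """Reward earned if the fire is extinguished at finish_time."""
    reward = max(0.0, fire.initial_reward - fire.decay_rate * finish_time)
    if finish_time > fire.deadline:
        reward -= fire.penalty
    return reward

def auction(fire, robots, speed=1.0):
    """robots maps name -> (position, time_free). Highest expected payoff wins."""
    bids = {}
    for name, (position, time_free) in robots.items():
        finish = time_free + abs(fire.location - position) / speed
        bids[name] = payoff(fire, finish)
    winner = max(bids, key=bids.get)
    return winner, bids

fire = Fire("warehouse", initial_reward=100.0, decay_rate=2.0,
            deadline=20.0, penalty=50.0, location=10.0)
robots = {"a": (0.0, 0.0), "b": (8.0, 5.0)}
print(auction(fire, robots))
```

A bid computed this way looks only at the task on offer, which mirrors the weakness the abstract attributes to the baseline: nothing in it anticipates fires that have not yet appeared.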

Speaker Bio:
Gil is a second year Ph.D. student at the Robotics Institute, and is co-advised by Bernardine Dias and Tony Stentz. His primary interest is market-based multi-robot coordination. He received his BA in Computer Science from Swarthmore College in 2001, and spent two years as a software engineer at Bluefin Robotics - manufacturer of autonomous underwater vehicles - in Cambridge, Mass.

Wednesday, November 01, 2006

Lab meeting 1 Nov., 2006 (Vincent): Fitting a Single Active Appearance Model Simultaneously to Multiple Images

In this talk, I will present the following 2 papers.

Title :
Fitting a Single Active Appearance Model Simultaneously to Multiple Images

Author :
Changbo Hu, Jing Xiao, Iain Matthews, Simon Baker, Jeff Cohn and Takeo Kanade
The Robotics Institute,CMU

Origin :
BMVC 2004

Abstract :
Active Appearance Models (AAMs) are a well studied 2D deformable model. One recently proposed extension of AAMs to multiple images is the Coupled-View AAM. Coupled-View AAMs model the 2D shape and appearance of a face in two or more views simultaneously. The major limitation of Coupled-View AAMs, however, is that they are specific to a particular set of cameras, both in geometry and the photometric responses. In this paper, we describe how a single AAM can be fit to multiple images, captured simultaneously by cameras with arbitrary geometry and response functions. Our algorithm retains the major benefits of Coupled-View AAMs: the integration of information from multiple images into a single model, and improved fitting robustness.

Title :
Real-Time Combined 2D+3D Active Appearance Models

Author :
Jing Xiao, Simon Baker, Iain Matthews, and Takeo Kanade
The Robotics Institute,CMU

Origin :
CVPR 2004

Abstract :
Active Appearance Models (AAMs) are generative models commonly used to model faces. Another closely related type of face models are 3D Morphable Models (3DMMs). Although AAMs are 2D, they can still be used to model 3D phenomena such as faces moving across pose. We first study the representational power of AAMs and show that they can model anything a 3DMM can, but possibly require more shape parameters. We quantify the number of additional parameters required and show that 2D AAMs can generate model instances that are not possible with the equivalent 3DMM. We proceed to describe how a non-rigid structure-from-motion algorithm can be used to construct the corresponding 3D shape modes of a 2D AAM. We then show how the 3D modes can be used to constrain the AAM so that it can only generate model instances that can also be generated with the 3D modes. Finally, we propose a real-time algorithm for fitting the AAM while enforcing the constraints, creating what we call a "Combined 2D+3D AAM."

[NEWS]iRobot Unveils New Technology for Simultaneous Control of Multiple Robots

iRobot Corp. today released the first public photo of a new project in development, code named Sentinel. This innovative new networked technology will allow a single operator to simultaneously control and coordinate multiple semi-autonomous robots via a touch-screen computer. Funded by the U.S. Army's Small Business Innovation and Research (SBIR) program, the Sentinel technology includes intelligent navigation capabilities that enable the robots to reach a preset destination independently, overcoming obstacles and other challenges along the way without intervention from an operator. Sentinel’s capability will allow warfighters and first responders to use teams of iRobot® PackBot® robots to conduct surveillance and mapping, therefore rendering dangerous areas safe without ever setting foot in a hostile environment.

[Full]

Lab meeting 1 Nov., 2006 (Chihao): Microphone Array Speaker Localizers Using Spatial-Temporal Information

author:
Sharon Gannot and Tsvi Gregory Dvorkind

from:
EURASIP Journal on Applied Signal Processing
Volume 2006

abstract:

A dual-step approach for speaker localization based on a microphone array is addressed in this paper. In the first stage, which is not the main concern of this paper, the time difference between arrivals of the speech signal at each pair of microphones is estimated. These readings are combined in the second stage to obtain the source location. In this paper, we focus on the second stage of the localization task. In this contribution, we propose to exploit the speaker’s smooth trajectory for improving the current position estimate. Three localization schemes, which use the temporal information, are presented. The first is a recursive form of the Gauss method. The other two are extensions of the Kalman filter to the nonlinear problem at hand, namely, the extended Kalman filter and the unscented Kalman filter. These methods are compared with other algorithms, which do not make use of the temporal information. An extensive experimental study demonstrates the advantage of using the spatial-temporal methods. To gain some insight on the obtainable performance of the localization algorithm, an approximate analytical evaluation, verified by an experimental study, is conducted. This study shows that in common TDOA-based localization scenarios, where the microphone array has small interelement spread relative to the source position, the elevation and azimuth angles can be accurately estimated, whereas the Cartesian coordinates as well as the range are poorly estimated.
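
The closing observation of the abstract (angles well estimated, range poorly estimated when the array is small relative to the source distance) can be checked numerically with a two-microphone toy setup. The sketch below is an illustration only, not the paper's method: the 20 cm microphone spacing, the source positions and the speed-of-sound value are all assumed.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s, approximate value for room-temperature air

def tdoa(source, mic_a, mic_b):
    """Time difference of arrival (seconds) between two microphones."""
    return (math.dist(source, mic_a) - math.dist(source, mic_b)) / SPEED_OF_SOUND

def polar(range_m, bearing_deg):
    """Convert (range, bearing) to planar Cartesian coordinates."""
    angle = math.radians(bearing_deg)
    return (range_m * math.cos(angle), range_m * math.sin(angle))

# Hypothetical array: two microphones 20 cm apart on the x axis.
mic_a, mic_b = (-0.1, 0.0), (0.1, 0.0)

base = tdoa(polar(2.0, 60.0), mic_a, mic_b)
# Double the range, same bearing: how much does the measurement move?
range_effect = abs(tdoa(polar(4.0, 60.0), mic_a, mic_b) - base)
# Same range, bearing shifted by 10 degrees.
bearing_effect = abs(tdoa(polar(2.0, 50.0), mic_a, mic_b) - base)

print(range_effect, bearing_effect)
```

Doubling the range changes the TDOA far less than a 10 degree change in bearing, so a measurement that pins down the angle says little about distance.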

Link

Tuesday, October 31, 2006

Lab meeting 3 Nov. 2006 (Atwood): Object Class Recognition by Unsupervised Scale-Invariant Learning

Title: Object Class Recognition by Unsupervised Scale-Invariant Learning
Author: R. Fergus, P. Perona, A. Zisserman

Abstract:
We present a method to learn and recognize object class
models from unlabeled and unsegmented cluttered scenes
in a scale invariant manner. Objects are modeled as flexible
constellations of parts. A probabilistic representation is
used for all aspects of the object: shape, appearance, occlusion
and relative scale. An entropy-based feature detector
is used to select regions and their scale within the image. In
learning, the parameters of the scale-invariant object model
are estimated. This is done using expectation-maximization
in a maximum-likelihood setting. In recognition, this model
is used in a Bayesian manner to classify images. The flexible
nature of the model is demonstrated by excellent results
over a range of datasets including geometrically constrained
classes (e.g. faces, cars) and flexible objects (such
as animals).

Full text:
link

Monday, October 30, 2006

Title: Learning Dynamic Maps of Temporal Gene Regulation

Title: Learning Dynamic Maps of Temporal Gene Regulation

Speaker: Jason Ernst, CMU [Link]

Venue: NSH 1507

Date: October 30

Time: 12:00 noon

For schedules, links to papers et al, please see the web page: Link

Abstract:
Time series microarray gene expression experiments have become a widely used experimental technique to study the dynamic biological responses of organisms to a variety of stimuli. The data from these experiments are often clustered to reveal significant temporal expression patterns. These observed temporal expression patterns are largely a result of a dynamic network of protein-DNA interactions that allows the specific regulation of genes needed for the response. We have developed a novel computational method that uses an Input-Output Hidden Markov Model to model these regulatory networks while taking into account their dynamic nature. Our method works by identifying bifurcation points, places in the time series where the expression of a subset of genes diverges from the rest of the genes. These points are annotated with the transcription factors regulating these transitions resulting in a unified dynamic map. Applying our method to study yeast response to stress we derive dynamic maps that are able to recover many of the known aspects of these responses. Additionally the method has made new predictions that have been experimentally validated.

Sunday, October 29, 2006

Batch Mode Active Learning

When: Friday October 27, at 10am
Where: Intel Research (4th floor CIC)
Speaker: Rong Jin (MSU)
Title: Batch Mode Active Learning

Abstract:
The goal of active learning is to select the most informative examples for manual labeling. Most of the previous studies in active learning have focused on selecting a single unlabeled example in each iteration. This is inefficient since the classification model has to be retrained for every labeled example that is solicited. In this paper, we present a framework for "batch mode active learning" that applies the Fisher information matrix to select a number of informative examples simultaneously. The key computational challenge is how to efficiently identify the subset of unlabeled examples that can result in the largest reduction in the classification uncertainty. In this talk, I will discuss two different computational approaches: one is based on the approximated semi-definite programming technique and the other is based on the property of submodular functions. Empirical studies show the promising results of the proposed approaches for batch mode active learning in comparison to the state-of-the-art active learning methods.

Bio:
Dr. Rong Jin has been an assistant professor in the Computer Science and
Engineering Dept. of Michigan State University since 2003. He is working
in the areas of statistical machine learning and its application to
information retrieval. In the past, Dr. Jin has worked on a variety
of machine learning algorithms, and has presented efficient and
robust algorithms for conditional exponential models, support vector
machine, and boosting. In addition, he has extensive experience
with the application of machine learning algorithms to information
retrieval, including retrieval models, collaborative filtering, cross
lingual information retrieval, document clustering, and video/image
retrieval. In the past, he has published over sixty conference and
journal articles on the related topics. Dr. Jin holds a B.A. in
Engineering from Tianjin University, an M.S. in Physics from Beijing
University, and an M.S. and Ph.D. in the area of language technologies
from Carnegie Mellon University.

Thursday, October 26, 2006

Lab meeting 27 Oct., 2006 (Bright): Better Motion Prediction for People-tracking

Authors: Allison Bruce and Geoffrey Gordon

From: ICRA 2004

Abstract: An important building block for intelligent mobile
robots is the ability to track people moving around in the environment.
Algorithms for person-tracking often incorporate motion
models, which can improve tracking accuracy by predicting how
people will move. More accurate motion models produce better
tracking because they allow us to average together multiple
predictions of the person’s location rather than depending
entirely on the most recent observation. Many implemented
systems, however, use simple conservative motion models such
as Brownian motion (in which the person’s direction of motion
is independent on each time step). We present an improved
motion model based on the intuition that people tend to follow
efficient trajectories through their environments rather than
random paths. Our motion model learns common destinations
within the environment by clustering training examples of actual
trajectories, then uses a path planner to predict how a person
would move along routes from his or her present location
to these destinations. We have integrated this motion model
into a particle-filter-based person-tracker, and we demonstrate
experimentally that our new motion model performs significantly
better than simpler models, especially in situations in which there
are extended periods of occlusion during tracking.
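
The contrast drawn in the abstract between a Brownian motion model and a destination-directed one can be sketched as two particle prediction steps. This is a simplified illustration, not the authors' tracker: destinations are given rather than learned by clustering, a straight line toward the goal replaces the path planner, and the speed and noise values are assumed.

```python
import math
import random

def brownian_step(pos, speed, rng):
    """Heading is drawn independently at every time step."""
    heading = rng.uniform(0.0, 2.0 * math.pi)
    return (pos[0] + speed * math.cos(heading),
            pos[1] + speed * math.sin(heading))

def goal_directed_step(pos, goal, speed, rng, noise=0.1):
    """Move toward the goal (straight line stands in for a planned path)."""
    dx, dy = goal[0] - pos[0], goal[1] - pos[1]
    dist = math.hypot(dx, dy)
    step = min(speed, dist)
    ux, uy = (dx / dist, dy / dist) if dist > 0 else (0.0, 0.0)
    return (pos[0] + step * ux + rng.gauss(0.0, noise),
            pos[1] + step * uy + rng.gauss(0.0, noise))

def propagate(start, destinations, steps, n_particles, rng, goal_directed):
    """Predict particle positions over `steps` with no observations (occlusion)."""
    cloud = []
    for _ in range(n_particles):
        pos = start
        goal = rng.choice(destinations)  # each particle commits to one destination
        for _ in range(steps):
            if goal_directed:
                pos = goal_directed_step(pos, goal, 1.0, rng)
            else:
                pos = brownian_step(pos, 1.0, rng)
        cloud.append(pos)
    return cloud

def mean_distance(cloud, point):
    return sum(math.dist(p, point) for p in cloud) / len(cloud)

rng = random.Random(0)
door = (10.0, 0.0)  # hypothetical destination the person is walking to
directed = propagate((0.0, 0.0), [door], 5, 200, rng, goal_directed=True)
diffuse = propagate((0.0, 0.0), [door], 5, 200, rng, goal_directed=False)
print(mean_distance(directed, door), mean_distance(diffuse, door))
```

After several unobserved steps the goal-directed particles stay concentrated along the route, while the Brownian particles remain scattered around the last seen position, which is why the difference matters most during long occlusions.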

Link

[MIT CSAIL] Navigating and Reconstructing the World's Photographs

Title : Navigating and Reconstructing the World's Photographs

Speaker : Steven Seitz , University of Washington

Abstract :

There are billions of photographs on the Internet. Virtually all of the world's significant sites have been photographed under many different conditions, both from the ground and from the air. For example, a Google image search for "Notre Dame" returns half a million images, showing the cathedral from almost every conceivable viewing position and angle, different times of day and night, and changes in season, weather, and decade. In many ways, this is the dream data set for computer vision and graphics research.

Motivated by the availability of such rich data, we are exploring matching, reconstruction, and visualization algorithms that can work with very large, unorganized, and uncalibrated image sets, such as those found on the Internet. In this talk, I'll describe "Photo Tourism" (now being commercialized by Microsoft as "Photosynth"), an approach that creates immersive 3D experiences of scenes by reconstructing photographs on the Internet. I'll also describe work on multi-view stereo that reconstructs accurate 3D models from large collections of input views.

This is joint work with Noah Snavely, Rick Szeliski, Michael Goesele, Brian Curless, and Hugues Hoppe.

Bio :

Steven Seitz is Short-Dooley Associate Professor in the Department of Computer Science and Engineering at the University of Washington. He received his B.A. in computer science and mathematics at the University of California, Berkeley in 1991 and his Ph.D. in computer sciences at the University of Wisconsin, Madison in 1997. Following his doctoral work, he spent one year visiting the Vision Technology Group at Microsoft Research, and subsequently two years as an Assistant Professor in the Robotics Institute at Carnegie Mellon University. He joined the faculty at the University of Washington in July 2000. He was twice awarded the David Marr Prize for the best paper at the International Conference of Computer Vision, and has received an NSF Career Award, an ONR Young Investigator Award, and an Alfred P. Sloan Fellowship. Professor Seitz is interested in problems in computer vision and computer graphics. His current research focuses on capturing the structure, appearance, and behavior of the real world from digital imagery.

[CMU Intelligence Seminar] Robust Autonomous Color Learning on a Mobile Robot

Title : Robust Autonomous Color Learning on a Mobile Robot

Abstract :

The scientific community is slowly but surely working towards the creation of fully autonomous mobile robots capable of interacting with the proverbial real world. To operate in the real world, autonomous robots rely on their sensory information, but the ability to accurately sense the complex world is still missing. Visual input, in the form of color images from a camera, should be an excellent and rich source of such information, considering the significant amount of progress made in machine vision. But color, and images in general, have been used sparingly on mobile robots, where people have mostly focused their attention on other sensors such as tactile sensors, sonar and laser.

This talk presents the challenges raised and solutions introduced in our efforts to create a robust, color-based visual system for the Sony Aibo robot. We enable the robot to learn its color map autonomously and demonstrate a degree of illumination invariance under changing lighting conditions. Our contributions are fully implemented and operate in real time within the limited processing resources available onboard the robot. The system has been deployed in periodic robot soccer competitions, enabling teams of four Aibo robots to play soccer as a part of the international RoboCup initiative.

Bio :

Dr. Peter Stone is an Alfred P. Sloan Research Fellow and Assistant Professor in the Department of Computer Sciences at the University of Texas at Austin. He received his Ph.D. in Computer Science in 1998 from Carnegie Mellon University. From 1999 to 2002 he was a Senior Technical Staff Member in the Artificial Intelligence Principles Research Department at AT&T Labs - Research. Peter's research interests include machine learning, multiagent systems, robotics, and e-commerce. In 2003, he won a CAREER award from the National Science Foundation for his research on learning agents in dynamic, collaborative, and adversarial multiagent environments. In 2004, he was named an ONR Young Investigator for his research on machine learning on physical robots. Most recently, he was awarded the prestigious IJCAI 2007 Computers and Thought award.

[FRC seminar] Dynamic Tire-Terrain Models for Obstacle Detection

Speaker:
Dean Anderson
Ph.D. Candidate
Robotics Institute

Abstract:
What is an obstacle? In mobile robots, this is a question implicitly addressed by a perception system, but rarely directly studied itself.

As robots achieve higher speeds and venture into rougher terrain, dynamic effects become significant and cost metrics based on quasi-static analysis and heuristics perform sub-optimally.

In this talk, we present a calibrated, fully-dynamic deformable tire model for terrain evaluation. The tire model is based on penetrating volumes and includes both rolling and slipping friction forces. We will also discuss an experimental platform used to calibrate the model and insights gained in studying the effects of vehicle speed, obstacle height and slope on the "lethality" of an obstacle. Lastly, we propose a metric of terrain traversability based on our force model, and compare it to previous perception algorithms.

Speaker Bio:
Dean Anderson is a fourth-year Ph.D. student working with Alonzo Kelly. His research interests include sensors and perception algorithms for outdoor mobile robots, as well as dynamic vehicle modeling for perception and planning purposes.

[CMU Intelligence Seminar] Improving Systems Management Policies Using Hybrid Reinforcement Learning

Other information of Intelligence Seminar [link]

Topic: Improving Systems Management Policies Using Hybrid Reinforcement Learning

Speaker: Gerry Tesauro (IBM Watson Research)

Abstract:
Reinforcement Learning (RL) provides a promising new approach to systems
performance management that differs radically from standard
queuing-theoretic approaches making use of explicit system performance
models. In principle, RL can automatically learn high-quality management
policies without explicit performance models or traffic models, and with
little or no built-in system specific knowledge. Previously we showed
that online RL can learn to make high-quality server allocation
decisions in a multi-application prototype Data Center scenario. The
present work shows how to combine the strengths of both RL and queuing
models in a hybrid approach, in which RL trains offline on data
collected while a queuing model policy controls the system. By training
offline we avoid suffering potentially poor performance in live online
training. Our latest results show that, in both open-loop and
closed-loop traffic, hybrid RL training can achieve significant
performance improvements over a variety of initial model-based policies.
We also give several interesting insights as to how RL, as expected, can
deal effectively with both transients and switching delays, which lie
outside the scope of traditional steady-state queuing theory.
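
The offline-training idea in the abstract can be illustrated with a very small sketch. Everything here is a toy assumption made for illustration and not the system from the talk: a five-level queue, two allocation choices, deterministic dynamics, a threshold rule standing in for the queuing-model policy, and occasional random action flips so that the log covers both actions. Tabular Q-learning is then run over the recorded transitions only, never on the live system.

```python
import random

N_STATES = 5          # queue lengths 0..4 (toy assumption)
ACTIONS = (0, 1)      # 0 = one server, 1 = two servers (toy assumption)
GAMMA = 0.9

def step(s, a):
    """Toy deterministic dynamics: one arrival per step, a + 1 jobs served."""
    s_next = min(max(s - a, 0), N_STATES - 1)
    reward = -float(s_next) - 0.5 * a   # queue holding cost plus server cost
    return reward, s_next

def model_policy(s):
    """Hypothetical stand-in for a queuing-model policy: add a server only when the queue is long."""
    return 1 if s >= 3 else 0

def collect_log(episodes=300, horizon=10, explore=0.3, seed=0):
    """Record (s, a, r, s') while the model policy controls the system.
    The random action flips are an added assumption to get coverage of both actions."""
    rng = random.Random(seed)
    log = []
    for _ in range(episodes):
        s = rng.randrange(N_STATES)
        for _ in range(horizon):
            a = model_policy(s)
            if rng.random() < explore:
                a = 1 - a
            r, s_next = step(s, a)
            log.append((s, a, r, s_next))
            s = s_next
    return log

def offline_q_learning(log, sweeps=100, alpha=0.5):
    """Tabular Q-learning that only replays logged transitions."""
    q = [[0.0, 0.0] for _ in range(N_STATES)]
    for _ in range(sweeps):
        for s, a, r, s_next in log:
            target = r + GAMMA * max(q[s_next])
            q[s][a] += alpha * (target - q[s][a])
    return q

log = collect_log()
q = offline_q_learning(log)
learned_policy = [max(ACTIONS, key=lambda a: q[s][a]) for s in range(N_STATES)]
print(learned_policy)
```

In this toy setting the learned greedy policy drains the queue earlier than the threshold rule that generated the data, which mirrors the claim that offline training can improve on the initial model-based policy without risking poor performance online.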

Speaker Bio:
Gerry Tesauro received a PhD in theoretical physics from Princeton
University in 1986, and owes his subsequent conversion to machine
learning research in no small part to the first Connectionist Models
Summer School, held at Carnegie Mellon in 1986. Since then he has worked
on a variety of ML applications, including computer virus recognition,
intelligent e-commerce agents, and most notoriously, TD-Gammon, a
self-teaching program that learned to play backgammon at human world
championship level. He has also been heavily involved for many years in
the annual NIPS conference, and was NIPS Program Chair in 1993 and
General Chair in 1994. He is currently interested in applying the latest
and greatest ML approaches to a huge emerging application domain of
self-managing computing systems, where he foresees great opportunities
for improvements over current state-of-the-art approaches.

Tuesday, October 24, 2006

Inference in Large-Scale Graphical Models and its application to SFM, SAM, and SLAM

Inference in Large-Scale Graphical Models and its application to SFM, SAM, and SLAM

Frank Dellaert, Georgia Tech.

Intelligence Seminar at School of Computer Science at Carnegie Mellon University

Abstract:
Simultaneous Localization and Mapping (SLAM), Smoothing and Mapping (SAM), and Structure from Motion (SFM) are important and closely related problems in robotics and vision. Not surprisingly, there is a large literature describing solutions to each problem, and more and more connections are established between the two fields. At the same time, robotics and vision researchers alike are becoming increasingly familiar with the power of graphical models as a language in which to represent inference problems. In this talk I will show how SFM, SAM, and SLAM can be posed in terms of this graphical model language, and how inference in them can be explained in a purely graphical manner via the concept of variable elimination. I will then present a new way of looking at inference that is equivalent to the junction tree algorithm yet is — in my view — much more insightful. I will also show that, when applied to linear(ized) Gaussian problems, the algorithm yields the familiar QR and Cholesky factorization algorithms, and that this connection with linear algebra leads to strategies for very fast inference in arbitrary graphs. I will conclude by showing some published and preliminary work that exploits this connection to the fullest.
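
The link between linear(ized) Gaussian inference and matrix factorization can be checked on a small least-squares problem. The sketch below is only an illustration with random stand-in data, not code from the talk: `A` plays the role of a stacked measurement Jacobian and `b` the residual vector. Solving via Cholesky factorization of the information matrix A^T A and via QR factorization of A gives the same estimate, and the two triangular factors agree up to row signs.

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((8, 3))   # hypothetical linearized measurement Jacobian
b = rng.standard_normal(8)        # hypothetical residual vector

# Cholesky route: factor the information matrix A^T A = L L^T,
# then solve by forward and back substitution.
L = np.linalg.cholesky(A.T @ A)
y = np.linalg.solve(L, A.T @ b)
x_chol = np.linalg.solve(L.T, y)

# QR route: A = Q R, then solve R x = Q^T b without forming A^T A.
Q, R = np.linalg.qr(A)
x_qr = np.linalg.solve(R, Q.T @ b)

# Reference least-squares solution for comparison.
x_ref = np.linalg.lstsq(A, b, rcond=None)[0]

print(np.allclose(x_chol, x_qr), np.allclose(np.abs(R), np.abs(L.T)))
```

The dense solves are used here for brevity; in large SLAM problems the interest is in exploiting the sparsity of these factors, which this sketch does not attempt.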

Links:
[Author info]

Lab meeting 27 Oct., 2006 (Eric): Rapid Shape Acquisition Using Color Structured Light and Multi-pass Dynamic Programming

Authors: Li Zhang, Brian Curless, and Steven M. Seitz

From: Proceedings of the 1st International Symposium on 3D Data Processing, Visualization, and Transmission (3DPVT), 2002

Abstract: This paper presents a color structured light technique for recovering object shape from one or more images. The technique works by projecting a pattern of stripes of alternating colors and matching the projected color transitions with observed edges in the image. The correspondence problem is solved using a novel, multi-pass dynamic programming algorithm that eliminates global smoothness assumptions and strict ordering constraints present in previous formulations. The resulting approach is suitable for generating both high-speed scans of moving objects when projecting a single stripe pattern and high-resolution scans of static scenes using a short sequence of time-shifted stripe patterns. In the latter case, spacetime analysis is used at each sensor pixel to obtain inter-frame depth localization. Results are demonstrated for a variety of complex scenes.

[Link]

[My talk] The Identity Management Kalman Filter (IMKF)

Title: The Identity Management Kalman Filter (IMKF)

Authors: Brad Schumitsch, Sebastian Thrun, Leonidas Guibas, Kunle Olukotun

Robotics: Science and Systems II (RSS 2006)
August 16-19, 2006
University of Pennsylvania
Philadelphia, Pennsylvania

Abstract: Tracking posterior estimates for problems with data association uncertainty is one of the big open problems in the literature on filtering and tracking. This paper presents a new filter for online tracking of many individual objects with data association ambiguities. It tightly integrates the continuous aspects of the problem -- locating the objects -- with the discrete aspects -- the data association ambiguity. The key innovation is a probabilistic information matrix that efficiently does identity management, that is, it links entities with internal tracks of the filter, enabling it to maintain a full posterior over the system amid data association uncertainties. The filter scales quadratically in complexity, just like a conventional Kalman filter. We derive the algorithm formally and present large-scale results.

PDF:
http://www.roboticsproceedings.org/rss02/p29.pdf

Friday, October 20, 2006

News: Robot swarm works together to shift heavy objects

* 18:47 17 October 2006
* NewScientist.com news service
* Tom Simonite

A "swarm" of simple-minded robots that teams up to move an object too heavy for them to manage individually has been demonstrated by robotics researchers.

The robots cannot communicate and must act only on what they can see around them. They follow simple rules to fulfil their task - mimicking the way insects work together in a swarm.

The robots were developed by Marco Dorigo at the Free University of Brussels, Belgium, along with colleagues at the Institute of Cognitive Science and Technology in Italy and the Autonomous Systems Laboratory and Dalle Molle Institute for the Study of Artificial Intelligence, both in Switzerland.

See the full article & videos.

http://www.swarm-bots.org/

Lab Meeting 20 Oct., 2006 (Jim) : Constrained Initialisation for Bearing-Only SLAM

Constrained Initialisation for Bearing-Only SLAM
Tim Bailey, ICRA2003, PDF

This paper investigates the feature initialisation problem for bearing-only SLAM. Bearing-only SLAM is an attractive capability due to its relationship with cheap vision sensing, but initialising landmarks is difficult. First, the landmark location is unconstrained by a single measurement, and second, the location estimate due to several measurements may be ill-conditioned. This paper presents a solution to the feature initialisation problem via the method of "constrained initialisation", where measurements are stored and initialisation is deferred until sufficient constraints exist for a well-conditioned solution. A primary contribution of this paper is a measure of "well-conditioned" for initialisation within the traditional extended Kalman Filter (EKF) framework.
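
To see why a landmark estimate from several bearings can be ill-conditioned, here is a minimal 2D sketch. It is not the paper's measure: the condition number of the stacked bearing constraints and the `MAX_COND` threshold are stand-ins chosen for illustration, and the poses and landmark are made up. Each bearing defines a line through the robot pose, and the landmark is the least-squares intersection of those lines.

```python
import numpy as np

MAX_COND = 50.0   # hypothetical threshold for "well-conditioned"

def triangulate(poses, bearings):
    """Least-squares intersection of bearing rays in 2D.
    Each ray gives sin(th) * (lx - x) - cos(th) * (ly - y) = 0."""
    poses = np.asarray(poses, dtype=float)
    th = np.asarray(bearings, dtype=float)
    A = np.column_stack([np.sin(th), -np.cos(th)])
    b = np.sin(th) * poses[:, 0] - np.cos(th) * poses[:, 1]
    landmark, *_ = np.linalg.lstsq(A, b, rcond=None)
    return landmark, np.linalg.cond(A)

landmark_true = np.array([10.0, 5.0])   # made-up ground truth

def bearing_to(pose):
    d = landmark_true - np.asarray(pose, dtype=float)
    return np.arctan2(d[1], d[0])

wide = [(0.0, 0.0), (8.0, 0.0)]     # large baseline between observations
narrow = [(0.0, 0.0), (0.1, 0.0)]   # almost no parallax

est_wide, cond_wide = triangulate(wide, [bearing_to(p) for p in wide])
est_narrow, cond_narrow = triangulate(narrow, [bearing_to(p) for p in narrow])

for name, cond in (("wide", cond_wide), ("narrow", cond_narrow)):
    decision = "initialise" if cond < MAX_COND else "defer"
    print(name, round(float(cond), 1), decision)
```

With the narrow baseline the two rays are nearly parallel, so small bearing noise would move the intersection a long way; a deferred scheme would keep storing measurements until the geometry improves.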

Thursday, October 19, 2006

IEEE news: TECHNOLOGIES THAT MAKE YOU SMILE: ADDING HUMOR TO TEXT-BASED APPLICATIONS

Humor is an essential element of interpersonal communication, but surprisingly, research often neglects the topic. But according to a new article in "IEEE Intelligent Systems," computational approaches can be successfully applied to the recognition and use of verbally expressed humor. Read more (PDF).

Should we add humor to our robots? -Bob

IEEE news: Robotic surgery

Surgical robots offer a tantalizing possibility. They could allow military doctors stationed safely distant from the front line, for example, to perform operations without once putting their hands on patients. For that vision to become reality, however, surgical robots need plenty of improvement. One challenge is designing systems that can work under conditions very different from those of pristine operating rooms.

See "Doc at a Distance," by Jacob Rosen and Blake Hannaford: the link

Wednesday, October 18, 2006

[CMU VASC seminar series] High Resolution Acquisition, Tracking and Transfer of Dynamic 3D Facial

The advent of new technologies that allow the capture of massive amounts of high resolution, high frame rate face data, leads us to propose data-driven face models that describe detailed appearance of static faces as well as to track subtle geometry changes during expressions. However, since the dense data in these 3D scans are not registered in object space, inter-frame correspondences can not be established, which makes the tracking of facial features, estimation of facial expression dynamics and other analysis difficult.

In order to use such data for the temporal study of the subtle dynamics in expressions, an efficient non-rigid 3D motion tracking algorithm is needed to establish inter-frame correspondences. In this talk, I will present two frameworks for high resolution, non-rigid dense 3D point tracking. The first framework is a hierarchical scheme using a deformable generic face model. To begin with, a generic face mesh is first deformed to fit the data at a coarse level. Then in order to capture the highly local deformations, we use a variational algorithm for non-rigid shape registration based on the integration of an implicit shape representation and the Free-Form Deformations (FFD). The second framework, a fully automatic tracking method, is presented using harmonic maps with interior feature correspondence constraints. The novelty of this work is the development of an algorithmic framework for 3D tracking that unifies tracking of intensity and geometric features, using harmonic maps with added feature correspondence constraints. Due to the strong implicit and explicit smoothness constraints imposed by both algorithms and the high-resolution data, the resulting registration/deformation field is smooth and continuous. Both our methods are validated through a series of experiments demonstrating their accuracy and efficiency.

Furthermore, the availability of high quality dynamic expression data opens a number of research directions in face modeling. In this talk, several graphics applications will be demonstrated that use the motion data to synthesize new expressions, such as expression transfer from a source face to a target face.

Bio:

Yang Wang received his B.S. degree and M.Sc. degree in Computer Science from Tsinghua University in 1998 and 2000 respectively. He is a Ph.D. student in the Computer Science Department at the State University of New York at Stony Brook, where he has been working with Prof. Dimitris Samaras since 2000. He specializes in illumination modeling and estimation, 3D non-rigid motion tracking and facial expression analysis and synthesis. He is a member of ACM and IEEE.

Monday, October 16, 2006

Lab Meeting 20 Oct., 2006 (Nelson): Metric-Based Iterative Closest Point Scan Matching

Link

Javier Minguez, Luis Montesano, and Florent Lamiraux

Abstract—This paper addresses the scan matching problem
for mobile robot displacement estimation. The contribution is
a new metric distance and all the tools necessary to be used
within the iterative closest point framework. The metric
distance is defined in the configuration space of the sensor,
and takes into account both translation and rotation error of
the sensor. The new scan matching technique ameliorates
previous methods in terms of robustness, precision,
convergence, and computational load. Furthermore, it has
been extensively tested to validate and compare this
technique with existing methods.
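
A distance that accounts for both translation and rotation can be sketched as follows. The assumptions are mine and are stated so they can be checked against the paper: a sensor displacement q = (x, y, theta) has norm sqrt(x^2 + y^2 + L^2 theta^2), where `L` is a weighting length (the value 3.0 is hypothetical), and the distance between two scan points is the smallest such norm over displacements that map the first point onto the second, with the rotation linearized for small angles. Minimizing over theta in closed form gives the expression in `mb_distance`.

```python
import math

def mb_distance(p1, p2, L=3.0):
    """Smallest norm of a small displacement (x, y, theta) that moves p1 onto p2,
    using the norm sqrt(x^2 + y^2 + L^2 * theta^2). L is a hypothetical weight."""
    dx, dy = p2[0] - p1[0], p2[1] - p1[1]
    k = p1[0] ** 2 + p1[1] ** 2 + L ** 2
    cross = dx * p1[1] - dy * p1[0]
    return math.sqrt(max(dx * dx + dy * dy - cross * cross / k, 0.0))

def brute_force(p1, p2, L=3.0, steps=200001, span=1.0):
    """Numerical check: scan theta and keep the smallest norm."""
    dx, dy = p2[0] - p1[0], p2[1] - p1[1]
    best = float("inf")
    for i in range(steps):
        th = -span + 2.0 * span * i / (steps - 1)
        x = dx + th * p1[1]          # translation still needed after rotating by th
        y = dy - th * p1[0]
        best = min(best, x * x + y * y + (L * th) ** 2)
    return math.sqrt(best)

p = (2.0, 0.0)
d_radial = mb_distance(p, (3.0, 0.0))       # offset along the ray from the sensor
d_tangential = mb_distance(p, (2.0, 1.0))   # offset that a small rotation can explain
print(d_radial, d_tangential)
```

Under these assumptions a radial offset costs its full Euclidean length, while a tangential offset of the same length is cheaper because part of it can be absorbed by rotation. That asymmetry is what lets the matching step take rotation error into account when pairing points.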

Robotics Institute Thesis Proposal: Planning with Uncertainty in Position Using High-Resolution Maps

Author:
Juan Pablo Gonzalez
Robotics Institute
Carnegie Mellon University

Abstract:
Navigating autonomously is one of the most important problems facing outdoor mobile robots. This task can be extremely difficult if no prior information is available, and would be trivial if perfect prior information existed. In practice prior maps are usually available, but their quality and resolution vary significantly.

When accurate, high-resolution prior maps are available and the position of the robot is precisely known, many existing approaches can reliably perform the navigation task for an autonomous robot. However, if the position of the robot is not precisely known, most existing approaches would fail or would have to discard the prior map and perform the much harder task of navigating without prior information.

Most outdoor robotic platforms have two ways of determining their position: a dead-reckoning system and the Global Positioning System (GPS). The dead-reckoning system provides a locally accurate and locally consistent estimate that drifts slowly, and the GPS provides a globally accurate estimate that does not drift, but is not necessarily locally consistent. A Kalman filter usually combines these two estimates to provide an estimate that has the best of both position estimates.
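
A one-dimensional Kalman filter is enough to show how the two sources complement each other. This is a minimal sketch with made-up noise values, not the filter used on any particular platform: dead reckoning is treated as the prediction step, whose variance grows with every odometry increment, and a GPS fix is treated as a direct position measurement.

```python
def predict(x, p, u, q):
    """Dead-reckoning step: move by odometry increment u; variance grows by q."""
    return x + u, p + q

def update(x, p, z, r):
    """GPS step: blend in a position fix z that has variance r."""
    k = p / (p + r)                      # Kalman gain
    return x + k * (z - x), (1.0 - k) * p

x, p = 0.0, 0.0
for _ in range(10):                      # ten steps with no GPS: drift accumulates
    x, p = predict(x, p, u=1.0, q=0.04)  # q is a made-up drift variance per step
p_drift = p

x_fused, p_fused = update(x, p, z=10.6, r=1.0)   # one noisy GPS fix (made-up values)
print(p_drift, x_fused, p_fused)
```

The fused variance is smaller than both the drifted dead-reckoning variance and the GPS variance. When no fix arrives, only `predict` runs and the variance keeps growing.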

While for many scenarios this combination suffices, there are many others in which GPS is not available, or its reliability is compromised by different types of interference such as mountains, buildings, foliage or jamming. In these cases, the only position estimate available is that of the dead-reckoning system which drifts with time and does not provide a position estimate accurate enough for most navigation approaches.

This proposal addresses the problem of planning with uncertainty in position using high-resolution maps. The objective is to be able to reliably navigate distances of up to one kilometer without GPS through the use of accurate, high resolution prior maps and a good dead-reckoning system. Different approaches to the problem are analyzed, depending on the types of landmarks available, the quality of the map and the quality of the perception system.

Further Details:
A copy of the thesis proposal document can be found at http://www.ri.cmu.edu/pubs/pub_5571.html.

FRC Seminar: Sliding Autonomy for Complex Coordinated Multi-Robot Tasks: Analysis & Experiments

Speaker:
Frederik Heger
Ph.D. Candidate, Robotics Institute

Abstract:
Autonomous systems are efficient but often unreliable. In domains where reliability is paramount, efficiency is sacrificed by putting an operator in control via teleoperation. We are investigating a mode of shared control called "Sliding Autonomy" that combines the efficiency of autonomy and the reliability of human control in the performance of complex tasks, such as the assembly of large structures by a team of robots. Here we introduce an approach based on Markov models that captures interdependencies between the team members and predicts system performance. We report results from a study in which three robots work cooperatively with an operator to assemble a structure. The scenario requires high precision and has a large number of failure modes. Our results support both our expectations and modeling and show that our combined robot-human team is able to perform the assembly at a level of efficiency approaching that of fully autonomous operation while increasing overall reliability to near-teleoperation levels. This increase in performance is achieved while simultaneously reducing mental operator workload.

Speaker Bio:
Frederik Heger is a third-year Ph.D. student working with Sanjiv Singh. His research interests are in enabling robots to perform complex tasks efficiently and reliably using "Sliding Autonomy," and in motion planning for teams of robots working together on complex, coordinated tasks in constrained environments.

Saturday, October 14, 2006

Clever cars shine at intelligent transport conference (A scanner-based car tracking system)

full article

A prototype system developed by German company Ibeo enables a car to automatically follow the vehicle ahead. At the press of a button an infrared laser scanner in the car's bumper measures the distance to the next vehicle and a computer maintains a safe distance, stopping and starting if it becomes stuck in traffic.

The scanner can track stationary and moving objects from up to 200 metres away at speeds of up to 180 kilometres (112 miles) per hour. "It gives a very precise image of what's going on," Max Mandt-Merck of Ibeo told New Scientist.

"Our software can distinguish cars and pedestrians from the distinctive shapes the scanner detects." A video shows the information collected by the scanner (2.1MB, mov format)

Airbag activation

Mandt-Merck says the scanner can also be used to warn a driver when they stray out of lane or try to overtake too close to another vehicle. It could even activate airbags 0.3 seconds before an impact, he says.

Other systems at the show aim to prevent accidents altogether, by alerting drivers when they become distracted. A video shows one that sounds an audible alarm and vibrates the driver's seat when their head turns away from the road ahead (2.75MB WMV format). "There's an infrared camera just behind the steering wheel," explains Kato Kazuya, from Japanese automotive company Aisin. "It detects the face turning by tracking its bilateral symmetry."

A video shows another system, developed by Japanese company DENSO Corporation, that uses an infrared camera to determine whether a driver is becoming drowsy (1.91MB WMV format). "It recognises the shape of your eyes and tracks the height of that shape to watch if they close," explains Takuhiro Oomi. If a driver shuts their eyes for more than a few seconds their seat vibrates and a cold draught hits their neck.

Gaze following

The same camera system could offer other functions, Oomi says. "It can also allow the headlight beams to follow your gaze, or recognise the face of a driver and adjust the seat to their saved preferences," he says.

In the car park outside the conference centre Toyota demonstrated an intelligent parking system. A video shows the system prompting a driver to identify their chosen parking spot, which is identified using ultrasonic sensors (9.8MB, WMV format).

Once the space has been selected, the wheel turns automatically and the driver needs only to limit the car's speed using the brake pedal. When reversing into a parking bay, a camera at the rear of the car is used to recognise white lines on the tarmac.

The system needs 7 metres of space for parallel parking, but can fit into a regular parking bay with just 30 centimetres clearance on either side.

"Future developments will probably see a system that lets you get out and leave the car to park itself," says a Toyota spokesman. The intelligent parking system has been available on some Toyota models in Japan since November 2005 and will be available in Europe and the US from January 2007.

Predestination: Inferring Destinations from Partial Trajectories

John Krumm [Microsoft Research (Redmond, WA USA) ] and Eric Horvitz

Eighth International Conference on Ubiquitous Computing (UbiComp 2006), September 2006.

Abstract. We describe a method called Predestination that uses a history of a driver's destinations, along with data about driving behaviors, to predict where a driver is going as a trip progresses. Driving behaviors include types of destinations, driving efficiency, and trip times. Beyond considering previously visited destinations, Predestination leverages an open-world modeling methodology that considers the likelihood of users visiting previously unobserved locations based on trends in the data and on the background properties of locations. This allows our algorithm to smoothly transition between "out of the box" with no training data to more fully trained with increasing numbers of observations. Multiple components of the analysis are fused via Bayesian inference to produce a probabilistic map of destinations. Our algorithm was trained and tested on hold-out data drawn from a database of GPS driving data gathered from 169 different subjects who drove 7,335 different trips.

[Full]
[Homepage]
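
The Bayesian fusion described in the Predestination abstract can be illustrated with a toy update over three candidate destinations. This is a sketch under loud assumptions and not the authors' model: the prior stands in for a destination history, straight-line distance stands in for driving time, and the "driving efficiency" cue is reduced to a single hypothetical probability `p_closer` that a driver gets closer to the true destination on each trip segment. The open-world component is not modeled.

```python
import math

def posterior(prior, cells, path, p_closer=0.8):
    """Bayes update over candidate destinations from a partial trajectory.
    p_closer is a hypothetical likelihood of moving closer to the true destination."""
    post = dict(prior)
    for a, b in zip(path, path[1:]):
        for name, c in cells.items():
            closer = math.dist(b, c) < math.dist(a, c)
            post[name] *= p_closer if closer else 1.0 - p_closer
    total = sum(post.values())
    return {name: v / total for name, v in post.items()}

# Made-up candidate destinations and history-based prior.
cells = {"home": (0.0, 0.0), "work": (10.0, 0.0), "gym": (0.0, 10.0)}
prior = {"home": 0.5, "work": 0.3, "gym": 0.2}

# Partial trajectory heading in the +x direction.
path = [(1.0, 1.0), (3.0, 1.0), (5.0, 1.0)]

post = posterior(prior, cells, path)
print(max(post, key=post.get), post)
```

Even though "home" has the largest prior here, two segments that move away from it are enough to shift most of the probability mass to "work", which is the kind of update a probabilistic destination map is meant to capture as a trip progresses.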