Robot Perception and Learning: August 2009

Sunday, August 30, 2009

Lab Meeting August 31, 2009 (swem): Optical moving target detection with 3-D matched filtering

Title: Optical moving target detection with 3-D matched filtering

Author: Reed, Irving S.; Gagliardi, Robert M.; Stotts, Larry B.

IEEE Transactions on Aerospace and Electronic Systems (ISSN 0018-9251), vol. 24, July 1988, p. 327-336.

Abstract:

Three-dimensional (3-D) matched filtering has been suggested as a powerful processing technique for detecting weak, moving optical targets immersed in a background noise field. The procedure requires the processing of entire sequences of frames of optical scenes containing the moving targets. The 3-D processor must be properly matched to the target signature and its velocity vector, but will simultaneously detect all targets to which it is matched. The results of a study to evaluate the 3-D processor are presented. Simulation results are reported which show the ability of the processor to detect targets well below the background level. These results demonstrate the capability and robustness of the processor, and show that the algorithms, although somewhat complicated, can be implemented readily. Some effects on the number of frames processed, target flight scenarios, and velocity and signature mismatch are also presented. The ability to detect multiple targets is demonstrated.
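
A rough illustration of why the filter has to be matched to the velocity vector: the sketch below is a shift-and-add simplification, not the paper's processor. It assumes a single-pixel target signature, integer velocities in pixels per frame, and white Gaussian noise; the function names and the synthetic data generator are hypothetical.

```python
import random


def velocity_matched_sum(frames, vx, vy):
    """Shift-and-add over a frame stack for one hypothesised velocity.

    Returns a score map indexed by the target position in frame 0.
    Assumes a single-pixel signature and integer (vx, vy) in pixels/frame.
    """
    n_frames = len(frames)
    height, width = len(frames[0]), len(frames[0][0])
    score = [[0.0] * width for _ in range(height)]
    for y0 in range(height):
        for x0 in range(width):
            total = 0.0
            for t in range(n_frames):
                # Follow the hypothesised straight-line track through time.
                x, y = x0 + vx * t, y0 + vy * t
                if 0 <= x < width and 0 <= y < height:
                    total += frames[t][y][x]
            score[y0][x0] = total
    return score


def make_sequence(n_frames, height, width, start, velocity, amplitude, noise_std, rng):
    """Synthetic frames: Gaussian noise plus one constant-velocity point target."""
    frames = []
    for t in range(n_frames):
        frame = [[rng.gauss(0.0, noise_std) for _ in range(width)]
                 for _ in range(height)]
        x, y = start[0] + velocity[0] * t, start[1] + velocity[1] * t
        frame[y][x] += amplitude
        frames.append(frame)
    return frames


def argmax2d(grid):
    """(x, y) of the largest entry in a 2-D list."""
    return max((v, (x, y)) for y, row in enumerate(grid) for x, v in enumerate(row))[1]
```

With a target whose amplitude equals the per-pixel noise standard deviation, no single frame reveals it, but summing along the correct track grows the signal linearly in the number of frames while the noise grows only with its square root. Summing with a mismatched velocity collects the target in at most a few frames, which is the mismatch sensitivity the abstract refers to.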

link

Saturday, August 29, 2009

Lab Meeting August 31, 2009 (Jim): Maximum Entropy Inverse Reinforcement Learning

I will try to present this paper instead of the previous one.

Title: Maximum Entropy Inverse Reinforcement Learning
B. D. Ziebart, A. Maas, J. A. Bagnell, and A. K. Dey.
AAAI Conference on Artificial Intelligence (AAAI 2008)

Abstract:
In this work, we develop a probabilistic approach based on the principle of maximum entropy. Our approach provides a well-defined, globally normalized distribution over decision sequences, while providing the same performance guarantees as existing methods.
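
To make "globally normalized distribution over decision sequences" concrete, here is a minimal sketch for the deterministic case, where each path's probability is proportional to the exponential of its linear reward. Enumerating the candidate paths explicitly is an assumption made for illustration and only works for tiny problems; the function names are hypothetical.

```python
import math


def path_probabilities(path_features, theta):
    """Maximum-entropy distribution: P(path) is proportional to exp(theta . f_path)."""
    rewards = [sum(w * f for w, f in zip(theta, feats)) for feats in path_features]
    top = max(rewards)  # subtract the maximum for numerical stability
    weights = [math.exp(r - top) for r in rewards]
    partition = sum(weights)  # global normaliser over all candidate paths
    return [w / partition for w in weights]


def log_likelihood_gradient(path_features, theta, demo_features):
    """Gradient for one demonstration: demonstrated minus expected feature counts."""
    probs = path_probabilities(path_features, theta)
    expected = [sum(p * feats[k] for p, feats in zip(probs, path_features))
                for k in range(len(theta))]
    return [d - e for d, e in zip(demo_features, expected)]
```

Following this gradient raises the reward weights of features the demonstrator visits more often than the current model expects, and the gradient vanishes once the expected feature counts match the demonstrated ones.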

link

Thursday, August 27, 2009

Lab Meeting August 31, 2009 (Jim): Acquisition of probabilistic behavior decision model based on the interactive teaching method

T. Inamura, M. Inaba, and H. Inoue
Acquisition of probabilistic behavior decision model based on the interactive teaching method
In Proceedings of the Ninth International Conference on Advanced Robotics (ICAR'99), 1999

Abstract:
In this paper, we propose a novel method for mobile robots to acquire new autonomous behaviors gradually, based on interaction between humans and robots. In this method, behavior decision models are constructed by statistical processing of interaction and teaching experiences, and the robot expresses the sureness of its own decisions using stochastic reasoning.
The robot not only decides its behavior using this sureness, but also uses it to make suggestions to the user and to ask the user questions.

Link

NTU talk: The Confluence of Sparse Representation and Computer Vision

Title: The Confluence of Sparse Representation and Computer Vision
Speaker: Professor Yi Ma, ECE Department, UIUC and Microsoft Research Asia
Time: 2:20pm, Aug 28 (Fri), 2009
Place: Room 101, CSIE building

Abstract:

In the past few years, sparse representation and compressive sensing have arisen as a very powerful and popular framework for signal and image processing. It has armed people with new mathematical principles and computational tools that can effectively and efficiently harness sparse, low-dimensional structures of high-dimensional data such as images and videos. In this talk, we contend that the same principles and tools are equally important for analyzing the meaning and semantics of images and help solve many outstanding problems in computer vision.

As an example, we will focus on the recent success of sparse representation in human face recognition. On one hand, tools from sparse representation such as L1-minimization have seen great empirical success in enhancing the robustness of face recognition with occlusion, illumination change, and registration error, leading to striking recognition performance far exceeding human expectation or capability. On the other hand, the peculiar structures of face images have led to new mathematical discovery of remarkable properties of L1 minimization that far exceed the existing sparse representation theory.

We will also illustrate with many other examples in computer vision the importance of sparsity as a guiding principle for extracting and harnessing the structures of high-dimensional visual data. In return, we will see that overwhelming empirical evidence from those examples suggests that an even richer set of new mathematical results can be developed if we systematically extend the theory of sparse representation to clustering or classification of high-dimensional visual data. The confluence of sparse representation and computer vision is leading us to a brand new mathematical foundation for high-dimensional pattern analysis and recognition.

This is joint work with my former PhD students John Wright, Allen Yang, and Shankar Rao.
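
As a rough illustration of the L1-minimization idea mentioned in the abstract, the sketch below represents a test sample as a sparse combination of training samples and assigns the class whose samples explain it best. The ISTA solver, the toy three-dimensional "images", the value of `lam`, and all function names are assumptions of this sketch, not details from the talk.

```python
import math


def soft_threshold(value, threshold):
    """Shrink a scalar towards zero; this is the proximal step for the L1 penalty."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0


def ista(atoms, signal, lam=0.01, iterations=2000):
    """Minimise 0.5*||D x - y||^2 + lam*||x||_1, where the columns of D are `atoms`."""
    # 1 / ||D||_F^2 is a safe step size (no larger than 1 / Lipschitz constant).
    step = 1.0 / sum(a * a for atom in atoms for a in atom)
    coeffs = [0.0] * len(atoms)
    for _ in range(iterations):
        recon = [sum(c * atom[i] for c, atom in zip(coeffs, atoms))
                 for i in range(len(signal))]
        residual = [r - s for r, s in zip(recon, signal)]
        grads = [sum(a * r for a, r in zip(atom, residual)) for atom in atoms]
        coeffs = [soft_threshold(c - step * g, step * lam)
                  for c, g in zip(coeffs, grads)]
    return coeffs


def classify(atoms, labels, signal, lam=0.01):
    """Pick the class whose atoms, with their sparse coefficients, best rebuild the signal."""
    coeffs = ista(atoms, signal, lam)
    best_label, best_error = None, math.inf
    for label in sorted(set(labels)):
        recon = [sum(c * atom[i] for c, atom, lab in zip(coeffs, atoms, labels)
                     if lab == label)
                 for i in range(len(signal))]
        error = math.dist(recon, signal)
        if error < best_error:
            best_label, best_error = label, error
    return best_label
```

Each entry of `atoms` plays the role of one vectorised training image and `labels` gives its identity. The class-wise residual is used instead of the largest coefficient so that a few stray coefficients on the wrong class do not decide the outcome.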

Short Biography:

Yi Ma is an associate professor at the Electrical & Computer Engineering Department of the University of Illinois at Urbana-Champaign. He is currently on leave as research manager of the Visual Computing group at Microsoft Research Asia in Beijing. His research interests include computer vision, image processing, and systems theory. Yi Ma received two Bachelor's degrees, in Automation and Applied Mathematics, from Tsinghua University (Beijing, China) in 1995, a Master of Science degree in EECS in 1997, a Master of Arts degree in Mathematics in 2000, and a PhD degree in EECS in 2000, all from the University of California at Berkeley. Yi Ma received the David Marr Best Paper Prize at the International Conference on Computer Vision 1999 and the Longuet-Higgins Best Paper Prize at the European Conference on Computer Vision 2004. He also received the CAREER Award from the National Science Foundation in 2004 and the Young Investigator Award from the Office of Naval Research in 2005. He is an associate editor of IEEE Transactions on Pattern Analysis and Machine Intelligence. He is a senior member of IEEE and a member of ACM, SIAM, and ASEE.

Thursday, August 20, 2009

Beyond Asimov: The Three Laws of Responsible Robotics

by Robin R. Murphy and David D. Woods

IEEE Intelligent Systems, July/August 2009, pp. 14–20

Since their codification in 1947 in the collection of short stories I, Robot, Isaac Asimov’s three laws of robotics have been a staple of science fiction. Most of the stories assumed that the robot had complex perception and reasoning skills equivalent to a child and that robots were subservient to humans. Although the laws were simple and few, the stories attempted to demonstrate just how difficult they were to apply in various real-world situations. In most situations, although the robots usually behaved "logically," they often failed to do the "right" thing, typically because the particular context of application required subtle adjustments of judgment on the part of the robot (for example, determining which law took priority in a given situation, or what constituted helpful or harmful behavior).

[The full article]

Lab Meeting August 24, 2009 (Chung-Han): Monitoring an intersection using a network of laser scanners

Monitoring an intersection using a network of laser scanners

Huijing Zhao; Jinshi Cui; Hongbin Zha; Katabira, K.; Xiaowei Shao; Shibasaki, R.

Intelligent Transportation Systems, 2008. ITSC 2008. 11th International IEEE Conference on, 12-15 Oct. 2008, Page(s): 428-433

Abstract: In this research, a novel system for monitoring an intersection using a network of single-row laser range scanners (subsequently abbreviated as "laser scanner") is proposed. Laser scanners are set on the road side to profile an intersection horizontally from different viewpoints. This is done so that cross sections of the intersection are captured at a high scanning rate (e.g., 37 Hz) and contain the contour points of the moving objects entering the intersection. Data from the different laser scanners are integrated into a common spatial-temporal coordinate system and processed. Thus, the moving objects inside the intersection are detected and tracked to estimate their state parameters, such as location, speed, and direction, at each time instance. An experiment was conducted in central Beijing, where six laser scanners were used to cover a three-way intersection. A digital copy of the dynamic intersection was measured, and, through data processing, a large quantity of physical dimension traffic data was obtained.
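
The spatial half of integrating several scanners into one coordinate system can be sketched as a plain 2-D rigid transform of each scan by its scanner's pose. This is a minimal sketch that assumes the scanner poses are already known and ignores time synchronisation; the function name and parameters are hypothetical, not taken from the paper.

```python
import math


def scan_to_world(ranges, start_angle, angle_step, pose):
    """Convert one horizontal range scan to points in a shared world frame.

    ranges      : measured distances, one per beam
    start_angle : bearing of the first beam in the scanner frame (radians)
    angle_step  : angular spacing between consecutive beams (radians)
    pose        : (x, y, heading) of the scanner in the world frame (assumed known)
    """
    x0, y0, heading = pose
    points = []
    for i, r in enumerate(ranges):
        bearing = heading + start_angle + i * angle_step
        points.append((x0 + r * math.cos(bearing), y0 + r * math.sin(bearing)))
    return points
```

Once every scanner's points are expressed in the same frame, scans taken at the same time instance can simply be concatenated before detection and tracking.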

[link]

Wednesday, August 19, 2009

Lab Meeting August 24, 2009 (Jimmy): One-Shot Learning of Object Categories

Title: One-Shot Learning of Object Categories

In: IEEE Trans. Pattern Analysis and Machine Intelligence, Vol. 28, No. 4, April 2006

Li Fei-Fei, Rob Fergus, and Pietro Perona

Abstract: Learning visual models of object categories notoriously requires hundreds or thousands of training examples. We show that it is possible to learn much information about a category from just one, or a handful, of images. The key insight is that, rather than learning from scratch, one can take advantage of knowledge coming from previously learned categories, no matter how different these categories might be. We explore a Bayesian implementation of this idea. Object categories are represented by probabilistic models. Prior knowledge is represented as a probability density function on the parameters of these models. The posterior model for an object category is obtained by updating the prior in the light of one or more observations. We test a simple implementation of our algorithm on a database of 101 diverse object categories. We compare category models learned by an implementation of our Bayesian approach to models learned by Maximum Likelihood (ML) and Maximum A Posteriori (MAP) methods. We find that on a database of more than 100 categories, the Bayesian approach produces informative models when the number of training examples is too small for other methods to operate successfully.
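
The "update the prior in the light of one observation" step can be illustrated with the simplest conjugate case: a Gaussian prior on an unknown mean with known observation noise. This is only a stand-in chosen for brevity; the paper's category models are far richer, and the function name and numbers below are hypothetical.

```python
def gaussian_mean_posterior(prior_mean, prior_var, observations, noise_var):
    """Posterior mean and variance of an unknown mean under a Gaussian prior.

    Assumes observations are drawn with known variance `noise_var`.
    Precisions (inverse variances) add; the posterior mean is the
    precision-weighted average of the prior mean and the data.
    """
    precision = 1.0 / prior_var + len(observations) / noise_var
    mean = (prior_mean / prior_var + sum(observations) / noise_var) / precision
    return mean, 1.0 / precision


# One training example: maximum likelihood would simply return the example
# itself (2.0 here), whereas the posterior is pulled towards what earlier
# categories suggested (the prior mean, 0.0 here) and still carries uncertainty.
posterior_mean, posterior_var = gaussian_mean_posterior(0.0, 1.0, [2.0], 1.0)
```

With a single example the maximum-likelihood estimate has nothing to temper it, which is one way to read the abstract's claim that the Bayesian approach stays informative when other methods cannot operate.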

[Link]

Thursday, August 13, 2009

Lab Meeting August 17, 2009 (Any): RANSAC-based DARCES

RANSAC-Based DARCES: A New Approach to Fast Automatic Registration of Partially Overlapping Range Images

IEEE Transactions on Pattern Analysis and Machine Intelligence (PAMI), Vol. 21, No. 11, November 1999

Chu-Song Chen, Yi-Ping Hung, and Jen-Bo Cheng

Abstract: In this paper, we propose a new method, the RANSAC-based DARCES method, which can solve the partially overlapping 3D registration problem without any initial estimation. For the noiseless case, the basic algorithm of our method can guarantee that the solution it finds is the true one, and its time complexity can be shown to be relatively low. An extra characteristic is that our method can be used even for the case that there are no local features in the 3D data sets.
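
The flavour of registration without an initial estimate can be shown with a 2-D toy: pick random control points in one data set, search the other set for points that respect the same distance (a rigidity constraint), and keep the transform under which the most points overlap. This sketch does not reproduce the paper's 3-D method or its search strategy; all names, the 2-D setting, and the tolerance are assumptions.

```python
import math
import random


def rigid_from_pairs(a1, a2, b1, b2):
    """2-D rotation angle and translation sending a1 to b1 and a2 to b2.

    Assumes the two segments have (nearly) equal length.
    """
    angle = (math.atan2(b2[1] - b1[1], b2[0] - b1[0])
             - math.atan2(a2[1] - a1[1], a2[0] - a1[0]))
    c, s = math.cos(angle), math.sin(angle)
    return angle, (b1[0] - (c * a1[0] - s * a1[1]), b1[1] - (s * a1[0] + c * a1[1]))


def apply_rigid(transform, p):
    """Rotate point p by the transform's angle, then translate it."""
    angle, (tx, ty) = transform
    c, s = math.cos(angle), math.sin(angle)
    return (c * p[0] - s * p[1] + tx, s * p[0] + c * p[1] + ty)


def count_overlap(transform, src, dst, tol):
    """Number of source points that land within tol of some destination point."""
    return sum(1 for p in src
               if any(math.dist(apply_rigid(transform, p), q) <= tol for q in dst))


def register(src, dst, tol=1e-6, trials=30, rng=None):
    """Return (overlap count, transform) for the best hypothesis found."""
    rng = rng or random.Random(0)
    best_count, best_transform = 0, None
    for _ in range(trials):
        a1, a2 = rng.sample(src, 2)  # random control points
        length = math.dist(a1, a2)
        for i, b1 in enumerate(dst):
            for j, b2 in enumerate(dst):
                # Rigidity constraint: candidates must be the same distance apart.
                if i == j or abs(math.dist(b1, b2) - length) > tol:
                    continue
                transform = rigid_from_pairs(a1, a2, b1, b2)
                count = count_overlap(transform, src, dst, tol)
                if count > best_count:
                    best_count, best_transform = count, transform
    return best_count, best_transform
```

Because the overlap count is used for verification, the two sets only need to share part of their points, and no local features are required. The distance check prunes most candidate pairs before the comparatively expensive overlap count is run.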

Link

Lab Meeting August 17, 2009 (Kuen-Han): Moving Obstacle Detection in Highly Dynamic Scenes (ICRA09)

Title: Moving Obstacle Detection in Highly Dynamic Scenes (ICRA09)
Authors: A. Ess, B. Leibe, K. Schindler, and L. van Gool.

Abstract

We address the problem of vision-based multi-person tracking in busy pedestrian zones using a stereo rig mounted on a mobile platform. Specifically, we are interested in the application of such a system for supporting path planning algorithms in the avoidance of dynamic obstacles. The complexity of the problem calls for an integrated solution, which extracts as much visual information as possible and combines it through cognitive feedback.

We propose such an approach, which jointly estimates camera position, stereo depth, object detections, and trajectories based only on visual information. The interplay between these components is represented in a graphical model. For each frame, we first estimate the ground surface together with a set of object detections. Based on these results, we then address object interactions and estimate trajectories. Finally, we employ the tracking results to predict future motion for dynamic objects and fuse this information with a static occupancy map estimated from dense stereo.

The approach is experimentally evaluated on several long and challenging video sequences from busy inner-city locations recorded with different mobile setups. The results show that the proposed integration makes stable tracking and motion prediction possible, and thereby enables path planning in complex and highly dynamic scenes.

link

webpage

NewScientist: Why humans can't navigate out of a paper bag

12 August 2009 by Chris Berdik
Magazine issue 2721.
[Full Article]

This is an interesting article mentioning that human beings do not perform metric-SLAM and may perform topological SLAM poorly. Take a look.

-Bob

Monday, August 10, 2009

IJCAI 2009 Talk: From Low-level Sensors to High-level Intelligence: Activity Recognition Links the Knowledge Food Chain

Title: From Low-level Sensors to High-level Intelligence: Activity Recognition Links the Knowledge Food Chain (IJCAI 2009)

Author: Qiang Yang, The Hong Kong University of Science and Technology

Description
Sensors provide computer systems with a window to the outside world. Activity recognition "sees" what is in the window to predict the locations, trajectories, actions, goals and plans of humans and objects. Building an activity recognition system requires a full range of interaction from statistical inference on lower level sensor data to symbolic AI at higher levels, where prediction results and acquired knowledge are passed up each level to form a knowledge food chain. In this talk, I will give an overview of activity recognition and explore its relation to other fields, including planning and knowledge acquisition, machine learning and Web search. I will also describe its applications in assistive technologies, security monitoring and mobile commerce.

Link

Sunday, August 09, 2009

Lab Meeting August 10, 2009 (Nicole): Learning Sound Location from a Single Microphone (ICRA 2009)

Title: Learning Sound Location from a Single Microphone (ICRA 2009)

Authors: Ashutosh Saxena and Andrew Y. Ng

Abstract:

We consider the problem of estimating the incident angle of a sound, using only a single microphone. The ability to perform monaural (single-ear) localization is important to many animals; indeed, monaural cues are also the primary method by which humans decide if a sound comes from the front or back, as well as estimate its elevation. Such monaural localization is made possible by the structure of the pinna (outer ear), which modifies sound in a way that is dependent on its incident angle. In this paper, we propose a machine learning approach to monaural localization, using only a single microphone and an “artificial pinna” (that distorts sound in a direction-dependent way). Our approach models the typical distribution of natural and artificial sounds, as well as the direction-dependent changes to sounds induced by the pinna. Our experimental results also show that the algorithm is able to fairly accurately localize a wide range of sounds, such as human speech, dog barking, waterfall, thunder, and so on. In contrast to microphone arrays, this approach also offers the potential of significantly more compact, as well as lower cost and power, devices for sound localization.

[link]

Friday, August 07, 2009

Paper: Modeling Groups of Plausible Virtual Pedestrians

by Christopher Peters and Cathy Ennis
IEEE Computer Graphics and Applications, July/August 2009, pp. 54–63

Crowd simulation is enjoying considerable success in a number of applied domains, most notably in evacuation scenarios in which simulated crowd behaviors can help improve the safety of interior building designs. However, not all applications involving virtual populace have the overarching goal of realistic simulation. In many cases, it's necessary only that viewers perceive the crowd as realistic. In many of the latest movies or video games involving large numbers of virtual actors, liberties can be taken in displaying those far away or otherwise obscured from the eye, if this doesn't noticeably diminish the viewing experience. For example, such simulations can reduce the level of detail or forgo collision avoidance calculations to allow simulation of a larger crowd or enhanced behaviors for individuals deemed most likely to occupy viewers' attention. (Full PDF)

FRC Seminar Special Time: Information Sharing in Large Heterogeneous Teams, August 13, 2009

FRC Seminar

August 13, 2009 12pm

Speaker:

Prasanna Velagapudi
PhD Student
Robotics Institute
Carnegie Mellon University

Abstract:

In large, collaborative, heterogeneous teams, team members often collect information that is useful to other members of the team. However, in many real domains, it is impossible to completely share all of this information due to network and processing constraints. Recognizing the utility of such information and delivering it efficiently across a team has been the focus of much research, with proposed approaches ranging from flooding to complex filters and matchmakers. Interestingly, random forwarding of information has been found to be a surprisingly effective information sharing approach in some domains. In this talk, we investigate some recent results supporting this phenomenon in detail and show that in certain systems, random forwarding of information performs almost half as well as a globally optimal approach. From this, we demonstrate a statistical modeling approach designed to estimate information sharing performance in real domains. Finally, we will discuss ongoing work and possible applications of these models in enabling heterogeneous teams to be scaled into the 100s and 1000s.

Speaker Bio:

Prasanna is currently pursuing a PhD at the Robotics Institute, and is co-advised by Katia Sycara and Paul Scerri. His research focuses on information sharing in large heterogeneous teams and large-scale human-robot interaction. Previously, he worked as an electrical engineer at RedZone Robotics, developing power systems and embedded computing for submersible, subterranean mapping and mobility applications. Prasanna holds a B.S. in Electrical and Computer Engineering and Computer Science from Carnegie Mellon University.

Wednesday, August 05, 2009

MIT CSAIL Thesis Defense: Visual Sense Disambiguation: A Multimodal Approach

Speaker: Kate Saenko, MIT CSAIL

Date: Friday, August 7 2009

Time: 2:00PM to 3:00PM

Refreshments: 1:45PM

Location: Star conference room

Contact: Kate Saenko, (617) 669-9093, saenko@mit.edu

If a picture is worth a thousand words, can a thousand words be worth a training image? Most successful object recognition algorithms require manually annotated images of objects to be collected for training. The amount of human effort required to collect training data has limited most approaches to the several hundred object categories available in the labeled datasets. While human-annotated image data is scarce, additional sources of information can be used as weak labels, reducing the need for human supervision. In this thesis, we use three types of information to learn models of object categories: speech, text and dictionaries. We demonstrate that our use of non-traditional information sources facilitates automatic acquisition of visual object models for arbitrary words without requiring any labeled image examples.

Spoken object references occur in many scenarios: interaction with an assistant robot, voice-tagging of photos, etc. Existing reference resolution methods are unimodal, relying either only on image features, or only on speech recognition. We propose a method that uses both the image of the object and the speech segment referring to it to disambiguate the underlying object label. We show that even noisy speech input helps visual recognition, and vice versa. We also explore two sources of linguistic sense information: the words surrounding images on web pages, and dictionary entries for nouns that refer to objects. Keywords that index images on the web have been used as weak object labels, but these tend to produce noisy datasets with many unrelated images. We use unlabeled text, dictionary definitions, and semantic relations between concepts to learn a refined model of image sense. Our model can work with as little supervision as a single English word. We apply this model to a dataset of web images indexed by polysemous keywords, and show that it improves both retrieval of specific senses, and the resulting object classifiers.

Link

Tuesday, August 04, 2009

Lab Meeting August 10, 2009 (Gary): In Between 3D Active Appearance Models and 3D Morphable Models

Title: In Between 3D Active Appearance Models and 3D Morphable Models
Authors: Jingu Heo, Marios Savvides

Abstract:
In this paper we propose a novel method of generating 3D Morphable Models (3DMMs) from 2D images. We develop algorithms of 3D face reconstruction from a sparse set of points acquired from 2D images. In order to establish correspondence between images precisely, we combined Active Shape Models (ASMs) and Active Appearance Models (AAMs) (CASAAMs) in an intelligent way, showing improved performance on pixel-level accuracy and generalization to unseen faces. The CASAAMs are applied to the images of different views of the same person to extract facial shapes across pose. These 2D shapes are combined for reconstructing a sparse 3D model. The point density of the model is increased by the Loop subdivision method, which generates new vertices by a weighted sum of the existing vertices. Then, the depth of the dense 3D model is modified with an average 3D depth-map in order to preserve facial structure more realistically. Finally, all 249 3D models with expression changes are combined to generate a 3DMM for a compact representation. The first session of the Multi-PIE database, consisting of 249 persons with expression and illumination changes, is used for the modeling. Unlike typical 3DMMs, our model can generate 3D human faces more realistically and efficiently (2-3 seconds on P4 machine) under diverse illumination conditions.
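
The "weighted sum of the existing vertices" used by Loop subdivision can be shown for the new vertex inserted on an interior edge, using the standard Loop weights of 3/8 for each edge endpoint and 1/8 for each of the two opposite vertices. This is a minimal sketch of that one rule only; the function name is hypothetical, and the repositioning of existing vertices and boundary edges are left out.

```python
def loop_edge_vertex(a, b, c, d):
    """New vertex on interior edge (a, b) under the standard Loop rule.

    a, b : endpoints of the edge
    c, d : the vertices opposite the edge in its two adjacent triangles
    The weights 3/8, 3/8, 1/8, 1/8 sum to one, so the result is a weighted average.
    """
    return tuple(0.375 * (pa + pb) + 0.125 * (pc + pd)
                 for pa, pb, pc, pd in zip(a, b, c, d))
```

Applying this to every edge and splitting each triangle into four is what raises the point density of a sparse mesh.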

link

Sunday, August 02, 2009

Lab Meeting August 3rd, 2009 (Jeff): Angular problem after loop closing

I will introduce the angular problem after loop closing and discuss it with you.

I will also try to propose some methods to solve this problem.