Monday, April 30, 2007

[VASC Seminar Series] Image representations beyond histograms of gradients: The role of Gestalt descriptors

Speaker: Stanley Bileschi

Abstract:

Histograms of orientations and the statistics derived from them have
proven to be effective image representations for various recognition
tasks. In this work we attempt to improve the accuracy of object detection
systems by including new features that explicitly capture mid-level
gestalt concepts. Four new image features are proposed, inspired by the
gestalt principles of continuity, symmetry, closure and repetition. The
resulting image representations are used jointly with existing
state-of-the-art features and together enable better detectors for
challenging real-world data sets. As baseline features, we use Riesenhuber
and Poggio's C1 features [15] and Dalal and Triggs' Histogram of Oriented
Gradients feature [6]. Given that both of these baseline features have
already shown state of the art performance in multiple object detection
benchmarks, that our new midlevel representations can further improve
detection results warrants special consideration. We evaluate the
performance of these detection systems on the publicly available
StreetScenes [25] and Caltech101 [11] databases among others.

Related links:
news

Gestalt psychology

[News] ITRI Adopts Evolution Robotics Software Platform For Robotics Development

Abstract:

Identifying robotics as a key industry for growth, the Taiwan Ministry of Economic Affairs (MOEA) commissioned ITRI to create a national robotic SDK (software development kit) to provide an industry-standard platform for robotics. This software development kit will offer Taiwan’s corporations, labs and universities best-in-class tools for developing a broad range of new robotic products and applications, as well as provide a path for incorporating and sharing new technologies over time.

In deciding upon one of the most critical components for the robotics SDK, ITRI chose the Evolution Robotics’ ERSP Architecture to provide the standard foundation for all application development. In this role, the ERSP Architecture will essentially serve as the robot operating system, providing the key infrastructure and functions for integrating all of the hardware and software components.

full article

Thursday, April 26, 2007

Lab Meeting 26 April 2007 (Chihao): Audio Source localization and tracking: Property of Time delay of arrival & proposed method

compare the effect of different frame size
problem and proposed method

Lab Meeting 26 April 2007 (Jim): Probabilistic Appearance Based Navigation and Loop Closing

Probabilistic Appearance Based Navigation and Loop Closing
by Mark Cummins and Paul Newman, ICRA 2007

pdf, website

Abstract:
This paper describes a probabilistic framework for navigation using only appearance data. By learning a generative model of appearance, we can compute not only the similarity of two observations, but also the probability that they originate from the same location, and hence compute a pdf over observer location. We do not limit ourselves to the kidnapped robot problem (localizing in a known map), but admit the possibility that observations may come from previously unvisited places. The principled probabilistic approach we develop allows us to explicitly account for the perceptual aliasing in the environment – identical but indistinctive observations receive a low probability of having come from the same place. Our algorithm complexity is linear in the number of places, and is particularly suitable for online loop closure detection in mobile robotics.
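
To make the idea of a pdf over observer location concrete, here is a minimal sketch of a single Bayes update over place hypotheses, with one extra hypothesis for "a place not visited before". It is not the authors' model (the paper learns a generative model of appearance); the function name, priors and likelihood values are all made up for illustration.

```python
def place_posterior(prior, likelihoods):
    """Bayes update over place hypotheses.

    prior[i] and likelihoods[i] refer to the same hypothesis. By the
    convention assumed here, the last entry stands for "a place not
    visited before".
    """
    if len(prior) != len(likelihoods):
        raise ValueError("prior and likelihoods must have equal length")
    unnormalized = [p * l for p, l in zip(prior, likelihoods)]
    total = sum(unnormalized)
    if total <= 0.0:
        raise ValueError("observation has zero probability under all hypotheses")
    return [u / total for u in unnormalized]


# Two known places plus the "new place" hypothesis (last entry).
prior = [0.25, 0.25, 0.5]

# Distinctive observation: likely at place 0, unlikely everywhere else.
distinctive = place_posterior(prior, [0.5, 0.01, 0.01])

# Aliased observation: equally likely under every hypothesis,
# so it carries no evidence and the posterior equals the prior.
aliased = place_posterior(prior, [0.5, 0.5, 0.5])
```

In this toy setup the distinctive observation concentrates the posterior on place 0, while the aliased one leaves the belief unchanged, which is the behaviour the abstract describes for perceptual aliasing.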

Wednesday, April 25, 2007

Lab Meeting 26 April 2007 (Yu-Chun): Natural Emotion Expression of a Robot Based on Reinforcer Intensity

2007 IEEE International Conference on Robotics and Automation

Author:
Seung-Ik Lee, Gunn-Yong Park, and Joong-Bae Kim

Abstract:
An emotional robot is regarded as being able to express its diverse emotions in response to internal or external events. This paper presents a robot affective system that is able to express life-like emotions. In order to do that, the overall architecture of our affective system is based on neuroscience from which we obtained the natural emotional processing routines. Based on that architecture, we apply the reinforcer effects expecting that those would lead the affective system to be more similar to real-life emotion expression. The robot affective system has responsibility for gathering environmental information and evaluating which environmental stimuli are rewarding or punishing. The emotion processing involves appraisal of the external and internal stimuli, such as homeostasis, and generates the affective states of the robot. Therefore, emotions are associated with the presentation, omission, and termination of the expected rewards or punishers (reinforcers). The experimental results show that our affective system can express several emotions simultaneously, and that emotions decrease, increase, or change into another emotion seamlessly as time passes.

Monday, April 23, 2007

CMU news: Fotowoosh technology

Freewebs licenses Fotowoosh technology, developed by Alexei Efros, Martial Hebert, and PhD student Derek Hoiem, that converts single, two-dimensional images into 3-D images.

Sunday, April 22, 2007

CMU Intelligence Seminar: An efficient way to learn deep generative models

An efficient way to learn deep generative models

Geoff Hinton, University of Toronto

I will describe an efficient, unsupervised learning procedure for deep generative models that contain millions of parameters and many layers of hidden features. The features are learned one layer at a time without any information about the final goal of the system. After the layer-by-layer learning, a subsequent fine-tuning process can be used to significantly improve the generative or discriminative performance of the multilayer network by making very slight changes to the features.

I will demonstrate this approach to learning deep networks on a variety of tasks including: Creating generative models of handwritten digits and human motion; finding non-linear, low-dimensional representations of very large datasets; and predicting the next word in a sentence. I will also show how to create hash functions that map similar objects to similar addresses, thus allowing hash functions to be used for finding similar objects in a time that is independent of the size of the database.

Speaker Bio

Geoffrey Hinton received his BA in experimental psychology from Cambridge in 1970 and his PhD in Artificial Intelligence from Edinburgh in 1978. He did postdoctoral work at Sussex University and the University of California San Diego and spent five years as a faculty member in the Computer Science department at Carnegie-Mellon University. He then became a fellow of the Canadian Institute for Advanced Research and moved to the Department of Computer Science at the University of Toronto. He spent three years from 1998 until 2001 setting up the Gatsby Computational Neuroscience Unit at University College London and then returned to the University of Toronto where he is a University Professor. He holds a Canada Research Chair in Machine Learning. He is the director of the program on "Neural Computation and Adaptive Perception" which is funded by the Canadian Institute for Advanced Research.

Geoffrey Hinton is a fellow of the Royal Society, the Royal Society of Canada, and the Association for the Advancement of Artificial Intelligence. He is an honorary foreign member of the American Academy of Arts and Sciences, and a former president of the Cognitive Science Society. He received an honorary doctorate from the University of Edinburgh in 2001. He was awarded the first David E. Rumelhart prize (2001), the IJCAI award for research excellence (2005), the IEEE Neural Network Pioneer award (1998) and the ITAC/NSERC award for contributions to information technology (1992).

Friday, April 20, 2007

[CMU RI Defense] Exploiting Space-Time Statistics of Videos for Face "Hallucination"

Title : Exploiting Space-Time Statistics of Videos for Face "Hallucination"

Author : Goksel Dedeoglu

Abstract :

Face "Hallucination" aims to recover high quality, high-resolution images of human faces from low-resolution, blurred, and degraded images or video. This thesis presents person-specific solutions to this problem through careful exploitation of space (image) and space-time (video) models. The results demonstrate accurate restoration of facial details, with resolution enhancements up to a scaling factor of 16.

The algorithms proposed in this thesis follow the analysis-by-synthesis paradigm; they explain the observed (low-resolution) data by fitting a (high-resolution) model. In this context, the first contribution is the discovery of a scaling-induced bias that plagues most model-to-image (or image-to-image) fitting algorithms. It was found that models and observations should be treated asymmetrically, both to formulate an unbiased objective function and to derive an accurate optimization algorithm. This asymmetry is most relevant to Face Hallucination: when applied to the popular Active Appearance Model, it leads to a novel face tracking and reconstruction algorithm that is significantly more accurate than state-of-the-art methods. The analysis also reveals the inherent trade-off between computational efficiency and estimation accuracy in low-resolution regimes.

The second contribution is a statistical generative model of face videos. By treating a video as a composition of space-time patches, this model efficiently encodes the temporal dynamics of complex visual phenomena such as eye-blinks and the occlusion or appearance of teeth. The same representation is also used to define a data-driven prior on a three-dimensional Markov Random Field in space and time. Experimental results demonstrate that temporal representation and reasoning about facial expressions improves robustness by regularizing the Face Hallucination problem.

The final contribution is an approximate compensation scheme against illumination effects. It is observed that distinct illumination subspaces of a face (each coming from a different pose and expression) still exhibit similar variation with respect to illumination. This motivates augmenting the video model with a low-dimensional illumination subspace, whose parameters are estimated jointly with high-resolution face details. Successful Face Hallucinations beyond the lighting conditions of the training videos are reported.

Full text of the thesis can be found here.
You can go here for more information.

Lab Meeting 26 April 2007 (AShin): (ICRA 07) Outdoor Navigation of a Mobile Robot Using Differential GPS and Curb Detection

Title: Outdoor Navigation of a Mobile Robot Using Differential GPS and Curb Detection

Author: Seung-Hun Kim, Chi-Won Roh, Sung-Chul Kang and Min-Yong Park

Abstract: This paper demonstrates a reliable navigation of a mobile robot in outdoor environment. We fuse differential GPS and odometry data using the framework of extended Kalman filter to localize a mobile robot. And also, we propose an algorithm to detect curbs through the laser range finder. An important feature of road environment is the existence of curbs. The mobile robot builds the map of the curbs of roads and the map is used for tracking and localization. The navigation system for the mobile robot consists of a mobile robot and a control station. The mobile robot sends the image data from a camera to the control station. The control station receives and displays the image data and the teleoperator commands the mobile robot based on the image data. Since the image data does not contain enough data for reliable navigation, a hybrid strategy for reliable mobile robot in outdoor environment is suggested. When the mobile robot is faced with unexpected obstacles or the situation that, if it follows the command, it can happen to collide, it sends a warning message to the teleoperator and changes the mode from teleoperated to autonomous to avoid the obstacles by itself. After avoiding the obstacles or the collision situation, the mode of the mobile robot is returned to teleoperated mode. We have been able to confirm that the appropriate change of navigation mode can help the teleoperator perform reliable navigation in outdoor environment through experiments in the road.
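
As a rough illustration of fusing odometry with GPS fixes, here is a one-dimensional linear Kalman filter sketch. It is deliberately much simpler than the extended Kalman filter in the paper: the state is a single position, and the function names and all noise variances are assumptions chosen for the example.

```python
def predict(x, p, odom_delta, odom_var):
    """Move the estimate by the odometry displacement; uncertainty grows."""
    return x + odom_delta, p + odom_var


def update(x, p, gps_pos, gps_var):
    """Correct the estimate with a GPS position fix; uncertainty shrinks."""
    k = p / (p + gps_var)  # Kalman gain: how much to trust the GPS fix
    return x + k * (gps_pos - x), (1.0 - k) * p


# Start at position 0 with variance 1 (made-up values).
x, p = 0.0, 1.0

# Odometry reports 1.0 m of forward motion.
x, p = predict(x, p, odom_delta=1.0, odom_var=0.5)

# GPS reports position 1.2 m with variance 0.5.
x, p = update(x, p, gps_pos=1.2, gps_var=0.5)
```

After the update the estimate lies between the odometry prediction and the GPS fix, weighted by their variances, and its variance is smaller than either source alone.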

CMU ML lunch: Learning without the loss function

Speaker: John Langford, Yahoo! Research, http://hunch.net/~jl/

Title: Learning without the loss function

Abstract: When learning a classifier, we use knowledge of the loss of different choices on training examples to guide the choice of a classifier. An often incorrect assumption is embedded in this paradigm: the assumption that we know the loss of different choices. This assumption is often incorrect, and the talk is about the feasibility of (and algorithms for) fixing this.

One example where the assumption is incorrect is the ad placement problem. Your job is to choose relevant ads for a user given various sorts of context information. You can test success by displaying an ad and checking if the user is interested in it. However, this test does _not_ reveal what would have happened if a different ad was displayed. Restated, the "time always goes forward" nature of reality does not allow us to answer "What would have happened if I had instead made a different choice in the past?"

Somewhat surprisingly, this is _not_ a fundamental obstacle to application of machine learning. I'll describe what we know about learning without the loss function, and some new better algorithms for this setting.
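
One standard way to work with this kind of partial feedback is inverse propensity scoring, sketched below. The abstract does not say which algorithms the talk covers, so this is only an illustration of the setting: the log format, the ad names and the logging probabilities are all hypothetical.

```python
def ips_value(logs, policy):
    """Estimate the click rate a new policy would get from logged data.

    Each log entry is (context, shown_ad, clicked, prob_shown), where
    prob_shown is the probability with which the logging system chose
    shown_ad. Only the outcome of the ad that was shown is known.
    """
    total = 0.0
    for context, shown_ad, clicked, prob_shown in logs:
        # Count an entry only when the new policy agrees with what was
        # shown, and reweight it by how rarely that ad was shown.
        if policy(context) == shown_ad:
            total += clicked / prob_shown
    return total / len(logs)


# Hypothetical logs from a system that picked one of two ads uniformly.
logs = [
    ("sports", "shoes", 1, 0.5),
    ("sports", "books", 0, 0.5),
    ("news", "books", 1, 0.5),
    ("news", "shoes", 0, 0.5),
]


def always_shoes(context):
    return "shoes"


def by_context(context):
    return "shoes" if context == "sports" else "books"


value_always_shoes = ips_value(logs, always_shoes)
value_by_context = ips_value(logs, by_context)
```

On these made-up logs the context-aware policy scores higher than always showing the same ad, even though no logged example reveals what a different ad would have done for the same user.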

Thursday, April 19, 2007

Lab Meeting 19 April 2007 (Nelson): Statistical Segment-RANSAC Localization and Moving Object Detection in Crowded Urban Area

detailed algorithms
working progress

Lab Meeting 19 April 2007 (ZhenYu): Stereovision with a Single Camera and Multiple Mirrors

Title : Stereovision with a Single Camera and Multiple Mirrors (ICRA 2005)
Author : Mouaddib, E.M.; Sagawa, R.; Echigo, T.; Yagi, Y.
Abstract:

You can create catadioptric omnidirectional stereovision using several mirrors with a single camera. These systems have interesting advantages, for instance in the case of mobile robot navigation and environment reconstruction. Our paper aims at estimating the "quality" of such a stereovision system. What happens when the number of mirrors increases? Is it better to increase the base-line or to increase the number of mirrors? We propose some criteria and a methodology to compare different significant categories (seven): three already existing systems and four new designs that we propose. We also study and propose a global comparison between the best configurations.

[link]

SLAM product: a Web-based mapping system based on SRI International's SLAM technology

A message from SRI. -Bob

-----------------------------------
Hi.

We are pleased to announce the availability of a Web-based mapping system based on SRI International's SLAM technology called Karto™.

As part of the first step toward providing a complete suite of robotic navigation and exploration algorithms, we are making the Karto Logger Plug-In available as a free download at www.kartorobotics.com. The Karto Logger Plug-In allows robot developers to collect and log odometry and range data in a form that our mapping software can interpret to create maps of indoor environments. This can be done either in simulation or on a real-world robotic platform. We currently support both the Player and Microsoft Robotics Studio platforms.

We welcome any and all feedback on our algorithms. Please feel free to contact us at kt-info@kartorobotics.com to tell us about your experiences with our mapping algorithm.

Sincerely,

The Karto™ Team
Software for Robot on the Move
http://www.kartorobotics.com

IEEE news: NEW PC SECURITY RECOGNIZES YOUR FACE

NEW PC SECURITY RECOGNIZES YOUR FACE

Forget passwords: Bioscrypt Inc. has recently introduced a USB-pluggable, 3-inch 3-D face recognition camera that can be used to authenticate users into computer systems. The new technology combines Bioscrypt's background as a provider of fingerprint-based biometric access controls with the advanced face imaging and recognition technology it acquired along with A4Vision on March 14. Bioscrypt’s system will be called the VisionAccess 3D DeskCam and will work by casting a 40,000-point infrared mesh grid over the user’s face so that workers can log onto their computers, networks, and applications with just a glance. For more information, visit: the link.

IEEE news: DRIVER-LESS SMART CARS TO HIT THE STREETS IN 15 YEARS

DRIVER-LESS SMART CARS TO HIT THE STREETS IN 15 YEARS

Once a pure figment of the Hollywood imagination, smart cars that operate autonomously are set to come to life in 15 years. Researchers at the University of Essex, Eastern England, are building a car using a standard remote control model, which will serve as a prototype for other researchers to develop their own smart cars. The cars will use a special type of computer software that will enable the car to recognize obstacles and make decisions. To read more, visit: the link.

Wednesday, April 18, 2007

[Lab meeting] 19 April 2007 (Stanley)

I'll show the potential field method result.

Tuesday, April 17, 2007

[Machine Learning Lunchtime Chats] April 16, Pradeep Ravikumar, Sparsity recovery and structure learning

Title: Sparsity recovery and structure learning
Speaker: Pradeep Ravikumar

The sparsity pattern of a vector, the number and location of its non-zeros, is of importance in settings as varied as structure learning in graphical models, subset selection in regression, signal denoising and compressed sensing. In particular, a key task in these settings is to estimate the sparsity pattern of an unknown "parameter" vector from a set of n noisy observations.

Suppose a sparse "signal" vector (edges in graphical models, covariates in regression) enters into linear combinations, and has observations which are functions of these linear combinations with added noise. The task is to identify the set of relevant signals from n such noisy observations. In graphical models for instance, the task in structure learning is to identify the set of edges from the noisy samples.

This is an intractable problem under general settings, but there has been a flurry of recent work on using convex relaxations and L1 penalties to recover the underlying sparse signal, and in particular, the sparsity pattern of the signal.

The tutorial will cover the basic problem abstraction, the applications in various settings and some general conditions under which the tractable methods "work". It will also focus in particular on the application setting of structure learning in graphical models.
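
For readers who want to see an L1 penalty recover a sparsity pattern, here is a small sketch of L1-penalized least squares (the lasso) solved by coordinate descent. This is one common convex relaxation, not necessarily the method covered in the tutorial; the design matrix, noise values and penalty `lam` are made up, and the code assumes every column of `X` is non-zero.

```python
def soft_threshold(value, lam):
    """Shrink value toward zero by lam; values within [-lam, lam] become 0."""
    if value > lam:
        return value - lam
    if value < -lam:
        return value + lam
    return 0.0


def lasso_coordinate_descent(X, y, lam, sweeps=100):
    """Minimize 0.5 * ||y - Xw||^2 + lam * ||w||_1 one coordinate at a time."""
    n, d = len(X), len(X[0])
    w = [0.0] * d
    for _ in range(sweeps):
        for j in range(d):
            rho = 0.0      # correlation of column j with the partial residual
            norm_sq = 0.0  # squared norm of column j (assumed non-zero)
            for i in range(n):
                pred_without_j = sum(X[i][k] * w[k] for k in range(d) if k != j)
                rho += X[i][j] * (y[i] - pred_without_j)
                norm_sq += X[i][j] ** 2
            w[j] = soft_threshold(rho, lam) / norm_sq
    return w


# Four noisy observations of a signal that uses only the first covariate.
X = [
    [1.0, 1.0, 1.0],
    [1.0, -1.0, 1.0],
    [1.0, 1.0, -1.0],
    [1.0, -1.0, -1.0],
]
y = [2.1, 1.9, 2.05, 1.95]  # roughly 2 * first column plus small noise

w = lasso_coordinate_descent(X, y, lam=0.5)
support = [j for j, wj in enumerate(w) if wj != 0.0]
```

The penalty sets the two irrelevant coefficients exactly to zero, so `support` contains only the first covariate; the price is that the surviving coefficient is shrunk slightly below its true value.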

Monday, April 16, 2007

ICRA '07 : Stereo-based Markerless Human Motion Capture for Humanoid Robot Systems

Title :
Stereo-based Markerless Human Motion Capture for Humanoid Robot Systems

Author :
, Aleš Ude, Tamim Asfour, and Rüdiger Dillmann

Abstract :
In this paper, we present an image-based markerless human motion capture system, intended for humanoid robot systems. The restrictions set by this ambitious goal are numerous. The input of the system is a sequence of stereo image pairs only, captured by cameras positioned at approximately eye distance. No artificial markers can be used to simplify the estimation problem. Furthermore, the complexity of all algorithms incorporated must be suitable for real-time application, which is maybe the biggest problem when considering the high dimensionality of the search space. Finally, the system must not depend on a static camera setup and has to find the initial configuration automatically. We present a system that tackles these problems by combining multiple cues within a particle filter framework, allowing the system to recover from wrong estimations in a natural way. We make extensive use of the benefit of having a calibrated stereo setup. To reduce search space implicitly, we use the 3D positions of the hands and the head, computed by a separate hand and head tracker using a linear motion model for each entity to be tracked. With stereo input image sequences at a resolution of 320×240 pixels, the processing rate of our system is 15 Hz on a 3 GHz CPU. Experimental results documenting the performance of our system are available in the form of several videos.

ICRA 07: Rao-Blackwellized Particle Filtering for Mapping Dynamic Environments

Rao-Blackwellized Particle Filtering for Mapping Dynamic Environments

Isaac Miller and Mark Campbell

Abstract:

A general method for mapping dynamic environments
using a Rao-Blackwellized particle filter is presented.
The algorithm rigorously addresses both data association and
target tracking in a single unified estimator. The algorithm
relies on a Bayesian factorization to separate the posterior into
1) a data association problem solved via particle filter and
2) a tracking problem with known data associations solved
by Kalman filters developed specifically for the ground robot
environment. The algorithm is demonstrated in simulation and
validated in the real world with laser range data, showing
its practical applicability in simultaneously resolving data
association ambiguities and tracking moving objects.

Link

ICRA 07: Probabilistic Sonar Scan Matching for Robust Localization.

Probabilistic Sonar Scan Matching for Robust Localization.

Antoni Burguera, Yolanda González and Gabriel Oliver

Abstract:

This paper presents a probabilistic framework to perform scan matching localization using standard time-of-flight ultrasonic sensors. Probabilistic models of the sensors as well as techniques to propagate the errors through the models are also presented and discussed. A method to estimate the most probable trajectory followed by the robot according to the scan matching and odometry estimations is also presented. Thanks to that, accurate robot localization can be performed without the need of geometric constraints. The experiments demonstrate the robustness of our method even in the presence of large amounts of noisy readings and odometric errors.

Link