Saturday, November 29, 2008

CMU talk: 3-D Point Cloud Classification with Max-Margin Markov Networks

Speaker: Daniel Munoz (RI@CMU)
Venue: NSH 1507
Date: Monday, December 1, 2008

Title: 3-D Point Cloud Classification with Max-Margin Markov Networks

Point clouds extracted from laser range finders are hard to classify because the returns are variable and noisy depending on pose, occlusions, surface reflectance, and sensor type. Conditional Random Fields (CRFs) are a popular framework for contextual classification that produces improved and "smooth" classifications compared to local classifiers. In this talk, I will present some recent extensions to the max-margin CRF model from Taskar et al. 2004 that are used in this application.

Friday, November 28, 2008

Lab Meeting December 1st (Andi): Probabilistic Scheme for Laser Based Motion Detection

Authors: Roman Katz, Juan Nieto and Eduardo Nebot

Abstract—This paper presents a motion detection scheme using laser scanners mounted on a mobile vehicle. We propose a stable, yet simple motion detection scheme that can be used and improved with tracking and classification procedures. The salient contribution of the developed architecture is twofold. It proposes a spatio-temporal correspondence procedure based on a scan registration algorithm. The detection is cast as a probability decision problem that accounts for sensor noise and achieves robust classification. Probabilistic occlusion checking is finally performed to improve robustness. Experimental results show the performance of the proposed architecture under different settings in urban environments.

full paper

Tuesday, November 25, 2008

Lab Meeting December 1st, 2008 (Jimmy): Negative Information and Line Observations for Monte Carlo Localization

Title: Negative Information and Line Observations for Monte Carlo Localization

Authors: Todd Hester and Peter Stone

Localization is a very important problem in robotics and is critical to many tasks performed on a mobile robot. In order to localize well in environments with few landmarks, a robot must make full use of all the information provided to it. This paper moves towards this goal by studying the effects of incorporating line observations and negative information into the localization algorithm. We extend the general Monte Carlo localization algorithm to utilize observations of lines such as carpet edges. We also make use of the information available when the robot expects to see a landmark but does not, by incorporating negative information into the algorithm. We compare our implementations of these ideas to previous similar approaches and demonstrate the effectiveness of these improvements through localization experiments performed both on a Sony AIBO ERS-7 robot and in simulation.
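The negative-information idea above can be sketched in a few lines: particles whose pose predicts a visible landmark are penalized when that landmark is not observed. The following is a minimal illustrative sketch, not the authors' implementation; the field-of-view geometry, detection probability, and false-positive likelihood are assumed values.

```python
import math

def in_fov(particle, landmark, fov=math.pi / 3, max_range=3.0):
    """True if the landmark should be visible from this particle's pose (x, y, theta)."""
    dx, dy = landmark[0] - particle[0], landmark[1] - particle[1]
    bearing = math.atan2(dy, dx) - particle[2]
    bearing = math.atan2(math.sin(bearing), math.cos(bearing))  # wrap to [-pi, pi]
    return math.hypot(dx, dy) < max_range and abs(bearing) < fov / 2

def reweight(particles, weights, landmark, seen, p_detect=0.8, p_false=0.05):
    """One observation update: particles that expect to see the landmark but
    don't are down-weighted by (1 - p_detect), the negative-information term."""
    new = []
    for p, w in zip(particles, weights):
        expected = in_fov(p, landmark)
        if seen:
            w *= p_detect if expected else p_false
        elif expected:
            w *= 1.0 - p_detect  # expected the landmark, did not see it
        new.append(w)
    total = sum(new) or 1.0
    return [w / total for w in new]
```

With this update, a missed observation shifts probability mass toward particles from whose poses the landmark would genuinely be out of view.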


Monday, November 24, 2008

CMU talk: Machine Learning Problems in Computational Biology

Speaker: Eric Xing (Assistant Professor, ML@CMU)
Date: Monday, November 24, 2008

Some Challenging Machine Learning Problems in Computational Biology:
Time-Varying Networks Inference and Sparse Structured Input-Output Learning

Recent advances in high-throughput technologies such as microarrays and genome-wide sequencing have led to an avalanche of new biological data that are dynamic, noisy, heterogeneous, and high-dimensional. They have raised unprecedented challenges in machine learning and high-dimensional statistical analysis; and their close relevance to human health and social welfare has often created unique demands on performance metrics different from standard data mining or pattern recognition problems. In this talk, I will discuss two such problems. First, I will present a new statistical formalism for modeling network evolution over time, and several new algorithms based on temporal extensions of sparse graphical logistic regression, for parsimoniously reverse-engineering latent time-varying networks. I will show some promising results on recovering the latent sequence of temporally rewiring gene networks over more than 4000 genes during the life cycle of Drosophila melanogaster from microarray time courses, at a time resolution limited only by sample frequency. Second, I will present a family of sparse structured regression models in the context of uncovering true associations between linked genetic variations (inputs) in the genome and networks of human traits (outputs) in the phenome. If time allows, I will also present another class of new models known as maximum entropy discrimination Markov networks, which address the same problem in the maximum margin paradigm, but using an entropic regularizer that leads to a distribution of structured prediction functions that are simultaneously primal and dual sparse (i.e., with few support vectors, and of low effective feature dimension).

Joint work with Amr Ahmed, Seyoung Kim, Mladen Kolar, Le Song and Jun Zhu.

Thursday, November 20, 2008

CMU talk: The Capacity and Fidelity of Visual Long Term Memory

VASC Seminar
Monday, November 24, 2008

The Capacity and Fidelity of Visual Long Term Memory

Aude Oliva
Associate Professor of Cognitive Science
Department of Brain and Cognitive Sciences
Massachusetts Institute of Technology


The human visual system has been extensively trained to deal with objects and natural images, giving it the opportunity to develop robust strategies to quickly encode and recognize categories and exemplars. Although it is known that human memory capacity for images is massive, the fidelity with which human memory can represent such a large number of images is an outstanding question. We conducted three large-scale memory experiments to determine the details remembered per image representing objects and natural scenes, by varying the amount of detail required to succeed in subsequent memory tests. Our results show that, contrary to the commonly accepted view that long-term memory representations contain only the gist of what was seen, long-term memory can store thousands of items with a large amount of detail per item. Further analyses reveal that memory for an item depends on the extent to which it is conceptually distinct from other items in the memory set, and not necessarily on the featural distinctiveness along shape or color dimensions. These findings suggest a “conceptual hook” is necessary for maintaining a large number of high-fidelity representations in visual long-term memory. Altogether, the results present a great challenge to models of object and natural scene recognition, which must be able to account for such a large and detailed storage capacity. Work in collaboration with: Timothy Brady, Talia Konkle and George Alvarez.

Aude Oliva is Associate Professor of Cognitive Science, in the Department of Brain and Cognitive Sciences, at the Massachusetts Institute of Technology. After a French baccalaureate in Physics and Mathematics and a B.Sc. in Psychology, she received two M.Sc. degrees, in Experimental Psychology and in Cognitive Science and Image Processing, and was awarded a Ph.D. in Cognitive Science in 1995 from the Institut National Polytechnique de Grenoble, France. After postdoctoral research positions in the UK, Japan, France, and the US, she joined the MIT faculty in 2004. In 2006, she received a National Science Foundation CAREER award in Computational Neuroscience to pursue research in human and machine scene understanding.

Her research program is in the field of Computational Visual Cognition, a framework that strives to identify the substrates of complex visual and recognition tasks (using behavioral, eye tracking and imaging methods) and to develop models inspired by human cognition. Her current research focus lies in studying human abilities at natural image recognition and memory, including scene, object and space perception as well as the role of attentional mechanisms and learning in visual search tasks.

Wednesday, November 19, 2008

CMU RI Thesis Proposal: Probabilistic Reasoning with Permutations

Probabilistic Reasoning with Permutations: A Fourier-Theoretic Approach

Robotics Institute
Carnegie Mellon University

Permutations are ubiquitous in many real-world problems, such as voting, ranking, and data association. Representing uncertainty over permutations is challenging, since there are n! possibilities, and common factorized probability distribution representations, such as graphical models, are inefficient due to the mutual exclusivity constraints that are typically associated with permutations. 

This thesis explores a new approach for probabilistic reasoning with permutations based on the idea of approximating distributions using their low-frequency Fourier components. We use a generalized Fourier transform defined for functions on permutations, but unlike the widely used Fourier analysis on the circle or the real line, Fourier transforms of functions on permutations take the form of ordered collections of matrices. As we show, maintaining the appropriate set of low-frequency Fourier terms corresponds to maintaining matrices of simple marginal probabilities which summarize the underlying distribution. We show how to derive the Fourier coefficients of a variety of probabilistic models which arise in practice and that many useful models are either well-approximated or exactly represented by low-frequency (and in many cases, sparse) Fourier coefficient matrices. 
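To make the point about low-frequency Fourier terms concrete: the lowest nontrivial component corresponds to the matrix of first-order marginals M[i][j] = P(item j is mapped to position i). The following is an illustrative toy computation over S_3, not the thesis code; the function name is mine.

```python
from itertools import permutations

def first_order_marginals(dist):
    """First-order marginal matrix M[i][j] = P(item j is mapped to position i),
    the lowest-frequency Fourier component of a distribution over permutations."""
    n = len(next(iter(dist)))
    M = [[0.0] * n for _ in range(n)]
    for sigma, p in dist.items():
        for j, i in enumerate(sigma):  # sigma[j] = position assigned to item j
            M[i][j] += p
    return M

# Toy example: the uniform distribution over S_3, whose marginal matrix
# is the doubly stochastic matrix with every entry equal to 1/3.
dist = {sigma: 1 / 6 for sigma in permutations(range(3))}
M = first_order_marginals(dist)
```

Note that M is always doubly stochastic, which reflects the mutual exclusivity constraints that make factorized graphical-model representations awkward for permutations.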

In addition to showing that Fourier representations are both compact and intuitive, we show how to cast common probabilistic inference operations in the Fourier domain, including marginalization, conditioning on evidence, and factoring based on probabilistic independence. The algorithms presented in this thesis are fully general and work gracefully in bandlimited settings where only a partial subset of Fourier coefficients is made available. 

From the theoretical side, we tackle several problems in understanding the consequences of the bandlimiting approximation. We present results in this thesis which illuminate the nature of error propagation in the Fourier domain and propose methods for mitigating its effects. 

Finally we demonstrate the effectiveness of our approach on several real datasets and show that our methods, in addition to being well-founded theoretically, are also scalable and provide superior results in practice.

Lab Meeting November 24, 2008 (ZhenYu): Reconstructing a 3D Line from a Single Catadioptric Image

Title: Reconstructing a 3D Line from a Single Catadioptric Image (3DPVT'06)

Authors: Lanman, Douglas; Wachs, Megan; Taubin, Gabriel; Cukierman, Fernando

This paper demonstrates that, for axial non-central optical systems, the equation of a 3D line can be estimated using only four points extracted from a single image of the line. This result, which is a direct consequence of the lack of vantage point, follows from a classic result in enumerative geometry: there are exactly two lines in 3-space which intersect four given lines in general position. We present a simple algorithm to reconstruct the equation of a 3D line from four image points. This algorithm is based on computing the Singular Value Decomposition (SVD) of the matrix of Plücker coordinates of the four corresponding rays. We evaluate the conditions for which the reconstruction fails, such as when the four rays are nearly coplanar. Preliminary experimental results using a spherical catadioptric camera are presented. We conclude by discussing the limitations imposed by poor calibration and numerical errors on the proposed reconstruction algorithm.
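A rough sketch of the reconstruction described above, under the assumption that the four back-projected rays are already given in Plücker coordinates (the catadioptric calibration step is omitted, and the function names are mine): the four incidence constraints are linear, so the SVD yields a 2-D null space, and intersecting that pencil with the Klein quadric gives the (up to) two transversal lines.

```python
import numpy as np

def pluecker(p, d):
    """Plücker coordinates (d, p x d) of the line through point p with direction d."""
    p, d = np.asarray(p, float), np.asarray(d, float)
    return np.concatenate([d, np.cross(p, d)])

def lines_meeting_four(rays):
    """Up to two lines (unit Plücker vectors) intersecting all four given rays.
    'rays' is a 4x6 array of Plücker coordinates in general position."""
    rays = np.asarray(rays, float)
    # A line L meets ray r iff the reciprocal product l_L . m_r + l_r . m_L = 0,
    # which is linear in L: stack the four constraints and take the null space.
    W = np.hstack([rays[:, 3:], rays[:, :3]])
    _, _, Vt = np.linalg.svd(W)
    A, B = Vt[-2], Vt[-1]                      # basis of the 2-D null space
    # Intersect the pencil s*A + t*B with the Klein quadric l . m = 0.
    a = B[:3] @ B[3:]
    b = A[:3] @ B[3:] + B[:3] @ A[3:]
    c = A[:3] @ A[3:]
    disc = b * b - 4 * a * c
    if disc < 0:
        return []                              # no real common transversal
    q = -0.5 * (b + np.copysign(np.sqrt(disc), b if b else 1.0))
    sols = []
    for s, t in ((a, q), (q, c)):              # stable homogeneous roots
        L = s * A + t * B
        n = np.linalg.norm(L)
        if n > 1e-12:
            sols.append(L / n)
    return sols
```

A returned line can be checked by verifying its reciprocal product with each ray and its Klein quadric residual are near zero.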


Lab Meeting November 24, 2008 (Chung-Han): SwisTrack - A Flexible Open Source Tracking Software for Multi-Agent Systems

Title: SwisTrack - A Flexible Open Source Tracking Software for Multi-Agent Systems

Authors: Thomas Lochmatter, Pierre Roduit, Chris Cianci, Nikolaus Correll, Jacques Jacot and Alcherio Martinoli

Vision-based tracking is used in nearly all robotic laboratories for monitoring and extracting agent positions, orientations, and trajectories. However, there is currently no accepted standard software solution available, so many research groups resort to developing and using their own custom software. In this paper, we present Version 4 of SwisTrack, an open source project for simultaneous tracking of multiple agents. While its broad range of pre-implemented algorithmic components allows it to be used in a variety of experimental applications, its novelty stands in its highly modular architecture. Advanced users can therefore also implement additional customized modules which extend the functionality of the existing components within the provided interface. This paper introduces SwisTrack and shows experiments with both marked and marker-less agents.


Tuesday, November 18, 2008

CMU talk: Visual Localisation in Dynamic Non-uniform Lighting

Visual Localisation in Dynamic Non-uniform Lighting

Dr. Stephen Nuske
Postdoctoral Researcher
Field Robotics Center
Carnegie Mellon University

Thursday, November 20th

Abstract: For vision to succeed as a perceptual mechanism in general field robotic applications, vision systems must overcome the challenges presented by lighting conditions. Many current approaches rely on decoupling the effects of lighting from the process, which is not possible in many situations -- not surprising, considering an image is fundamentally an array of light measurements. This talk will describe two visual localisation systems, each designed for a different field robot application and each designed to address the lighting challenges of its environment.

The first visual localisation system discussed is for industrial ground vehicles operating outdoors. The system employs an invariant map combined with a robust localisation algorithm and an intelligent exposure control algorithm which together permit reliable localisation in a wide range of outdoor lighting conditions.

The second system discussed is for submarines navigating underwater structures, where the only light source is a spotlight mounted onboard the vehicle. The proposed system explicitly models the light source within the localisation framework, which serves to predict the changing appearance of the structure. Experiments reveal that, by understanding the effects of the lighting, this system can solve a difficult visual localisation scenario that conventional approaches struggle with.

The results of the two systems are encouraging given the extremely challenging dynamic non-uniform lighting in each environment, and both systems will continue to be developed with industry partners.

Speaker Bio: Stephen's research is in vision systems for mobile robots, focusing on the creation of practical systems that can deal with the problems arising from dynamic non-uniform lighting conditions. Stephen began his undergraduate studies at the University of Queensland, Australia, in Software Engineering. His undergraduate thesis was on the vision system for the university's robot soccer team, which placed second at the RoboCup in Portugal. During his undergraduate years he gained work experience at BSD Robotics, a company that develops equipment for automated medical laboratories. After receiving his undergraduate degree, Stephen began a PhD based at the Autonomous Systems Laboratory at CSIRO in Australia. During his PhD he spent three months at INRIA in Grenoble, the French national institute for computer science. Stephen is now starting a position here at CMU in the Field Robotics Center under Sanjiv Singh.

Lab Meeting November 24, 2008(Tiffany): Structure from Behavior in Autonomous Agents

Structure from Behavior in Autonomous Agents (IROS 2008)

Georg Martius, Katja Fiedler and J. Michael Herrmann

We describe a learning algorithm that generates behaviors by self-organization of sensorimotor loops in an autonomous robot. The behavior of the robot is analyzed by a multi-expert architecture, in which a number of controllers compete for the data from the physical robot. Each expert stabilizes the representation of the acquired sensorimotor mapping depending on the achieved prediction error and eventually forms a behavioral primitive. The experts provide a discrete representation of the behavioral manifold of the robot and are suited to form building blocks for complex behaviors.
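The competition between experts can be illustrated with a toy winner-take-all sketch (not the authors' system; the one-parameter linear predictors and learning rate are my simplifications): each sample is claimed by the expert with the lowest prediction error, and only that expert is updated, so experts specialize.

```python
import random

class Expert:
    """A minimal one-parameter predictor of robot dynamics: x_next = w * x."""
    def __init__(self):
        self.w = random.uniform(-1.0, 1.0)

    def predict(self, x):
        return self.w * x

    def error(self, x, x_next):
        return (self.predict(x) - x_next) ** 2

    def update(self, x, x_next, lr=0.1):
        # Gradient step on the squared prediction error.
        self.w -= lr * 2.0 * (self.predict(x) - x_next) * x

def assign_and_train(experts, stream, lr=0.1):
    """Winner-take-all competition: each (x, x_next) sample is claimed by the
    expert with the lowest prediction error, and only that expert is updated,
    so the experts gradually specialize into distinct behavioral primitives."""
    labels = []
    for x, x_next in stream:
        winner = min(range(len(experts)), key=lambda i: experts[i].error(x, x_next))
        experts[winner].update(x, x_next, lr)
        labels.append(winner)
    return labels
```

The sequence of winner labels is then a discrete segmentation of the sensorimotor stream into behavioral primitives.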


Saturday, November 15, 2008

CMU talk: Learning Language from its Perceptual Context

Joint Intelligence/LTI Seminar
November 21, 2008

Learning Language from its Perceptual Context
Raymond J. Mooney, University of Texas at Austin

Current systems that learn to process natural language require laboriously constructed human-annotated training data. Ideally, a computer would be able to acquire language like a child by being exposed to linguistic input in the context of a relevant but ambiguous perceptual environment. As a step in this direction, we present a system that learns to sportscast simulated robot soccer games by example. The training data consists of textual human commentaries on RoboCup simulation games. A set of possible alternative meanings for each comment is automatically constructed from game event traces. Our previously developed systems for learning to parse and generate natural language (KRISP and WASP) were augmented to learn from this data and then commentate novel games. The system is evaluated based on its ability to parse sentences into correct meanings and generate accurate descriptions of game events. Human evaluation was also conducted on the overall quality of the generated sportscasts and compared to human-generated commentaries.

Raymond J. Mooney is a Professor in the Department of Computer Sciences at the University of Texas at Austin. He received his Ph.D. in 1988 from the University of Illinois at Urbana/Champaign. He is an author of over 150 published research papers, primarily in the areas of machine learning and natural language processing. He is the current President of the International Machine Learning Society, was program co-chair for the 2006 AAAI Conference on Artificial Intelligence, general chair of the 2005 Human Language Technology Conference and Conference on Empirical Methods in Natural Language Processing, and co-chair of the 1990 International Conference on Machine Learning. He is a Fellow of the American Association for Artificial Intelligence and recipient of best paper awards from the National Conference on Artificial Intelligence, the SIGKDD International Conference on Knowledge Discovery and Data Mining, the International Conference on Machine Learning, and the Annual Meeting of the Association for Computational Linguistics. His recent research has focused on learning for natural-language processing, text mining for bioinformatics, statistical relational learning, and transfer learning.

Friday, November 14, 2008

CMU talk: Rain in Vision and Graphics

Special VASC Seminar
Tuesday, November 18, 2008

Rain in Vision and Graphics
Kshitiz Garg

Rain produces sharp intensity fluctuations in images and videos which severely degrade the performance of outdoor vision systems. Considering that bad weather is common (a city like New York has bad weather 23% of the time), it is important to remove the visual effects of rain to make outdoor vision robust. In contrast, in graphics, rain effects are desirable. They are often used in movies to convey scene emotions and in other graphics applications, such as games, to enhance realism. In this talk, I will present rain from the perspective of vision and graphics. I will show how physics-based modeling of the visual appearance of rain leads to efficient algorithms both for handling its effects in vision and for its realistic rendering in graphics. I will also briefly discuss some of the recent projects I have done on recognition and tracking at intuVision.

Kshitiz Garg is a research scientist and software developer at intuVision. His research interests are in the areas of computer vision, pattern recognition and computer graphics. He has a Masters in Physics and a PhD. in Computer Science from Columbia University, NY. He specializes in physics-based modeling and algorithm development. During his graduate work he developed physics based models for the intensity fluctuations produced by rain in images. He is also interested in Computer Graphics and has developed efficient algorithms for realistic rendering of rain. Since joining the intuVision team, he has worked on algorithms to improve object tracking and recognition especially in the presence of background motion, illumination changes and shadows. He is the research lead for development of intuVision's object classification, face detection, and soft biometry algorithms.

Thursday, November 13, 2008

CMU talk: Techniques for Learning 3D Maps

Title: Techniques for Learning 3D Maps

Dr. Wolfram Burgard
Dept. of Computer Science
University of Freiburg

Monday, November 17th

Abstract: Learning maps is a fundamental aspect in mobile robotics, as maps support various tasks including path planning and localization. Whereas the problem of learning maps has been extensively studied for indoor settings, novel field robotics projects have substantially increased the interest in effective representations of outdoor environments. In this talk, we will present our recent results in learning highly accurate multi-level surface maps, which are an extension of elevation maps towards multiple levels. We will describe how multi-level surface maps can be utilized for motion planning and localization. We present an application, in which Junior, the DARPA Grand Challenge entry robot of Stanford University, autonomously drives through a large parking garage and carries out an autonomous parking maneuver. Finally, we will briefly describe our approaches to learning surface maps using variants of Gaussian Processes.

Speaker Bio: Wolfram Burgard is an associate professor for computer science at the University of Freiburg, where he heads the Laboratory for Autonomous Intelligent Systems. He received his Ph.D. degree in Computer Science from the University of Bonn in 1991. His areas of interest lie in artificial intelligence and mobile robots. Over the past years his research has mainly focused on the development of robust and adaptive techniques for state estimation and control of autonomous mobile robots. He and his group have developed several innovative probabilistic techniques for robot navigation and control, covering different aspects such as localization, map-building, path-planning, and exploration.

Tuesday, November 11, 2008

CMU RI Thesis Proposal: Geolocation from Range: Robustness, Efficiency and Scalability

Robotics Institute
Carnegie Mellon University

In this thesis I explore the topic of geolocation from range. A robust method for localization and SLAM (Simultaneous Localization and Mapping) is proposed. This method uses a polar parameterization of the state to achieve accurate estimates of the nonlinear and multi-modal distributions in range-only systems. Several experimental evaluations on real robots reveal the reliability of this method. 

Scaling such a system to a large network of nodes increases the computational load on the system due to the increased state vector. To alleviate this problem, we propose the use of a distributed estimation algorithm based on the belief propagation framework. This method distributes the estimation task such that each node only estimates its local network, greatly reducing the computation performed by any individual node. However, the method does not provide any guarantees on the convergence of its solution in general graphs; convergence is only guaranteed for non-cyclic graphs (i.e., trees). Thus, I propose to formulate an extension to this approach that provides guarantees on its convergence and an improved approximation of the true graph inference problem.

Scaling in the traditional sense involves extensions to deal with growth in the size of the operating environment. In large, feature-less environments, maintaining a globally consistent estimate of a group of mobile agents is difficult. In this thesis, I propose the use of a multi-robot coordination strategy to achieve the tight coordination necessary to obtain an accurate global estimate. The proposed approach will be demonstrated using both simulation and experimental testing with real robots.

Monday, November 10, 2008

Lab Meeting November 10, 2008 (Yu-chun): “Try something else!” — When users change their discursive behavior in human-robot interaction

ICRA 2008

Manja Lohse, Katharina J. Rohlfing, Britta Wrede, and Gerhard Sagerer

This paper investigates the influence of feedback provided by an autonomous robot (BIRON) on users' discursive behavior. A user study is described during which users show objects to the robot. The results of the experiment indicate that the robot's verbal feedback utterances cause the humans to adapt their own way of speaking. The changes in users' verbal behavior are due to their beliefs about the robot's knowledge and abilities; in this paper they are identified and grouped. Moreover, the data implies variations in user behavior regarding gestures. Unlike speech, the robot was not able to give feedback with gestures. Due to the lack of feedback, users did not seem to have a consistent mental representation of the robot's abilities to recognize gestures. As a result, changes between different gestures are interpreted to be unconscious variations accompanying speech.

Sunday, November 09, 2008

Lab Meeting November 10, 2008 (Alan): An image-to-map loop closing method for monocular SLAM (IROS 2008)

Title: An image-to-map loop closing method for monocular SLAM
Authors: Brian Williams, Mark Cummins, José Neira, Paul Newman, Ian Reid and Juan Tardós

Abstract: In this paper we present a loop closure method for a handheld single-camera SLAM system based on our previous work on relocalisation. By finding correspondences between the current image and the map, our system is able to reliably detect loop closures. We compare our algorithm to existing techniques for loop closure in single-camera SLAM based on both image-to-image and map-to-map correspondences and discuss both the reliability and suitability of each algorithm in the context of monocular SLAM.


Saturday, November 08, 2008

Lab Meeting November 10, 2008 (Any): Efficiently Learning High-dimensional Observation Models for Monte-Carlo Localization using Gaussian Mixtures

Title: Efficiently Learning High-dimensional Observation Models for Monte-Carlo Localization using Gaussian Mixtures
Authors: Patrick Pfaff, Cyrill Stachniss, Christian Plagemann, and Wolfram Burgard
Abstract: Whereas probabilistic approaches are a powerful tool for mobile robot localization, they heavily rely on the proper definition of the so-called observation model which defines the likelihood of an observation given the position and orientation of the robot and the map of the environment. Most of the sensor models for range sensors proposed in the past either consider the individual beam measurements independently or apply uni-modal models to represent the likelihood function. In this paper, we present an approach that learns place-dependent sensor models for entire range scans using Gaussian mixture models. To deal with the high dimensionality of the measurement space, we utilize principal component analysis for dimensionality reduction. In practical experiments carried out with data obtained from a real robot, we demonstrate that our model substantially outperforms existing and popular sensor models.
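The pipeline in the abstract (PCA for dimensionality reduction, then a Gaussian mixture as the scan likelihood) can be sketched with scikit-learn as a stand-in for the authors' implementation; the dimensions, component counts, and the synthetic data below are my assumptions, not values from the paper.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.mixture import GaussianMixture

def fit_scan_model(scans, n_dims=2, n_components=2):
    """Learn a place-dependent likelihood model for full range scans:
    project the scans to a low-dimensional subspace with PCA, then fit
    a Gaussian mixture model in that reduced space."""
    pca = PCA(n_components=n_dims).fit(scans)
    gmm = GaussianMixture(n_components=n_components, covariance_type="full",
                          random_state=0).fit(pca.transform(scans))
    return pca, gmm

def scan_log_likelihood(pca, gmm, scan):
    """log p(scan | place): evaluate the mixture in the reduced space."""
    return float(gmm.score_samples(pca.transform(scan.reshape(1, -1)))[0])
```

In a particle filter, the returned log-likelihood would weight each particle using the model learned for the place the particle currently occupies.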

Friday, November 07, 2008

A $1 Recognizer for User Interface Prototypes

It requires under 100 lines of simple code and achieves 97% recognition rates with only one template defined for each gesture. With 3+ templates defined, accuracy exceeds 99%. The recognizer is fully rotation, scale, and position invariant.
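A condensed sketch of the $1 pipeline (resample, rotate by the indicative angle, scale, translate, then nearest-template matching). The published recognizer additionally refines rotation with a golden-section search, which is omitted here, so this is an approximation rather than the full algorithm.

```python
import math

def _centroid(pts):
    n = len(pts)
    return (sum(x for x, _ in pts) / n, sum(y for _, y in pts) / n)

def resample(pts, n=64):
    """Resample a stroke to n points equally spaced along its path."""
    pts = [tuple(p) for p in pts]
    total = sum(math.dist(a, b) for a, b in zip(pts, pts[1:]))
    interval, acc, out = total / (n - 1), 0.0, [pts[0]]
    i = 1
    while i < len(pts):
        step = math.dist(pts[i - 1], pts[i])
        if step > 0 and acc + step >= interval:
            t = (interval - acc) / step
            q = (pts[i - 1][0] + t * (pts[i][0] - pts[i - 1][0]),
                 pts[i - 1][1] + t * (pts[i][1] - pts[i - 1][1]))
            out.append(q)
            pts.insert(i, q)  # resume measuring from the inserted point
            acc = 0.0
        else:
            acc += step
        i += 1
    while len(out) < n:       # guard against floating-point shortfall
        out.append(pts[-1])
    return out[:n]

def normalize(pts, n=64):
    """$1 preprocessing: resample, rotate so the indicative angle (centroid to
    first point) is zero, scale to a unit box, translate centroid to origin."""
    pts = resample(pts, n)
    cx, cy = _centroid(pts)
    ang = math.atan2(pts[0][1] - cy, pts[0][0] - cx)
    cos, sin = math.cos(-ang), math.sin(-ang)
    pts = [((x - cx) * cos - (y - cy) * sin, (x - cx) * sin + (y - cy) * cos)
           for x, y in pts]
    xs, ys = [p[0] for p in pts], [p[1] for p in pts]
    w = max(max(xs) - min(xs), 1e-6)
    h = max(max(ys) - min(ys), 1e-6)
    pts = [(x / w, y / h) for x, y in pts]
    cx, cy = _centroid(pts)
    return [(x - cx, y - cy) for x, y in pts]

def recognize(stroke, templates):
    """Return the name of the template with minimal average point distance."""
    s = normalize(stroke)
    def dist(name):
        return sum(math.dist(a, b) for a, b in zip(s, templates[name])) / len(s)
    return min(templates, key=dist)
```

Because both stroke and templates pass through the same normalization, recognition is invariant to rotation, scale, and position, matching the claim above.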

CMU VASC Seminar: What does the sky tell us about the camera?

What does the sky tell us about the camera?
Jean-Francois Lalonde
Robotics Institute, Carnegie Mellon

VASC Seminar
Monday, November 10

Abstract: As the main observed illuminant outdoors, the sky is a rich source of information about the scene. However, it is yet to be fully explored in computer vision because its appearance in an image depends on the sun position, weather conditions, photometric and geometric parameters of the camera, and the location of capture. In this talk, I will present an analysis of two sources of information available within the visible portion of the sky region: the sun position, and the sky appearance. By fitting a model of the predicted sun position to an image sequence, we show how to extract camera parameters such as the focal length, and the zenith and azimuth angles. Similarly, we show how we can extract the same parameters by fitting a physically-based sky model to the sky appearance. In short, the sun and the sky serve as geometric calibration targets, which can be used to annotate a large database of image sequences. We use our methods to calibrate 22 real, low-quality webcam sequences scattered throughout the continental US, and show deviations below 4% for focal length, and 3 degrees for the zenith and azimuth angles. Once the camera parameters are recovered, we use them to define a camera-invariant sky appearance model, which we exploit in two applications: 1) segmentation of the sky and cloud layers, and 2) data-driven sky matching across different image sequences based on a novel similarity measure defined on sky parameters. This measure, combined with a rich appearance database, allows us to model a wide range of sky conditions.

Bio: Jean-Francois Lalonde received his B.E. in Computer Engineering from Laval University, Canada in 2004. He received his M.S. in Robotics from Carnegie Mellon University in 2006 under Martial Hebert, and he has been a Robotics Ph.D. student advised by Alexei A. Efros at that institution since then. His research interests are in computer vision and computer graphics, focusing on image understanding and synthesis leveraging large amounts of data.

Wednesday, November 05, 2008

CMU talk: Computing with Language and Context over Time

Speaker: Gregory Aist, Arizona State University

Title: Computing with Language and Context over Time

What: Joint LTI/RI Seminar
When: Friday November 7, 2008, 2:00pm - 3:00pm
Where: 1305 NSH

How do language and context interact in learning and performance by humans and machines? To explore this broad area of inquiry, I have studied interactions between natural language and a wide range of different contexts: visual context, social and team context, written context and world knowledge, procedure and task context, dialogue and temporal context, and instructional context. Specific research questions have included how machines can process spoken language continuously and integrate speech and visual context during understanding; how computers can help pilots and astronauts learn and perform tasks; and how to automatically generate, present, and evaluate the effects of vocabulary help for children. One key challenge in addressing all of these questions is to model and compute representations of language and context that unfold over time as the interaction progresses. This talk will illustrate the need for such interactive time-sensitive processes, describe computational approaches to understanding language and context as dialogue and interactions unfold across time, and evaluate the effectiveness of such approaches.

Short bio:
Gregory Aist is currently at Arizona State University as an Assistant Research Professor in the School of Computing and Informatics and the Applied Linguistics Program. His research interests are in natural language processing and computer-assisted learning. His research addresses fundamental issues in language and learning, tackles computational challenges of automatic processing of human language and computer support for human learning, and is applied to provide users with learning experiences and new capabilities in authentic settings for educational domains such as traditional literacy (reading and writing) and new literacies (virtual worlds), and physical domains such as aerospace and human-robot interaction. During summers 2007 and 2008 he was an Air Force Summer Faculty Fellow. Previously he has held research and visiting positions at the University of Rochester, RIACS/NASA Ames Research Center, and the MIT Media Lab. He received a Ph.D. in Language and Information Technology from Carnegie Mellon University in 2000, where he was an NSF Graduate Fellow.

Sunday, November 02, 2008

Lab Meeting November 3rd, 2008 (swem): Learning Patch Correspondences for Improved Viewpoint Invariant Face Recognition

Title: Learning Patch Correspondences for Improved Viewpoint Invariant Face Recognition

Author: Ahmed Bilal Ashraf, Simon Lucey, Tsuhan Chen

Variation due to viewpoint is one of the key challenges that stand in the way of a complete solution to the face recognition problem. It is easy to note that local regions of the face change differently in appearance as the viewpoint varies. Recently, patch-based approaches, such as those of Kanade and Yamada, have taken advantage of this effect, resulting in improved viewpoint invariant face recognition. In this paper we propose a data-driven extension to their approach, in which we not only model how a face patch varies in appearance, but also how it deforms spatially as the viewpoint varies. We propose a novel alignment strategy which we refer to as “stack flow” that discovers viewpoint-induced spatial deformities undergone by a face at the patch level. One can then view the spatial deformation of a patch as the correspondence of that patch between two viewpoints. We present improved identification and verification results to demonstrate the utility of our technique.


Lab Meeting November 3rd, 2008 (Shao-Chen): Blind spatial subtraction array with independent component analysis for hands-free speech recognition

Blind spatial subtraction array with independent component analysis for hands-free speech recognition

Yu Takahashi, Tomoya Takatani, Hiroshi Saruwatari and Kiyohiro Shikano

In this paper, we propose a new blind spatial subtraction array (BSSA) which contains an accurate noise estimator based on independent component analysis (ICA) to realize noise-robust hands-free speech recognition. First, a preliminary experiment suggests that conventional ICA is more proficient at noise estimation than at direct speech estimation in real environments, where the target speech can be approximated as a point source but real noises often cannot. Secondly, based on the above findings, we propose a new noise reduction method implemented by subtracting the power spectrum of the noise estimated by ICA from the power spectrum of the noise-contaminated observations. This architecture provides a speech enhancement that is robust to noise estimation errors and well suited to speech recognition. Finally, the effectiveness of the proposed BSSA is shown in a speech recognition experiment.
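The subtraction step described above can be sketched as classic power-spectral subtraction, with the ICA output standing in as the noise estimate. This is a single-frame illustration of the idea only; a real system would operate on short overlapping frames, and the flooring parameter is an assumed value.

```python
import numpy as np

def spectral_subtract(obs, noise_est, beta=0.0):
    """Power-spectral subtraction: subtract the power spectrum of the
    estimated noise from that of the observation, floor negative values,
    and resynthesize using the observed (noisy) phase."""
    Y = np.fft.rfft(obs)
    N = np.fft.rfft(noise_est)
    power = np.abs(Y) ** 2 - np.abs(N) ** 2
    power = np.maximum(power, beta * np.abs(Y) ** 2)   # spectral floor
    S = np.sqrt(power) * np.exp(1j * np.angle(Y))      # reuse noisy phase
    return np.fft.irfft(S, n=len(obs))
```

Subtracting powers rather than complex spectra is what makes the scheme tolerant of errors in the noise estimate, which is the property the abstract emphasizes.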