Robot Perception and Learning: December 2009

Monday, December 28, 2009

Lab Meeting December 30th, 2009 (Nicole): Estimation of Sound Source Number and Directions under a Multi-source Environment (IROS 2009)

Title: Estimation of Sound Source Number and Directions under a Multi-source Environment (IROS 2009)

Authors: Jwu-Sheng Hu, Member IEEE, Chia-Hsing Yang, Student Member IEEE, and Cheng-Kang Wang

Abstract:

Sound source localization is an important feature in robot audition. This work proposes a sound source number and directions estimation method by using the delay information of microphone array. An eigenstructure-based generalized cross correlation method is proposed to estimate time delay between microphones. Upon obtaining the time delay information, the sound source direction and velocity can be estimated by least square method. In multiple sound source case, the time delay combination among microphones is arranged such that the estimated sound speed value falls within an acceptable range. By accumulating the estimation results of sound source direction and using adaptive K-means++ algorithm, the sound source number and directions can be estimated.
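The least-squares step mentioned in the abstract can be illustrated with a small sketch. It is not the authors' formulation: it assumes a 2D far-field (plane wave) model in which each delay equals the dot product of a microphone-pair baseline with a "slowness" vector s = d / c, and the function names and the 300 to 380 m/s acceptance range are hypothetical choices made for illustration.

```python
import math

def estimate_direction_and_speed(baselines, delays):
    """Least-squares fit of the 2D slowness vector s = d / c.

    baselines: list of (bx, by) vectors between microphone pairs, in metres.
    delays:    measured time delays for the same pairs, in seconds.
    Assumed model (far-field plane wave): delay_k = baseline_k . s
    """
    # Accumulate the 2x2 normal equations (B^T B) s = B^T tau.
    sxx = sxy = syy = bx_tau = by_tau = 0.0
    for (bx, by), tau in zip(baselines, delays):
        sxx += bx * bx
        sxy += bx * by
        syy += by * by
        bx_tau += bx * tau
        by_tau += by * tau
    det = sxx * syy - sxy * sxy
    if abs(det) < 1e-12:
        raise ValueError("baselines are (nearly) collinear; direction not observable")
    # Solve the 2x2 system with the closed-form inverse.
    s_x = (syy * bx_tau - sxy * by_tau) / det
    s_y = (sxx * by_tau - sxy * bx_tau) / det
    norm = math.hypot(s_x, s_y)
    if norm == 0.0:
        raise ValueError("all delays are zero; speed is undefined")
    angle = math.atan2(s_y, s_x)   # direction of the slowness vector, radians
    speed = 1.0 / norm             # estimated sound speed, m/s
    return angle, speed

def speed_is_plausible(speed, low=300.0, high=380.0):
    """Hypothetical acceptance test: keep a delay combination only if the
    implied sound speed lies in an assumed physical range."""
    return low <= speed <= high
```

In a multi-source setting, a check like `speed_is_plausible` is one way to discard delay combinations that mix delays from different sources, since those tend to imply an unphysical sound speed. Whether the angle points toward or away from the source depends on the sign convention chosen for the delays.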

[link]

Sunday, December 27, 2009

Measuring the Accuracy of Distributed Algorithms on Multi-Robot Systems

James McLurkin (UW CSE postdoc, MIT)
October 9, 2008, 3:30 pm
EE-105

Abstract
Distributed algorithms running on multi-robot systems rely on ad-hoc networks to relay messages throughout the group. The propagation speed of these messages is large, but not infinite, and problems in algorithm execution can arise when the robot speed is a large fraction of the message propagation speed. This implies a robot speed limit, as any robot moving away from a message source faster than the message speed will never receive new information, and no algorithm can function properly on it. In this work, we focus on measuring the accuracy of multi-robot distributed algorithms. We define the Robot Speed Ratio (RSR) as the ratio of robot speed to message speed. We express it in a form that is platform-independent and captures the relationship between communications usage, robot mobility, and algorithm accuracy. We show that trade-offs between these key quantities can be balanced at design time. Finally, we present results from experiments with 50 robots that characterize the accuracy of preexisting distributed algorithms for network communication, navigation, boundary detection, and dynamic task assignment. In all cases, accuracy degrades as speed increases or communication bandwidth is reduced. In our experiments, an RSR of 0.005 allows good accuracy in all algorithms, an RSR of 0.02 allows reasonable accuracy in simple algorithms, and all algorithms tested are essentially useless at an RSR of 0.10 or higher.
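As a rough illustration of the RSR figures quoted above, the sketch below computes the plain ratio of robot speed to message speed and maps it onto the three operating points from the abstract. The function names are hypothetical, the platform-independent form used in the work itself is not reproduced here, and treating the reported values as thresholds is an assumption that relies on the statement that accuracy degrades as speed increases.

```python
def robot_speed_ratio(robot_speed, message_speed):
    """Plain ratio of robot speed to message propagation speed (same units)."""
    if message_speed <= 0:
        raise ValueError("message speed must be positive")
    return robot_speed / message_speed

def expected_accuracy(rsr):
    """Map an RSR onto the operating points reported in the abstract.

    Assumption: the reported values (0.005, 0.02, 0.10) are used as
    thresholds; the range between 0.02 and 0.10 is not characterized.
    """
    if rsr <= 0.005:
        return "good accuracy in all tested algorithms"
    if rsr <= 0.02:
        return "reasonable accuracy in simple algorithms"
    if rsr >= 0.10:
        return "essentially useless"
    return "not characterized in the abstract"
```

For example, a robot moving at 0.05 m/s with messages propagating at 10 m/s has an RSR of about 0.005, which falls in the first category.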

[link]

Friday, December 25, 2009

Lab Meeting December 30th, 2009 (Gary): Pose Robust Face Tracking by Combining Active Appearance Models and Cylinder Head Models

Title: Pose Robust Face Tracking by Combining Active Appearance Models and Cylinder Head Models (IJCV 2008)

Authors: Jaewon Sung, Takeo Kanade, Daijin Kim

Abstract:

The active appearance models (AAMs) provide the detailed descriptive parameters that are useful for various autonomous face analysis problems. However, they are not suitable for robust face tracking across large pose variation for the following reasons. First, they are suitable for tracking the local movements of facial features within a limited pose variation. Second, they use gradient-based optimization techniques for model fitting and the fitting performance is thus very sensitive to initial model parameters. Third, when their fitting fails, it is difficult to obtain appropriate model parameters to re-initialize them. To alleviate these problems, we propose to combine the active appearance models and the cylinder head models (CHMs), where the global head motion parameters obtained from the CHMs are used as the cues of the AAM parameters for a good fitting or re-initialization. The good AAM parameters for robust face tracking are computed in the following manner. First, we estimate the global motion parameters by the CHM fitting algorithm. Second, we project the previously fitted 2D shape points onto the 3D cylinder surface inversely. Third, we transform the inversely projected shape points by the estimated global motion parameters. Fourth, we project the transformed 3D points onto the input image and compute the AAM parameters from them. Finally, we treat the computed AAM parameters as the initial parameters for the fitting. Experimental results showed that face tracking combining AAMs and CHMs is more pose robust than that of AAMs in terms of 170% higher tracking rate and the 115% wider pose coverage.

link

Thursday, December 24, 2009

NTU talk: Human Action Recognition Using Bag of Video Words

Title: Human Action Recognition Using Bag of Video Words
Speaker: Dr. Mubarak Shah, Agere Chair Professor of Computer Science, University of Central Florida
Time: 4:00pm, Dec 24 (Thu), 2009
Place: Room 210, CSIE building

Abstract:

The traditional approach for video analysis involves detection of objects, followed by tracking of objects from frame to frame and finally analysis of tracks for human action recognition. However, in some videos of complex scenes it is not possible to reliably detect and track objects. Therefore, recently in computer vision there has been a lot of interest in the bag of video words approach, which bypasses the object detection and tracking steps. In the bag of video words approach, an action is described by a distribution of spatiotemporal cuboids (3D interest points).
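To make the "distribution of cuboids" idea concrete, here is a minimal sketch of the generic bag-of-words step: each descriptor is assigned to its nearest codebook entry and the video is summarized by the normalized histogram of assignments. The function names are hypothetical, the codebook is assumed to be given (for example from k-means), and the descriptors are plain tuples rather than real cuboid features.

```python
def nearest_word(descriptor, codebook):
    """Index of the codebook entry closest to the descriptor (squared Euclidean)."""
    best_index, best_dist = 0, float("inf")
    for index, word in enumerate(codebook):
        dist = sum((a - b) ** 2 for a, b in zip(descriptor, word))
        if dist < best_dist:
            best_index, best_dist = index, dist
    return best_index

def bag_of_words_histogram(descriptors, codebook):
    """Normalized histogram of video-word occurrences for one video."""
    counts = [0] * len(codebook)
    for descriptor in descriptors:
        counts[nearest_word(descriptor, codebook)] += 1
    total = len(descriptors)
    if total == 0:
        return [0.0] * len(codebook)
    return [count / total for count in counts]
```

Two videos can then be compared through their histograms without any object detection or tracking, which is the property the abstract highlights.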

In this talk, first I will describe a method to automatically discover the optimal number of video words clusters by utilizing the Maximization of Mutual Information (MMI). Unlike the k-means algorithm which is typically used to cluster spatiotemporal cuboids into video words based on their appearance similarity, MMI clustering further groups the video-words, such that the semantically similar video-words, e.g. words corresponding to the same part of the body during an action, are grouped in the same cluster.

The above method for human action recognition uses only one kind of features, spatiotemporal cuboids. However, single feature based representation for human action is not sufficient to capture the imaging variations (view-point, illumination etc.) and attributes of individuals (size, age, gender etc.).

Next I will present a method which uses two types of features: i) a quantized vocabulary of local spatio-temporal (ST) volumes (or cuboids), and ii) a quantized vocabulary of spin-images. To optimally combine these features, we treat different features and videos as nodes in a graph, where weighted edges between the nodes represent the strength of the relationship between entities. The graph is then embedded into a k-dimensional space subject to the criterion that similar nodes have Euclidean coordinates which are closer to each other. This is achieved by converting this constraint into a minimization problem whose solution is the eigenvectors of the graph Laplacian matrix. This procedure is known as Fiedler Embedding.

Short Biography:
Dr. Mubarak Shah, Agere Chair Professor of Computer Science, is the founding director of the Computer Vision Lab at UCF. He is a co-author of three books (Motion-Based Recognition (1997), Video Registration (2003), and Automated Multi-Camera Surveillance: Algorithms and Practice (2008)), all by Springer. He has published ten book chapters, seventy-five journal and one hundred seventy conference papers on topics related to visual surveillance, tracking, human activity and action recognition, object detection and categorization, shape from shading, geo registration, photo realistic synthesis, visual crowd analysis, bio medical imaging, etc.

Dr. Shah is a fellow of IEEE, IAPR and SPIE. In 2006, he was awarded a Pegasus Professor award, the highest award at UCF, given to a faculty member who has made a significant impact on the university, has made an extraordinary contribution to the university community, and has demonstrated excellence in teaching, research and service. He is a Distinguished ACM Speaker. He was an IEEE Distinguished Visitor speaker for 1997-2000, and received IEEE Outstanding Engineering Educator Award in 1997. He received the Harris Corporation's Engineering Achievement Award in 1999, the TOKTEN awards from UNDP in 1995, 1997, and 2000; Teaching Incentive Program awards in 1995 and 2003, Research Incentive Award in 2003, Millionaires' Club awards in 2005 and 2006, University Distinguished Researcher award in 2007, SANA award in 2007, an honorable mention for the ICCV 2005 Where Am I? Challenge Problem, and was nominated for the best paper award in ACM Multimedia Conference in 2005. He is an editor of international book series on Video Computing; editor in chief of Machine Vision and Applications journal, and an associate editor of ACM Computing Surveys journal. He was an associate editor of the IEEE Transactions on PAMI, and a guest editor of the special issue of International Journal of Computer Vision on Video Computing. He is the program co-chair of IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2008.

Monday, December 21, 2009

Lab Meeting December 23rd, 2009 (Andi): Shape-based Recognition of 3D Point Clouds in Urban Environments

Authors: Aleksey Golovinskiy, Vladimir G. Kim, Thomas Funkhouser

International Conference on Computer Vision (ICCV), September 2009

This paper investigates the design of a system for recognizing objects in 3D point clouds of urban environments. The system is decomposed into four steps: locating, segmenting, characterizing, and classifying clusters of 3D points. Specifically, we first cluster nearby points to form a set of potential object locations (with hierarchical clustering). Then, we segment points near those locations into foreground and background sets (with a graph-cut algorithm). Next, we build a feature vector for each point cluster (based on both its shape and its context). Finally, we label the feature vectors using a classifier trained on a set of manually labeled objects. The paper presents several alternative methods for each step. We quantitatively evaluate the system and tradeoffs of different alternatives in a truthed part of a scan of Ottawa that contains approximately 100 million points and 1000 objects of interest. Then, we use this truth data as a training set to recognize objects amidst approximately 1 billion points of the remainder of the Ottawa scan.

full Paper

also:
Min-Cut Based Segmentation of Point Clouds
Aleksey Golovinskiy and Thomas Funkhouser
IEEE Workshop on Search in 3D and Video (S3DV) at ICCV, September 2009, Kyoto

Sunday, December 20, 2009

Lab Meeting December 23rd, 2009 (Shao-Chen): Multi-robot SLAM with Unknown Initial Correspondence: The Robot Rendezvous Case

Title: Multi-robot SLAM with Unknown Initial Correspondence: The Robot Rendezvous Case (IROS 2006)

Authors: Xun S. Zhou and Stergios I. Roumeliotis

Abstract:

This paper presents a new approach to the multi-robot map-alignment problem that enables teams of robots to build joint maps without initial knowledge of their relative poses. The key contribution of this work is an optimal algorithm for merging (not necessarily overlapping) maps that are created by different robots independently. Relative pose measurements between pairs of robots are processed to compute the coordinate transformation between any two maps. Noise in the robot-to-robot observations, propagated through the map-alignment process, increases the error in the position estimates of the transformed landmarks, and reduces the overall accuracy of the merged map. When there is overlap between the two maps, landmarks that appear twice provide additional information, in the form of constraints, which increases the alignment accuracy. Landmark duplicates are identified through a fast nearest-neighbor matching algorithm. In order to reduce the computational complexity of this search process, a kd-tree is used to represent the landmarks in the original map. The criterion employed for matching any two landmarks is the Mahalanobis distance. As a means of validation, we present experimental results obtained from two robots mapping an area of 4,800 m
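The duplicate-landmark test described in the abstract can be sketched as follows. This is an illustration under simplifying assumptions, not the paper's algorithm: landmarks are 2D points, a single shared 2x2 covariance is used for every pair, the search is a linear scan rather than a kd-tree, and the gate of 9.21 (roughly the 99% chi-square quantile for two degrees of freedom) is a hypothetical choice.

```python
def mahalanobis_sq(p, q, cov):
    """Squared Mahalanobis distance between 2D points p and q.

    cov is a 2x2 covariance given as ((a, b), (c, d)).
    """
    dx, dy = p[0] - q[0], p[1] - q[1]
    (a, b), (c, d) = cov
    det = a * d - b * c
    if det <= 0:
        raise ValueError("covariance must be positive definite")
    # v^T * inverse(cov) * v, using the closed-form 2x2 inverse.
    return (dx * (d * dx - b * dy) + dy * (-c * dx + a * dy)) / det

def match_landmarks(map_a, map_b, cov, gate=9.21):
    """Pair each landmark in map_a with its nearest landmark in map_b
    (in the Mahalanobis sense) if the distance passes the gate.

    Returns a list of (index_in_a, index_in_b) pairs.
    Linear scan for clarity; the abstract describes a kd-tree for speed.
    """
    matches = []
    for i, p in enumerate(map_a):
        best_j, best_d = None, float("inf")
        for j, q in enumerate(map_b):
            dist = mahalanobis_sq(p, q, cov)
            if dist < best_d:
                best_j, best_d = j, dist
        if best_j is not None and best_d <= gate:
            matches.append((i, best_j))
    return matches
```

Pairs that pass the gate are candidates for the duplicate-landmark constraints that the abstract says improve alignment accuracy; pairs that fail are treated as distinct landmarks.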

[Link]

Tuesday, December 15, 2009

PhD Thesis Defense: Rhythmic Human-Robot Social Interaction

Marek P. Michalowski
Carnegie Mellon University
December 21, 2009, 10:00 a.m., NSH 3305

Abstract

Social scientists have identified and begun to describe rhythmic and synchronous properties of human social interaction. However, social interactions with robots are often stilted due to temporal mismatch between the behaviors, both verbal and nonverbal, of the interacting partners. This thesis brings the theory of interactional synchrony to bear on the design of social robots with a proposed architecture for rhythmic intelligence. We have developed technology that allows a robot to perceive social rhythms and to behave rhythmically. We have facilitated constrained social interactions, and designed experimental protocols, in which a robot variably synchronizes to human and/or environmental rhythms -- first in a dance-oriented task, and second in a cooperative video game. We have analyzed these interactions to understand the effects of Keepon's rhythmic attention on human performance. This thesis demonstrates that variations in a robot's rhythmic behavior have measurable effect on human rhythmic behavior and on performance in rhythmic tasks. Furthermore, human participants were able to assume and transition between the roles of leader or follower in these tasks.

Thesis Committee

Reid Simmons, Chair
Illah Nourbakhsh
Jodi Forlizzi
Hideki Kozima, Miyagi University, Japan

[link] [thesis draft]

Lab Meeting December 16th, 2009 (Casey): Monocular Vision SLAM for Indoor Aerial Vehicles

Title: Monocular Vision SLAM for Indoor Aerial Vehicles (IROS 2009)

Authors: Koray Celik, Soon-Jo Chung, Matthew Clausman, and Arun K. Somani

Abstract:

This paper presents a novel indoor navigation and ranging strategy by using a monocular camera. The proposed algorithms are integrated with simultaneous localization and mapping (SLAM) with a focus on indoor aerial vehicle applications. We experimentally validate the proposed algorithms by using a fully self-contained micro aerial vehicle (MAV) with on-board image processing and SLAM capabilities. The range measurement strategy is inspired by the key adaptive mechanisms for depth perception and pattern recognition found in humans and intelligent animals. The navigation strategy assumes an unknown, GPS-denied environment, which is representable via corner-like feature points and straight architectural lines. Experimental results show that the system is only limited by the capabilities of the camera and the availability of good corners.

[Link]

Monday, December 14, 2009

Lab Meeting December 16th, 2009 (Jeff): On measuring the accuracy of SLAM algorithms

Title: On measuring the accuracy of SLAM algorithms

Authors: Rainer Kümmerle, Bastian Steder, Christian Dornhege, Michael Ruhnke, Giorgio Grisetti, Cyrill Stachniss and Alexander Kleiner

Abstract:

In this paper, we address the problem of creating an objective benchmark for evaluating SLAM approaches. We propose a framework for analyzing the results of a SLAM approach based on a metric for measuring the error of the corrected trajectory. This metric uses only relative relations between poses and does not rely on a global reference frame. This overcomes serious shortcomings of approaches using a global reference frame to compute the error. Our method furthermore allows us to compare SLAM approaches that use different estimation techniques or different sensor modalities since all computations are made based on the corrected trajectory of the robot.
We provide sets of relative relations needed to compute our metric for an extensive set of datasets frequently used in the robotics community. The relations have been obtained by manually matching laser-range observations to avoid the errors caused by matching algorithms. Our benchmark framework allows the user to easily analyze and objectively compare different SLAM approaches.
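A metric built on relative relations between poses can be sketched for 2D poses (x, y, theta) as below. This is a simplified illustration rather than the benchmark's reference implementation: the function names are hypothetical, the mean squared translational and rotational errors are returned separately so that metres and radians are not mixed, and the reference relations are assumed to be given as (i, j, relative pose) triples.

```python
import math

def wrap_angle(angle):
    """Wrap an angle to the interval (-pi, pi]."""
    return math.atan2(math.sin(angle), math.cos(angle))

def relative_pose(a, b):
    """Pose b expressed in the frame of pose a; poses are (x, y, theta)."""
    dx, dy = b[0] - a[0], b[1] - a[1]
    cos_a, sin_a = math.cos(a[2]), math.sin(a[2])
    return (cos_a * dx + sin_a * dy,
            -sin_a * dx + cos_a * dy,
            wrap_angle(b[2] - a[2]))

def relative_error(trajectory, relations):
    """Mean squared translational and rotational error over the relations.

    trajectory: list of estimated poses (x, y, theta).
    relations:  list of (i, j, reference_relative_pose) triples.
    """
    if not relations:
        raise ValueError("at least one relation is required")
    trans_sq = rot_sq = 0.0
    for i, j, reference in relations:
        estimated = relative_pose(trajectory[i], trajectory[j])
        # Difference between the estimated and the reference relation.
        diff = relative_pose(reference, estimated)
        trans_sq += diff[0] ** 2 + diff[1] ** 2
        rot_sq += diff[2] ** 2
    n = len(relations)
    return trans_sq / n, rot_sq / n
```

Because only relative poses enter the computation, rotating or translating the whole estimated trajectory leaves the error unchanged, which is the independence from a global reference frame that the abstract emphasizes.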

Link:
Auton Robot 2009 27:387-407
http://www.springerlink.com/content/5u7458rl080216vr/fulltext.pdf

Wednesday, December 09, 2009

CMU Talk: Corridor View: Making Indoor Life Easier with Large Image Database

CMU VASC Seminar
Monday, Dec 7, 2009
1:30pm-2:30pm
NSH 1507

Corridor View: Making Indoor Life Easier with Large Image Database
Hongwen "Henry" Kang
Ph.D. Student, Robotics

Abstract:

Indoor environments pose substantial challenges for computer vision algorithms, due to combined patterns that are either highly repetitive (e.g. doors), textureless (e.g. white walls), or temporally changing (e.g. posters, pedestrians). The fundamental challenge we want to tackle is robust image matching. We propose two approaches to address this problem: one is an iterative algorithm that combines global/local weighting strategies under a bag-of-features model; the other data-mines distinctive feature vectors and uses high-dimensional features directly for image matching, without quantization. Both approaches demonstrate significant improvements over straightforward image retrieval approaches in highly confusing indoor environments. The proposed image matching techniques have broad applications. We demonstrate two of them in this talk, aimed at vision-impaired users living in office environments. One application is data-driven zoom-in; the other is image composition for object pop-out.