This blog is maintained by the Robot Perception and Learning Lab at CSIE, NTU, Taiwan. Our scientific interests are driven by the desire to build intelligent robots and computers that can serve people more efficiently than equivalent manned systems in a wide variety of dynamic and unstructured environments.
Sunday, December 26, 2010
Lab Meeting January 3rd, 2011 (David): Vision-Based Behavior Prediction in Urban Traffic Environments by Scene Categorization (BMVC 2010)
Authors: Martin Heracles, Fernando Martinelli and Jannik Fritsch
Abstract:
We propose a method for vision-based scene understanding in urban traffic environments that predicts the appropriate behavior of a human driver in a given visual scene. The method relies on a decomposition of the visual scene into its constituent objects by image segmentation and uses segmentation-based features that represent both their identity and spatial properties. We show how the behavior prediction can be naturally formulated as a scene categorization problem and how ground truth behavior data for learning a classifier can be automatically generated from any monocular video sequence recorded from a moving vehicle, using structure from motion techniques. We evaluate our method both quantitatively and qualitatively on the recently proposed CamVid dataset, predicting the appropriate velocity and yaw rate of the car as well as their appropriate change for both day and dusk sequences. In particular, we investigate the impact of the underlying segmentation and the number of behavior classes on the quality of these predictions.
link
Wednesday, December 22, 2010
Lab Meeting December 27, 2010 (Chih-Chung): Belief space planning assuming maximum likelihood observations (RSS 2010)
Authors: Robert Platt Jr., Russ Tedrake, Leslie Kaelbling, Tomas Lozano-Perez
Abstract:
We cast the partially observable control problem as a fully observable underactuated stochastic control problem in belief space and apply standard planning and control techniques. One of the difficulties of belief space planning is modeling the stochastic dynamics resulting from unknown future observations. The core of our proposal is to define deterministic belief-system dynamics based on an assumption that the maximum likelihood observation (calculated just prior to the observation) is always obtained. The stochastic effects of future observations are modelled as Gaussian noise. Given this model of the dynamics, two planning and control methods are applied. In the first, linear quadratic regulation (LQR) is applied to generate policies in the belief space. This approach is shown to be optimal for linear-Gaussian systems. In the second, a planner is used to find locally optimal plans in the belief space. We propose a replanning approach that is shown to converge to the belief space goal in a finite number of replanning steps. These approaches are characterized in the context of a simple nonlinear manipulation problem where a planar robot simultaneously locates and grasps an object.
link
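A minimal sketch (ours, not the authors' code) of how the maximum-likelihood-observation assumption makes the belief dynamics deterministic in an EKF setting: the assumed observation equals the predicted one, so the innovation vanishes, the mean follows the prediction, and only the covariance contracts. All function names and signatures here are hypothetical.

```python
import numpy as np

def belief_step_ml(mu, Sigma, u, f, F, H, Q, R):
    """One deterministic belief-space step under the ML-observation
    assumption. f(mu, u): process model; F(mu, u): its Jacobian;
    H(mu): observation Jacobian; Q, R: noise covariances."""
    # EKF prediction.
    mu_pred = f(mu, u)
    Fx = F(mu, u)
    Sigma_pred = Fx @ Sigma @ Fx.T + Q
    # Assume the maximum likelihood observation arrives: the innovation
    # is zero, so the mean is unchanged and the covariance contracts.
    Hx = H(mu_pred)
    S = Hx @ Sigma_pred @ Hx.T + R
    K = Sigma_pred @ Hx.T @ np.linalg.inv(S)
    Sigma_new = (np.eye(len(mu)) - K @ Hx) @ Sigma_pred
    return mu_pred, Sigma_new
```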
Sunday, December 19, 2010
Lab Meeting December 20, 2010 (Chung-Han): progress report
Sunday, December 12, 2010
Lab Meeting December 13, 2010 (ShaoChen): DDF-SAM: Fully Distributed SLAM using Constrained Factor Graphs (IROS 2010)
Authors: Alexander Cunningham, Manohar Paluri, and Frank Dellaert
Abstract:
We address the problem of multi-robot distributed SLAM with an extended Smoothing and Mapping (SAM) approach to implement Decentralized Data Fusion (DDF). We present DDF-SAM, a novel method for efficiently and robustly distributing map information across a team of robots, to achieve scalability in computational cost and in communication bandwidth and robustness to node failure and to changes in network topology. DDF-SAM consists of three modules: (1) a local optimization module to execute single-robot SAM and condense the local graph; (2) a communication module to collect and propagate condensed local graphs to other robots, and (3) a neighborhood graph optimizer module to combine local graphs into maps describing the neighborhood of a robot. We demonstrate scalability and robustness through a simulated example, in which inference is consistently faster than a comparable naive approach.
[link]
Monday, December 06, 2010
Lab Meeting December 6th, 2010 (Nicole): Acoustic Source Localization and Tracking Using Track Before Detect
(IEEE TRANSACTIONS ON AUDIO, SPEECH, AND LANGUAGE PROCESSING, 2010)
Authors: Maurice F. Fallon, Simon Godsill
Abstract:
Particle Filter-based Acoustic Source Localization algorithms attempt to track the position of a sound source—one or more people speaking in a room—based on the current data from a microphone array as well as all previous data up to that point. This paper first discusses some of the inherent behavioral traits of the steered beamformer localization function. Using conclusions drawn from that study, a multitarget methodology for acoustic source tracking based on the Track Before Detect (TBD) framework is introduced. The algorithm also implicitly evaluates source activity using a variable appended to the state vector. Using the TBD methodology avoids the need to identify a set of source measurements and also allows for a vast increase in the number of particles used for a comparative computational load, which results in increased tracking stability in challenging recording environments. An evaluation of tracking performance is given using a set of real speech recordings with two simultaneously active speech sources.
[link]
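A rough sketch of the track-before-detect idea, assuming a generic steered-response-power function srp(x, y) and a binary activity variable appended to each particle; the motion and weighting models here are our own simplifications, not the paper's.

```python
import numpy as np

rng = np.random.default_rng(0)

def tbd_step(particles, weights, srp, p_switch=0.05, sigma=0.1):
    """particles: (N, 3) array of [x, y, active]; weights: (N,)."""
    # Motion model: diffuse positions, occasionally flip source activity.
    particles[:, :2] += rng.normal(0.0, sigma, (len(particles), 2))
    flip = rng.random(len(particles)) < p_switch
    particles[flip, 2] = 1.0 - particles[flip, 2]
    # TBD weighting: no detection step; the raw localization surface
    # scores active particles, inactive ones get a flat pseudo-likelihood.
    srp_vals = np.array([srp(x, y) for x, y, _ in particles])
    weights = weights * np.where(particles[:, 2] > 0.5, srp_vals, 1.0)
    weights /= weights.sum()
    # Systematic resampling.
    u = (rng.random() + np.arange(len(weights))) / len(weights)
    idx = np.searchsorted(np.cumsum(weights), u)
    return particles[idx].copy(), np.full(len(weights), 1.0 / len(weights))
```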
Lab Meeting December 6th, 2010 (KuoHuei): progress report
Sunday, November 28, 2010
Lab Meeting November 29, 2010 (Wang Li): Adaptive Pose Priors for Pictorial Structures (CVPR 2010)
Authors: Benjamin Sapp, Chris Jordan, Ben Taskar
Abstract
The structure and parameterization of a pictorial structure model is often restricted by assuming tree dependency structure and unimodal, data-independent pairwise interactions, which fail to capture important patterns in the data. On the other hand, local methods such as kernel density estimation provide nonparametric flexibility but require large amounts of data to generalize well. We propose a simple semi-parametric approach that combines the tractability of pictorial structure inference with the flexibility of non-parametric methods by expressing a subset of model parameters as kernel regression estimates from a learned sparse set of exemplars. This yields query-specific, image-dependent pose priors. We develop an effective shape-based kernel for upper-body pose similarity and propose a leave-one-out loss function for learning a sparse subset of exemplars for kernel regression. We apply our techniques to two challenging datasets of human figure parsing and advance the state-of-the-art (from 80% to 86% on the Buffy dataset), while using only 15% of the training data as exemplars.
Paper Link
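The semi-parametric core of the paper fits in a few lines: pose-prior parameters for a query image are a kernel-weighted average over a sparse exemplar set. This sketch substitutes a Gaussian on feature distance for the paper's learned shape-based kernel; names and layouts are ours.

```python
import numpy as np

def query_specific_params(query_feat, exemplar_feats, exemplar_params,
                          bandwidth=1.0):
    """Nadaraya-Watson kernel regression: exemplar_feats (M, d),
    exemplar_params (M, p) -> image-dependent parameters (p,)."""
    d2 = np.sum((exemplar_feats - query_feat) ** 2, axis=1)
    w = np.exp(-d2 / (2.0 * bandwidth ** 2))
    w /= w.sum()
    return w @ exemplar_params
```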
Saturday, November 27, 2010
Lab Meeting November 29th, 2010 (Jeff): Sub-Meter Indoor Localization in Unmodified Environments with Inexpensive Sensors
Authors: Morgan Quigley, David Stavens, Adam Coates, and Sebastian Thrun
Abstract:
The interpretation of uncertain sensor streams for localization is usually considered in the context of a robot. Increasingly, however, portable consumer electronic devices, such as smartphones, are equipped with sensors including WiFi radios, cameras, and inertial measurement units (IMUs). Many tasks typically associated with robots, such as localization, would be valuable to perform on such devices. In this paper, we present an approach for indoor localization exclusively using the low-cost sensors typically found on smartphones. Environment modification is not needed. We rigorously evaluate our method using ground truth acquired using a laser range scanner. Our evaluation includes overall accuracy and a comparison of the contribution of individual sensors. We find experimentally that fusion of multiple sensor modalities is necessary for optimal performance and demonstrate sub-meter localization accuracy.
Link:
IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), October 2010
http://www-cs.stanford.edu/people/dstavens/iros10/quigley_etal_iros10.pdf
or
local_copy
Video:
http://www.cs.stanford.edu/people/dstavens/iros10/quigley_etal_iros10.mp4
Monday, November 22, 2010
Lab Meeting November 22, 2010 (Andi): Three-Dimensional Mapping with Time-of-Flight Cameras
Sunday, November 21, 2010
Lab Meeting November 22, 2010 (Alan): Temporary Maps for Robust Localization in Semi-static Environments (IROS 2010)
Monday, November 15, 2010
Lab Meeting November 15, 2010 (Kuen-Han): 3D Reconstruction of a Moving Point from a Series of 2D Projections (ECCV 2010)
Author: Hyun Soo Park, Takaaki Shiratori, Iain Matthews, and Yaser Sheikh
Abstract
This paper presents a linear solution for reconstructing the 3D trajectory of a moving point from its correspondence in a collection of 2D perspective images, given the 3D spatial pose and time of capture of the cameras that produced each image. Triangulation-based solutions do not apply, as multiple views of the point may not exist at each instant in time. A geometric analysis of the problem is presented and a criterion, called reconstructibility, is defined to precisely characterize the cases when reconstruction is possible, and how accurate it can be. We apply the linear reconstruction algorithm to reconstruct the time evolving 3D structure of several real-world scenes, given a collection of non-coincidental 2D images.
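Given known cameras, each 2D observation of the moving point yields two linear constraints, and expressing the whole trajectory in a temporal basis couples them into a single least-squares solve. The following is a compressed sketch of such a linear reconstruction (matrix layouts and names are ours, not the authors'):

```python
import numpy as np

def reconstruct_trajectory(P_list, x_list, B):
    """P_list: T projection matrices (3, 4); x_list: T observations (u, v);
    B: (3T, K) trajectory basis (e.g., DCT). Returns (T, 3) points."""
    rows, rhs = [], []
    for t, (P, (u, v)) in enumerate(zip(P_list, x_list)):
        # u = (P0 . X + p03) / (P2 . X + p23), rearranged to linear form.
        A = np.array([u * P[2, :3] - P[0, :3],
                      v * P[2, :3] - P[1, :3]])
        b = np.array([P[0, 3] - u * P[2, 3],
                      P[1, 3] - v * P[2, 3]])
        rows.append(A @ B[3 * t:3 * t + 3])   # constraints on basis coeffs
        rhs.append(b)
    c, *_ = np.linalg.lstsq(np.vstack(rows), np.hstack(rhs), rcond=None)
    return (B @ c).reshape(-1, 3)
```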
Link
Sunday, November 14, 2010
Lab Meeting November 15, 2010 (fish60): Unfreezing the Robot: Navigation in Dense, Interacting Crowds
Author: Peter Trautman and Andreas Krause
Abstract: In this paper, we study the safe navigation of a mobile robot through crowds of dynamic agents with uncertain trajectories. Existing algorithms suffer from the “freezing robot” problem: once the environment surpasses a certain level of complexity, the planner decides that all forward paths are unsafe, and the robot freezes in place (or performs unnecessary maneuvers) to avoid collisions. ... In this work, we demonstrate that both the individual prediction and the predictive uncertainty have little to do with the frozen robot problem. Our key insight is that dynamic agents solve the frozen robot problem by engaging in “joint collision avoidance”: they cooperatively make room to create feasible trajectories. We develop IGP, a nonparametric statistical model based on Dependent Output Gaussian Processes that can estimate crowd interaction from data. Our model naturally captures the non-Markov nature of agent trajectories, as well as their goal-driven navigation. We then show how planning in this model can be efficiently implemented using particle-based inference.
Link
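To see what “joint collision avoidance” buys, consider reweighting joint trajectory samples drawn from independent per-agent GPs with a pairwise interaction potential, so that samples in which agents pass through each other score low. This is our toy rendering of IGP's interaction term, not the authors' code:

```python
import numpy as np

def interaction_weight(trajs, radius=0.5, alpha=0.9):
    """trajs: (num_agents, T, 2) joint trajectory sample. Returns a
    weight in (0, 1]: near-collisions at any time step shrink it."""
    n = len(trajs)
    w = 1.0
    for i in range(n):
        for j in range(i + 1, n):
            d = np.linalg.norm(trajs[i] - trajs[j], axis=1)   # (T,)
            w *= np.prod(1.0 - alpha * np.exp(-d ** 2 / (2 * radius ** 2)))
    return w
```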
Monday, November 01, 2010
CMU PhD Thesis Defense: Geolocation with Range: Robustness, Efficiency and Scalability
Joseph A. Djugash
Geolocation with Range: Robustness, Efficiency and Scalability
November 05, 2010, 10:00 a.m., NSH 1507
Abstract
This thesis explores the topic of geolocation with range. A robust method for localization and SLAM (Simultaneous Localization and Mapping) is proposed. This method uses a polar parameterization of the state to achieve accurate estimates of the nonlinear and multi-modal distributions in range-only systems. Several experimental evaluations on real robots reveal the reliability of this method.
Scaling such a system to a large network of nodes increases the computational load on the system due to the increased state vector. To alleviate this problem, we propose the use of a distributed estimation algorithm based on the belief propagation framework. This method distributes the estimation task such that each node only estimates its local network, greatly reducing the computation performed by any individual node. However, the method does not provide any guarantees on the convergence of its solution in general graphs. Convergence is only guaranteed for non-cyclic graphs (i.e., trees). Thus, an extension of this approach which reduces any arbitrary graph to a spanning tree is presented. This enables the proposed decentralized localization method to provide guarantees on its convergence.
[LINK][PDF]
Thesis Committee
Sanjiv Singh, Chair
George Kantor
Howie Choset
Wolfram Burgard, University of Freiburg
Sunday, October 31, 2010
Lab Meeting November 1, 2010 (Will): Visual Event Recognition in Videos by Learning from Web Data (CVPR 2010)
Author: Lixin Duan, Dong Xu, Ivor W. Tsang, Jiebo Luo
Abstract:
We propose a visual event recognition framework for consumer domain videos by leveraging a large amount of loosely labeled web videos (e.g., from YouTube). First, we propose a new aligned space-time pyramid matching method to measure the distances between two video clips, where each video clip is divided into space-time volumes over multiple levels. We calculate the pairwise distances between any two volumes and further integrate the information from different volumes with Integer-flow Earth Mover’s Distance (EMD) to explicitly align the volumes. Second, we propose a new cross-domain learning method in order to 1) fuse the information from multiple pyramid levels and features (i.e., space-time feature and static SIFT feature) and 2) cope with the considerable variation in feature distributions between videos from two domains (i.e., web domain and consumer domain). For each pyramid level and each type of local features, we train a set of SVM classifiers based on the combined training set from two domains using multiple base kernels of different kernel types and parameters, which are fused with equal weights to obtain an average classifier. Finally, we propose a cross-domain learning method, referred to as Adaptive Multiple Kernel Learning (A-MKL), to learn an adapted classifier based on multiple base kernels and the prelearned average classifiers by minimizing both the structural risk functional and the mismatch between data distributions from two domains. Extensive experiments demonstrate the effectiveness of our proposed framework that requires only a small number of labeled consumer videos by leveraging web data.
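As a sketch of the equal-weight fusion stage only (the A-MKL adaptation step is beyond a few lines), one SVM can be trained per precomputed base kernel and the decision values averaged; the scikit-learn usage and data layout are our assumptions, not the paper's code.

```python
import numpy as np
from sklearn.svm import SVC

def average_classifier(K_train_list, y, K_test_list, C=1.0):
    """K_train_list: (n, n) training kernel matrices, one per base kernel;
    K_test_list: matching (m, n) test-vs-train kernels; y: binary labels.
    Returns equal-weight averaged decision values for the m test clips."""
    scores = np.zeros(K_test_list[0].shape[0])
    for K, Kt in zip(K_train_list, K_test_list):
        clf = SVC(kernel="precomputed", C=C).fit(K, y)
        scores += clf.decision_function(Kt)
    return scores / len(K_train_list)
```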
Friday, October 29, 2010
Lab Meeting November 1, 2010 (Chih-Chung): POMDPs for robotic tasks with mixed observability (RSS 2009)
Authors: Sylvie C.W. Ong, Shao Wei Png, David Hsu and Wee Sun Lee
Abstract:
Partially observable Markov decision processes (POMDPs) provide a principled mathematical framework for motion planning of autonomous robots in uncertain and dynamic environments. They have been successfully applied to various robotic tasks, but a major challenge is to scale up POMDP algorithms for more complex robotic systems. Robotic systems often have mixed observability: even when a robot’s state is not fully observable, some components of the state may still be fully observable. Exploiting this, we use a factored model to represent separately the fully and partially observable components of a robot’s state and derive a compact lower-dimensional representation of its belief space. We then use this factored representation in conjunction with a point-based algorithm to compute approximate POMDP solutions. Separating fully and partially observable state components using a factored model opens up several opportunities to improve the efficiency of point-based POMDP algorithms. Experiments show that on standard test problems, our new algorithm is many times faster than a leading point-based POMDP algorithm.
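The payoff of mixed observability is that the belief only has to span the hidden state component. A minimal sketch of the factored belief update, with tensor layouts that are our own simplification of the model, not the paper's notation:

```python
import numpy as np

def momdp_belief_update(b_y, a, x_next, o, T_y, O_y):
    """b_y: belief over the hidden component y. T_y[a][x_next][y, y'] =
    P(y' | y, a, x_next); O_y[a][x_next][y', o] = P(o | y', a, x_next).
    The observable component x_next is read off directly, not estimated."""
    b_pred = T_y[a][x_next].T @ b_y          # predict hidden component
    b_new = O_y[a][x_next][:, o] * b_pred    # weight by the observation
    return b_new / b_new.sum()
```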
Thursday, October 28, 2010
News: University of Chicago, Cornell Researchers Develop Universal Robotic Gripper
Robotic hands are usually just that -- hands -- but some researchers from the University of Chicago and Cornell University (with a little help from iRobot) have taken a decidedly different approach for their so-called universal robotic gripper. As you can see above, the gripper is actually a balloon that can conform to and grip just about any small object, and hang onto it firmly enough to pick it up. What's the secret? After much testing, the researchers found that ground coffee was the best substance to fill the balloon with -- to grab an object, the gripper simply creates a vacuum in the balloon (much like a vacuum-sealed bag of coffee), and it's then able to let go of the object just by releasing the vacuum. Simple, but it works. Head on past the break to check it out in action. [via engadget]
Monday, October 25, 2010
Lab Meeting October 25, 2010 (David): Threat-aware Path Planning in Uncertain Urban Environments (IROS 2010)
Authors: Georges S. Aoude, Brandon D. Luders, Daniel S. Levine, and Jonathan P. How
Abstract:
This paper considers the path planning problem for an autonomous vehicle in an urban environment populated with static obstacles and moving vehicles with uncertain intents. We propose a novel threat assessment module, consisting of an intention predictor and a threat assessor, which augments the host vehicle’s path planner with a real-time threat value representing the risks posed by the estimated intentions of other vehicles. This new threat-aware planning approach is applied to the CL-RRT path planning framework, used by the MIT team in the 2007 DARPA Urban Challenge. The strengths of this approach are demonstrated through simulation and experiments performed in the RAVEN testbed facilities.
[local copy]
[link ]
[local video]
[video]
Monday, October 11, 2010
Lab Meeting October 11, 2010 (Shao-Chen): Consistent data association in multi-robot systems with limited communications (RSS 2010)
Authors: Rosario Aragues, Eduardo Montijano, and Carlos Sagues
Abstract:
In this paper we address the data association problem of features observed by a robot team with limited communications. At every time instant, each robot can only exchange data with a subset of the robots, its neighbors. Initially, each robot solves a local data association with each of its neighbors. After that, the robots execute the proposed algorithm to agree on a data association between all their local observations which is globally consistent. An inconsistency appears when chains of local associations give rise to two features from one robot being associated with each other. The contribution of this work is the decentralized detection and resolution of these inconsistencies. We provide a fully decentralized solution to the problem. This solution does not rely on any particular communication topology. Every robot plays the same role, making the system robust to individual failures. Information is exchanged exclusively between neighbors. In a finite number of iterations, the algorithm finishes with a data association which is free of inconsistent associations. In the experiments, we show the performance of the algorithm under two scenarios. In the first one, we apply the detection and resolution algorithm to a set of stochastic visual maps. In the second, we solve the feature matching between a set of images taken by a robotic team.
[link]
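The inconsistency in question is easy to picture with a centralized stand-in: union together all pairwise matches and flag any association component containing two features from the same robot. The paper's contribution is doing this detection and resolution in a decentralized way; this sketch only illustrates the condition being detected.

```python
def find(parent, i):
    while parent[i] != i:
        parent[i] = parent[parent[i]]   # path halving
        i = parent[i]
    return i

def detect_inconsistencies(features, matches):
    """features: list of (robot_id, feature_id); matches: index pairs
    produced by the pairwise local data associations."""
    parent = list(range(len(features)))
    for i, j in matches:
        parent[find(parent, i)] = find(parent, j)
    seen, conflicts = {}, []
    for idx, (robot, _) in enumerate(features):
        key = (find(parent, idx), robot)
        if key in seen:
            conflicts.append((seen[key], idx))   # same robot, same chain
        else:
            seen[key] = idx
    return conflicts
```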
Lab Meeting October 11, 2010 (Nicole): Improvement in Listening Capability for Humanoid Robot HRP-2 (ICRA 2010)
Authors: Toru Takahashi, Kazuhiro Nakadai, Kazunori Komatani, Tetsuya Ogata and Hiroshi G. Okuno.
Abstract:
This paper describes an improvement in sound source separation for a simultaneous automatic speech recognition (ASR) system on a humanoid robot. Recognition errors in the system are caused by separation errors and interference from other sources. To improve separability, the original geometric source separation (GSS) is extended: our GSS uses the robot’s measured head-related transfer function (HRTF) to estimate a separation matrix. Since the original GSS uses a simulated HRTF calculated from the distance between microphone and sound source, there is a large mismatch between the simulated and measured transfer functions, and this mismatch causes a severe degradation in recognition performance.
Faster convergence of the separation matrix reduces separation error. Initializing from a measured transfer function yields a separation matrix that starts nearer to the optimal one than a simulated transfer function does, so we expect our GSS to improve the convergence speed. Our GSS can also handle an adaptive step-size parameter.
These new features have been added to the open-source robot audition software "HARK", newly updated as version 1.0.0. HARK has been installed on an HRP-2 humanoid with an 8-element microphone array. The listening capability of HRP-2 is evaluated by recognizing a target speech signal separated from the simultaneous speech of three talkers. The word correct rate (WCR) of ASR improves by 5 points under normal acoustic environments and by 10 points under noisy environments. Experimental results show that HARK 1.0.0 improves robustness against noise.
Lab Meeting October 11, 2010 (Andi): Dynamic 3D Scene Analysis for Acquiring Articulated Scene Models
Sunday, October 10, 2010
News: Google's Self-Driving Cars
By Sebastian Thrun, Google
Read the full article.
Google Cars Drive Themselves, in Traffic
By John Markoff, The New York Times
Read the full article.
Friday, October 08, 2010
News: MIT Media Lab Medical Mirror
Sunday, October 03, 2010
Lab Meeting October 4th, 2010 (Jeff): Progress Report
Lab Meeting October 4th, 2010 (KuoHuei): progress report
Monday, September 27, 2010
Lab Meeting September 27, 2010 (Wang Li): Monocular 3D Pose Estimation and Tracking by Detection (CVPR 2010)
Authors: Mykhaylo Andriluka, Stefan Roth, Bernt Schiele
Abstract
Automatic recovery of 3D human pose from monocular image sequences is a challenging and important research topic with numerous applications. Although current methods are able to recover 3D pose for a single person in controlled environments, they are severely challenged by real-world scenarios, such as crowded street scenes. To address this problem, we propose a three-stage process building on a number of recent advances. The first stage obtains an initial estimate of the 2D articulation and viewpoint of the person from single frames. The second stage allows early data association across frames based on tracking-by-detection. The third and final stage uses those tracklet-based estimates as robust image observations to reliably recover 3D pose. We demonstrate state-of-the-art performance on the HumanEva II benchmark, and also show the applicability of our approach to articulated 3D tracking in realistic street conditions.
Paper Link
Sunday, September 19, 2010
Lab Meeting September 20, 2010 (Kuen-Han): Scale Drift-Aware Large Scale Monocular SLAM (RSS 2010)
Author: Hauke Strasdat, J.M.M. Montiel, Andrew J. Davison
Abstract:
State-of-the-art visual SLAM systems have recently been presented which are capable of accurate, large-scale and real-time performance, but most of these require stereo vision. Important application areas in robotics and beyond open up if similar performance can be demonstrated using monocular vision, since a single camera will always be cheaper, more compact and easier to calibrate than a multi-camera rig.
With high quality estimation, a single camera moving through a static scene of course effectively provides its own stereo geometry via frames distributed over time. However, a classic issue with monocular visual SLAM is that due to the purely projective nature of a single camera, motion estimates and map structure can only be recovered up to scale. Without the known inter-camera distance of a stereo rig to serve as an anchor, the scale of locally constructed map portions and the corresponding motion estimates is therefore liable to drift over time.
In this paper we describe a new near real-time visual SLAM system which adopts the continuous keyframe optimisation approach of the best current stereo systems, but accounts for the additional challenges presented by monocular input. In particular, we present a new pose-graph optimisation technique which allows for the efficient correction of rotation, translation and scale drift at loop closures. Especially, we describe the Lie group of similarity transformations and its relation to the corresponding Lie algebra. We also present in detail the system’s new image processing front-end which is able to accurately track hundreds of features per frame, and a filter-based approach for feature initialisation within keyframe-based SLAM. Our approach is proven via large-scale simulation and real-world experiments where a camera completes large looped trajectories.
link
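The object at the heart of the pose-graph technique is the similarity transform. A minimal sketch of Sim(3), whose extra scale degree of freedom is exactly what lets loop closures correct scale drift (an SE(3) pose graph has no such handle); this toy class is ours, not the paper's implementation:

```python
import numpy as np

class Sim3:
    """Similarity transform (s, R, t): p -> s * R @ p + t."""
    def __init__(self, s, R, t):
        self.s, self.R, self.t = s, np.asarray(R), np.asarray(t)

    def apply(self, p):
        return self.s * self.R @ p + self.t

    def compose(self, other):
        # Unlike SE(3) composition, the scales multiply.
        return Sim3(self.s * other.s, self.R @ other.R,
                    self.s * self.R @ other.t + self.t)

    def inverse(self):
        Rinv = self.R.T
        return Sim3(1.0 / self.s, Rinv, -Rinv @ self.t / self.s)
```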
Lab Meeting September 20, 2010 (Alan): Probabilistic Surveillance with Multiple Active Cameras (ICRA 2010)
Monday, September 13, 2010
Lab Meeting September 13th, 2010 (fish60): progress report
Saturday, September 11, 2010
Lab Meeting September 13th, 2010 (Gary): AAM based Face Tracking with Temporal Matching and Face Segmentation (CVPR 2010)
Authors:
Mingcai Zhou, Lin Liang, Jian Sun, Yangsheng Wang
Abstract:
Active Appearance Model (AAM) based face tracking has advantages of accurate alignment, high efficiency, and effectiveness for handling face deformation. However, AAM suffers from the generalization problem and has difficulties in images with cluttered backgrounds. In this paper, we introduce two novel constraints into AAM fitting to address the above problems. We first introduce a temporal matching constraint in AAM fitting. In the proposed fitting scheme, the temporal matching enforces an inter-frame local appearance constraint between frames. The resulting model takes advantage of temporal matching's good generalizability, but does not suffer from the mismatched points. To make AAM more stable for cluttered backgrounds, we introduce a color-based face segmentation as a soft constraint. Both constraints effectively improve the AAM tracker's performance, as demonstrated with experiments on various challenging real-world videos.
link
Wednesday, September 08, 2010
PhD Thesis Defense: David Silver [Learning Preference Models for Autonomous Mobile Robots in Complex Domains]
Learning Preference Models for Autonomous Mobile Robots in Complex Domains
Carnegie Mellon University
September 13, 2010, 12:30 p.m., NSH 1507
Abstract
Achieving robust and reliable autonomous operation even in complex unstructured environments is a central goal of field robotics. ...
This thesis presents the development and application of machine learning techniques that automate the construction and tuning of preference models within complex mobile robotic systems. Utilizing the framework of inverse optimal control, expert examples of robot behavior can be used to construct models that generalize demonstrated preferences and reproduce similar behavior. Novel learning from demonstration approaches are developed that offer the possibility of significantly reducing the amount of human interaction necessary to tune a system, while also improving its final performance. Techniques to account for the inevitability of noisy and imperfect demonstration are presented, along with additional methods for improving the efficiency of expert demonstration and feedback.
The effectiveness of these approaches is confirmed through application to several real world domains, such as the interpretation of static and dynamic perceptual data in unstructured environments and the learning of human driving styles and maneuver preferences. ... These experiments validate the potential applicability of the developed algorithms to a large variety of future mobile robotic systems.
Link
Monday, September 06, 2010
Lab Meeting September 7th, 2010 (Jimmy): Learning to Recognize Objects from Unseen Modalities
In ECCV 2010
Authors: C. Mario Christoudias, Raquel Urtasun, Mathieu Salzmann and Trevor Darrell
Abstract
In this paper we investigate the problem of exploiting multiple sources of information for object recognition tasks when additional modalities that are not present in the labeled training set are available for inference. This scenario is common to many robotics sensing applications and is in contrast with the assumption made by existing approaches that require at least some labeled examples for each modality. To leverage the previously unseen features, we make use of the unlabeled data to learn a mapping from the existing modalities to the new ones. This allows us to predict the missing data for the labeled examples and exploit all modalities using multiple kernel learning. We demonstrate the effectiveness of our approach on several multi-modal tasks including object recognition from multi-resolution imagery, grayscale and color images, as well as images and text. Our approach outperforms multiple kernel learning on the original modalities, as well as nearest-neighbor and bootstrapping schemes.
[pdf]
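The core trick is compact: learn a mapping from the existing modality to the unseen one on unlabeled pairs, then predict the missing features for the labeled set so every modality can enter the kernel learner. A linear stand-in (the paper's mapping and the MKL stage are richer than this):

```python
from sklearn.linear_model import Ridge

def hallucinate_modality(X_old_unlab, X_new_unlab, X_old_labeled):
    """Fit old -> new modality on unlabeled pairs, then fill in the
    missing modality for the labeled examples."""
    mapper = Ridge(alpha=1.0).fit(X_old_unlab, X_new_unlab)
    return mapper.predict(X_old_labeled)
```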
Sunday, September 05, 2010
Lab Meeting September 7th, 2010 (Will, 柏崴): Efficient Computation of Robust Low-Rank Matrix Approximations in the Presence of Missing Data using the L1 Norm (CVPR 2010)
Saturday, August 28, 2010
Lab Meeting August 31st, 2010 (zhi-zhong, 執中): Efficient Planning under Uncertainty for a Target-Tracking Micro-Aerial Vehicle (ICRA'10)
Lab Meeting August 31st, 2010 (David): Scene Understanding in a Large Dynamic Environment through a Laser-based Sensing (ICRA'10)
Monday, August 23, 2010
Lab Meeting August 23rd, 2010 (Nicole): Evaluating Real-time Audio Localization Algorithms for Artificial Audition in Robotics (IROS'09)
Authors: Anthony Badali, Jean-Marc Valin, Francois Michaud, and Parham Aarabi
Abstract:
Although research on localization of sound sources using microphone arrays has been carried out for years, providing such capabilities on robots is rather new. Artificial audition systems on robots currently exist, but no evaluation of the methods used to localize sound sources has yet been conducted. This paper presents an evaluation of various real-time audio localization algorithms using a medium-sized microphone array which is suitable for applications in robotics. The techniques studied here are implementations and enhancements of steered response power - phase transform beamformers, which represent the most popular methods for time difference of arrival audio localization. In addition, two different grid topologies for implementing source direction search are also compared. Results show that a direction refinement procedure can be used to improve localization accuracy and that more efficient and accurate direction searches can be performed using a uniform triangular element grid rather than the typical rectangular element grid.
local copy : [link]
[link]
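The beamformers evaluated here are built from generalized cross-correlation with phase transform (GCC-PHAT) between microphone pairs; the steered response power sums such correlations over pairs for each candidate direction. A standard single-pair sketch (our implementation, not the paper's):

```python
import numpy as np

def gcc_phat_delay(sig, ref, fs):
    """Time delay of `sig` relative to `ref`, in seconds."""
    n = len(sig) + len(ref)
    S = np.fft.rfft(sig, n) * np.conj(np.fft.rfft(ref, n))
    S /= np.abs(S) + 1e-12                  # PHAT: keep phase only
    cc = np.fft.irfft(S, n)
    half = n // 2
    cc = np.concatenate((cc[-half:], cc[:half + 1]))  # center zero lag
    return (np.argmax(np.abs(cc)) - half) / fs
```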
Lab Meeting August 23rd, 2010 (ShaoChen): Distributed Nonlinear Estimation for Robot Localization using Weighted Consensus (ICRA'10)
Authors: Andrea Simonetto, Tamás Keviczky and Robert Babuška
Abstract:
Distributed linear estimation theory has received increased attention in recent years due to several promising industrial applications. Distributed nonlinear estimation, however, is still a relatively unexplored field despite the need in numerous practical situations for techniques that can handle nonlinearities. This paper presents a unified way of describing distributed implementations of three commonly used nonlinear estimators: the Extended Kalman Filter, the Unscented Kalman Filter and the Particle Filter. Leveraging on the presented framework, we propose new distributed versions of these methods, in which the nonlinearities are locally managed by the various sensors whereas the different estimates are merged based on a weighted average consensus process. The proposed versions are shown to outperform the few published ones in two robot localization test cases.
[link]
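The merging mechanism is weighted average consensus. A minimal sketch on static estimates (the paper embeds this inside the EKF, UKF and PF updates): each node exchanges data only with neighbours, and the ratio of the two consensus variables converges to the information-weighted mean. For stability, eps should stay below one over the maximum node degree.

```python
import numpy as np

def weighted_consensus(estimates, weights, A, iters=100, eps=0.2):
    """estimates: (n, d) local estimates; weights: (n,) positive
    confidences; A: (n, n) symmetric 0/1 adjacency matrix."""
    z = weights[:, None] * estimates        # weighted numerator
    w = weights.astype(float).copy()        # denominator
    deg = A.sum(axis=1)
    for _ in range(iters):
        z = z + eps * (A @ z - deg[:, None] * z)   # Laplacian step
        w = w + eps * (A @ w - deg * w)
    return z / w[:, None]   # rows tend to sum(w_i x_i) / sum(w_i)
```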
Tuesday, August 10, 2010
Lab Meeting August 10th, 2010 (KuoHuei): An Online Approach: Learning-Semantic-Scene-by-Tracking and Tracking-by-Learning-Semantic-Scene (CVPR'10)
Monday, August 09, 2010
Lab Meeting August 10th, 2010 (Jeff): FAB-MAP + RatSLAM: Appearance-based SLAM for Multiple Times of Day
Authors: Arren J. Glover, William P. Maddern, Michael J. Milford, and Gordon F. Wyeth
Abstract:
Appearance-based mapping and localisation is especially challenging when separate processes of mapping and localisation occur at different times of day. The problem is exacerbated in the outdoors where continuous change in sun angle can drastically affect the appearance of a scene. We confront this challenge by fusing the probabilistic local feature based data association method of FAB-MAP with the pose cell filtering and experience mapping of RatSLAM. We evaluate the effectiveness of our amalgamation of methods using five datasets captured throughout the day from a single camera driven through a network of suburban streets. We show further results when the streets are re-visited three weeks later, and draw conclusions on the value of the system for lifelong mapping.
Link:
IEEE International Conference on Robotics and Automation (ICRA), May 2010
http://eprints.qut.edu.au/31569/1/c31569.pdf
or
local_copy
Wednesday, August 04, 2010
CVPR 2010 Awards
Best Student Paper
- Visual Event Recognition in Videos by Learning from Web Data: Lixin Duan, Dong Xu, Ivor Wai-Hung Tsang, and Jiebo Luo
Best Paper Honorable Mention
- Modeling Mutual Context of Object and Human Pose in Human-Object Interaction Activities: Bangpeng Yao and Li Fei-Fei
Best Paper
- Efficient Computation of Robust Low-Rank Matrix Approximations in the Presence of Missing Data using the L1 Norm: Anders Eriksson and Anton van den Hengel
Longuet-Higgins Prize
- Efficient Matching of Pictorial Structures: Pedro F. Felzenszwalb and Daniel P. Huttenlocher
- Real-Time Tracking of Non-Rigid Objects Using Mean Shift: Dorin Comaniciu, Visvanathan Ramesh, and Peter Meer
Monday, August 02, 2010
Lab Meeting August 3rd, 2010 (Wang Li): Modeling Mutual Context of Object and Human Pose in Human-Object Interaction Activities (CVPR 2010)
Authors: Bangpeng Yao and Li Fei-Fei
Abstract
Detecting objects in cluttered scenes and estimating articulated human body parts are two challenging problems in computer vision. We observe, however, that objects and human poses can serve as mutual context to each other – recognizing one facilitates the recognition of the other.
In this paper, we propose a new random field model to encode the mutual context of objects and human poses in human-object interaction activities. We then cast the model learning task as a structure learning problem, of which the structural connectivity between the object, the overall human pose and different body parts are estimated through a structure search approach, and the parameters of the model are estimated by a new max-margin algorithm.
On a sports data set of six classes of human-object interactions, we show that our mutual context model significantly outperforms state-of-the-art in detecting very difficult objects and human poses.
Paper Link
Thursday, July 29, 2010
Lab Meeting August 3rd, 2010 (Alan): Mapping Indoor Environments Based on Human Activity (ICRA 2010)
Monday, July 26, 2010
Lab Meeting July 27, 2010 (Kuen-Han): Non-Rigid Structure from Locally-Rigid Motion (CVPR 2010)
Authors: Jonathan Taylor, Allan D. Jepson, Kiriakos N. Kutulakos
Abstract:
We introduce locally-rigid motion, a general framework for solving the M-point, N-view structure-from-motion problem for unknown bodies deforming under orthography. The key idea is to first solve many local 3-point, N-view rigid problems independently, providing a “soup” of specific, plausibly rigid, 3D triangles. The main advantage here is that the extraction of 3D triangles requires only very weak assumptions: (1) deformations can be locally approximated by near-rigid motion of three points (i.e., stretching not dominant) and (2) local motions involve some generic rotation in depth. Triangles from this soup are then grouped into bodies, and their depth flips and instantaneous relative depths are determined. Results on several sequences, both our own and from related work, suggest these conditions apply in diverse settings, including very challenging ones (e.g., multiple deforming bodies). Our starting point is a novel linear solution to 3-point structure from motion, a problem for which no general algorithms currently exist.
paper
Saturday, July 24, 2010
Lab Meeting July 20, 2010 (fish60): What if the Irresponsible Teachers Are Dominating? A Method of Training on Samples and Clustering on Teachers
Here's the content:
Authors:
Shuo Chen, Jianwen Zhang, Guangyun Chen, Changshui Zhang
State Key Laboratory on Intelligent Technology and Systems
Tsinghua National Laboratory for Information Science and Technology (TNList)
Department of Automation, Tsinghua University, Beijing 100084, China
Abstract:
Learning from multiple teachers or sources has received increasing attention from researchers in the machine learning area. In this setting, the learning system deals with samples and labels provided by multiple teachers, who in common cases are non-experts. Their labeling styles and behaviors are usually diverse, and some are even detrimental to the learning system. Thus, simply putting them together and applying algorithms designed for the single-teacher scenario would be not only improper but also damaging. Our work focuses on a case where the teachers are composed of good ones and irresponsible ones. By irresponsible, we mean a teacher who does not take the labeling task seriously and labels samples at random without inspecting them. If we do not remove their effects, the learning system will no doubt be ruined. In this paper, we propose a method for picking out the good teachers, with promising experimental results. It works even when the irresponsible teachers are dominating in numbers.
Link
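A toy illustration of why the good teachers remain separable even when outnumbered: pairwise agreement between responsible teachers concentrates well above chance, while a random labeller agrees at roughly 0.5 with everyone. The paper clusters teachers on such statistics; this sketch (ours) only computes the agreement matrix.

```python
import numpy as np

def teacher_agreement(labels):
    """labels: (num_teachers, num_samples) 0/1 label matrix."""
    t = len(labels)
    agree = np.ones((t, t))
    for i in range(t):
        for j in range(i + 1, t):
            agree[i, j] = agree[j, i] = np.mean(labels[i] == labels[j])
    return agree

# Rows whose off-diagonal mean stays near 0.5 behave like coin-flippers
# and are candidates for the "irresponsible" cluster.
```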
Wednesday, July 21, 2010
Every Generation Brings Forth Its Own Talent. Pursuing a PhD: Don't Give Up Lightly
Even if your interests do not entirely match your advisor's research area, the better course is to stay with your advisor rather than give up lightly: communicate at an appropriate time, change the originally assigned research topic, or even ask to join another professor's research group concurrently.
Student A once won a full research assistantship (RA) from an advisor at a top American university; a year and a half later, A gave up his studies and is now looking for a job.
Student B won a one-year full graduate fellowship from another top American university and secured an RA a year later, yet kept feeling that the research was disconnected from reality and was deeply troubled by worries about future job prospects.
Both of these excellent students earned full scholarships from top American universities on the strength of their records, yet because their interests did not lie entirely within their advisors' research areas, one considered quitting and the other worried without end.
In truth, life seldom goes entirely as we wish; it has never been perfect, in any age or place. Even in a future job, a startup, or a professorship, nothing guarantees that everything will go smoothly. For such favored students, if they cannot overcome the difficulties in front of them and give up lightly, how will they stand on their own in society? Here, then, is some advice on how to solve the problem and create a win-win situation for student and advisor.
Stay with your advisor: change the research topic
An advisor usually runs several research projects at once. You can ask in person, explain your reasons, and see whether the assigned topic can be changed; unless the request is unreasonable, most professors will agree. Keep in mind that a PhD dissertation is usually assembled from several research projects, so it is best to raise the request after a paper has been accepted by a journal or conference. This wraps up the current project (so the professor's research funding is not wasted) and helps your own dissertation progress; you can also use the time to become familiar with the new topic.
If the topic is so hard that progress and publication seem out of reach, choose your words carefully and support them with evidence, lest the professor doubt your ability and lose confidence in you. If it is purely a lack of interest, it is best to say so within a few weeks of the topic being assigned, to avoid leaving a bad impression. In any case, tell the professor where your interests lie and explain your background and point of view, so that he becomes more confident that you can achieve a breakthrough on the new topic.
Add a co-advisor: take on a cross-disciplinary topic
For a new cross-disciplinary project, you can ask your advisor whether you may also join the research group of a co-advisor. Because the two professors can split the research funding while both benefit from your future results, advisors are usually happy to accept this arrangement.
In other words, whether or not your interests match your advisor's research area, stay with your advisor and do not give up lightly; and do not raise such requests before passing the PhD qualifying examination, lest it end in the serious consequence of dropping out.
After long discussions, Student A finally accepted his advisor's suggestion to take a leave of absence and work for a while before deciding whether to finish the PhD. Student B joined another professor's research group as well and has not ruled out an academic career after graduation; he no longer worries about his research topic and has published at a leading conference in his newly chosen field. Because the situations were handled well, both students remain on good terms with their original advisors. After all, a good mentor is hard to find and should be treasured; the bond between advisor and student is hard to build and worth cherishing for a lifetime. (王榮騰, Visiting Professor, Department of Electrical Engineering and Graduate Institute of Electronics Engineering, National Taiwan University; June 6, 2010)
Sunday, July 18, 2010
Lab Meeting July 20, 2010 (Gary): Robust Unified Stereo-Based 3D Head Tracking and Its Application to Face Recognition (ICRA 2010)
Authors: Kwang Ho An and Myung Jin Chung
Abstract:
This paper investigates the estimation of 3D head poses and its identity authentication with a partial ellipsoid model. To cope with large out-of-plane rotations and translation in-depth, we extend conventional head tracking with a single camera to a stereo-based framework. To achieve more robust motion estimation even under time-varying lighting conditions, we incorporate illumination correction into the aforementioned framework. We approximate the face image variations due to illumination changes as a linear combination of illumination bases. Also, by computing the illumination bases online from the registered face images, after estimating the 3D head poses, user-specific illumination bases can be obtained, and therefore illumination-robust tracking without a prior learning process can be possible. Furthermore, our unified stereo-based tracking is approximated as a linear least-squares problem; a closed-form solution is then provided. After recovering the full-motions of the head, we can register face images with pose variations into stabilized-view images, which are suitable for pose-robust face recognition. To verify the feasibility and applicability of our approach, we performed extensive experiments with three sets of challenging image sequences.
link
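The illumination-correction step can be sketched as a least-squares projection onto the illumination bases; basis computation and the tracking loop are omitted, and this is a simplification of the paper's online scheme, not its actual code:

```python
import numpy as np

def illumination_correct(face_vec, bases):
    """face_vec: (p,) vectorized registered face; bases: (p, k)
    illumination basis images as columns. Removes the component of
    the appearance explained by illumination."""
    coeffs, *_ = np.linalg.lstsq(bases, face_vec, rcond=None)
    return face_vec - bases @ coeffs
```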
Thursday, July 15, 2010
Lab Meeting July 20, 2010 (Jimmy): Group-Sensitive Multiple Kernel Learning for Object Categorization
Authors: Jingjing Yang, Yuanning Li, Yonghong Tian, Lingyu Duan, Wen Gao
In: ICCV 2009
Abstract
In this paper, we propose a group-sensitive multiple kernel learning (GS-MKL) method to accommodate the intra-class diversity and the inter-class correlation for object categorization. By introducing an intermediate representation “group” between images and object categories, GS-MKL attempts to find appropriate kernel combination for each group to get a finer depiction of object categories. For each category, images within a group share a set of kernel weights while images from different groups may employ distinct sets of kernel weights. In GS-MKL, such group-sensitive kernel combinations together with the multi-kernels based classifier are optimized in a joint manner to seek a trade-off between capturing the diversity and keeping the invariance for each category. Extensive experiments show that our proposed GS-MKL method has achieved encouraging performance over three challenging datasets.
[pdf]
Monday, July 12, 2010
Lab Meeting July 13, 2010 (ShaoChen): Rao-Blackwellized Particle Filters Multi Robot SLAM with Unknown Initial Correspondences and Limited Communication (ICRA 2010)
Authors: Luca Carlone, Miguel Kaouk Ng, Jingjing Du, Basilio Bona, and Marina Indri
Abstract:
Multi-robot systems are envisioned to play an important role in many robotic applications. A main prerequisite for a team deployed in a wide unknown area is the capability to navigate autonomously, exploiting the information acquired through the on-line estimation of both robot poses and the surrounding environment model, according to the Simultaneous Localization And Mapping (SLAM) framework. As team coordination improves, distributed filtering techniques are required in order to enhance autonomous exploration and large-scale SLAM, increasing both the efficiency and robustness of operation. Although Rao-Blackwellized Particle Filters (RBPF) have been demonstrated to be an effective solution to the problem of single-robot SLAM, few extensions to teams of robots exist, and these approaches are characterized by strict assumptions on both communication bandwidth and prior knowledge of the relative poses of the teammates. In the present paper we address the problem of multi-robot SLAM in the case of limited communication and unknown relative initial poses. Starting from the well-established single-robot RBPF-SLAM, we propose a simple technique which jointly estimates the SLAM posterior of the robots by fusing the proprioceptive and exteroceptive information acquired by each teammate. The approach intrinsically reduces the amount of data to be exchanged among the robots, while taking into account the uncertainty in relative pose measurements. Moreover, it can be naturally extended to different communication technologies (Bluetooth, RFID, WiFi, etc.) regardless of their sensing range. The proposed approach is validated through experimental tests.
[link]
Lab Meeting July 13, 2010 (Nicole): Mutual Localization in a Team of Autonomous Robots using Acoustic Robot Detection
Authors: David Becker and Max Risler
In RoboCup 2008: Robot Soccer World Cup XII, Volume 5399/2009
Abstract
In order to improve self-localization accuracy we are exploring ways of mutual localization in a team of autonomous robots. Detecting teammates visually usually leads to inaccurate bearings and only rough distance estimates. Also, visually identifying teammates is not possible. Therefore we are investigating methods of gaining relative position information acoustically in a team of robots.
The technique introduced in this paper is a variant of code-multiplexed communication (CDMA, code division multiple access). In a CDMA system, several receivers and senders can communicate at the same time, using the same carrier frequency. Well-known examples of CDMA systems include wireless computer networks and the Global Positioning System, GPS. While these systems use electro-magnetic waves, we will try to adopt the CDMA principle towards using acoustic pattern recognition, enabling robots to calculate distances and bearings to each other.
First, we explain the general idea of cross-correlation functions and appropriate signal pattern generation. We will further explain the importance of synchronized clocks and discuss the problems arising from clock drifts.
Finally, we describe an implementation using the Aibo ERS-7 as platform and briefly state basic results, including measurement accuracy and a runtime estimate. We will briefly discuss acoustic localization in the specific scenario of a RoboCup soccer game.
[link]
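The distance measurement itself reduces to locating the correlation peak of a robot's known signal pattern in the recording; with synchronized clocks the peak offset is the time of flight. A minimal sketch (the code design and the clock-drift handling discussed in the paper are omitted):

```python
import numpy as np

SPEED_OF_SOUND = 343.0  # m/s at room temperature

def code_distance(recording, code, fs):
    """Correlate the recording against one robot's code; the peak lag
    converts to a distance, assuming emission at recording start."""
    cc = np.correlate(recording, code, mode="valid")
    delay_samples = int(np.argmax(np.abs(cc)))
    return delay_samples / fs * SPEED_OF_SOUND
```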
Tuesday, July 06, 2010
Lab Meeting July 6th, 2010 (Casey): Live Dense Reconstruction with a Single Moving Camera (CVPR 2010)
Monday, July 05, 2010
Lab Meeting July 6th, 2010 (Andi): Upsampling Range Data in Dynamic Environments (CVPR 2010)
Authors: Jennifer Dolson, Jongmin Baek, Christian Plagemann and Sebastian Thrun (Stanford University)
Abstract
We present a flexible method for fusing information from optical and range sensors based on an accelerated high-dimensional filtering approach. Our system takes as input a sequence of monocular camera images as well as a stream of sparse range measurements as obtained from a laser or other sensor system. In contrast with existing approaches, we do not assume that the depth and color data streams have the same data rates or that the observed scene is fully static. Our method produces a dense, high-resolution depth map of the scene, automatically generating confidence values for every interpolated depth point. We describe how to integrate priors on object shape, motion and appearance and how to achieve an efficient implementation using parallel processing hardware such as GPUs.
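A classic joint-bilateral filter conveys the flavour of the fusion: each pixel collects nearby range samples, weighted by image-space distance and colour similarity. The paper uses an accelerated high-dimensional filter and adds shape, motion and appearance priors; this brute-force sketch is ours, not the authors' system.

```python
import numpy as np

def joint_bilateral_upsample(image, sparse_depth, mask, radius=5,
                             sigma_s=3.0, sigma_r=0.1):
    """image: (H, W) grayscale in [0, 1]; sparse_depth, mask: (H, W),
    mask nonzero where a range sample exists. Returns dense float depth."""
    H, W = image.shape
    dense = np.zeros((H, W))
    ys, xs = np.nonzero(mask)
    for y in range(H):
        for x in range(W):
            num = den = 0.0
            for sy, sx in zip(ys, xs):
                if abs(sy - y) > radius or abs(sx - x) > radius:
                    continue
                ws = np.exp(-((sy - y) ** 2 + (sx - x) ** 2)
                            / (2 * sigma_s ** 2))      # spatial weight
                wr = np.exp(-(image[sy, sx] - image[y, x]) ** 2
                            / (2 * sigma_r ** 2))      # colour weight
                num += ws * wr * sparse_depth[sy, sx]
                den += ws * wr
            dense[y, x] = num / den if den > 0 else 0.0
    return dense
```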
Monday, June 28, 2010
Lab Meeting June 29th, 2010 (KuoHuei): People Tracking with Human Motion Predictions from Social Forces (ICRA'10)
Lab Meeting June 29th, 2010 (Jeff): Fully Autonomous Trajectory Estimation with Long-Range Passive RFID
Authors: Philipp Vorst and Andreas Zell
Abstract:
We present a novel approach which enables a mobile robot to estimate its trajectory in an unknown environment with long-range passive radio-frequency identification (RFID). The estimation is based only on odometry and RFID measurements. The technique requires no prior observation model and makes no assumptions on the RFID setup. In particular, it is adaptive to the power level, the way the RFID antennas are mounted on the robot, and environmental characteristics, which have major impact on long-range RFID measurements. Tag positions need not be known in advance, and only the arbitrary, given infrastructure of RFID tags in the environment is utilized. By a series of experiments with a mobile robot, we show that trajectory estimation is achieved accurately and robustly.
Link:
IEEE International Conference on Robotics and Automation (ICRA), May 2010
http://www.ra.cs.uni-tuebingen.de/publikationen/2010/vorst2010icra.pdf