Sunday, December 26, 2010

Lab Meeting January 3rd, 2011 (David): Vision-Based Behavior Prediction in Urban Traffic Environments by Scene Categorization (BMVC 2010)

Title: Vision-Based Behavior Prediction in Urban Traffic Environments by Scene Categorization (BMVC 2010)

Authors: Martin Heracles, Fernando Martinelli and Jannik Fritsch

We propose a method for vision-based scene understanding in urban traffic environments that predicts the appropriate behavior of a human driver in a given visual scene. The method relies on a decomposition of the visual scene into its constituent objects by image segmentation and uses segmentation-based features that represent both their identity and spatial properties. We show how the behavior prediction can be naturally formulated as a scene categorization problem and how ground truth behavior data for learning a classifier can be automatically generated from any monocular video sequence recorded from a moving vehicle, using structure from motion techniques. We evaluate our method both quantitatively and qualitatively on the recently proposed CamVid dataset, predicting the appropriate velocity and yaw rate of the car as well as their appropriate change for both day and dusk sequences. In particular, we investigate the impact of the underlying segmentation and the number of behavior classes on the quality of these predictions.


Wednesday, December 22, 2010

Lab Meeting December 27, 2010(Chih Chung) : Lozano-Perez. Belief space planning assuming maximum likelihood observations.(RSS 2010)

Title: Belief space planning assuming maximum likelihood observations

Authors: Robert Platt Jr., Russ Tedrake, Leslie Kaelbling, Tomas Lozano-Perez

We cast the partially observable control problem as
a fully observable underactuated stochastic control problem in
belief space and apply standard planning and control techniques.
One of the difficulties of belief space planning is modeling the
stochastic dynamics resulting from unknown future observations.
The core of our proposal is to define deterministic belief-system
dynamics based on an assumption that the maximum
likelihood observation (calculated just prior to the observation)
is always obtained. The stochastic effects of future observations
are modelled as Gaussian noise. Given this model of the dynamics,
two planning and control methods are applied. In the first, linear
quadratic regulation (LQR) is applied to generate policies in the
belief space. This approach is shown to be optimal for linear-
Gaussian systems. In the second, a planner is used to find locally
optimal plans in the belief space. We propose a replanning
approach that is shown to converge to the belief space goal
in a finite number of replanning steps. These approaches are
characterized in the context of a simple nonlinear manipulation
problem where a planar robot simultaneously locates and grasps
an object.
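
To illustrate the maximum likelihood observation assumption, here is a minimal sketch for a scalar linear-Gaussian system tracked with a Kalman filter. The model (`x' = x + u + w`, `z = x + v`), the noise variances `q` and `r`, and the function names are assumptions made for this example only; the paper itself treats a nonlinear manipulation problem. Because the assumed observation equals the predicted one, the innovation is zero, so the belief mean follows the controls deterministically while the variance still shrinks.

```python
def belief_step(mean, var, u, q=0.01, r=0.25):
    """One deterministic belief update for a scalar linear-Gaussian system.

    Hypothetical model for illustration: x' = x + u + w, z = x + v,
    with process noise variance q and observation noise variance r.
    """
    # Kalman prediction step.
    mean_pred = mean + u
    var_pred = var + q
    # Maximum likelihood observation assumption: the observation we
    # expect to receive is exactly the predicted one.
    z_ml = mean_pred
    # Kalman measurement update; the innovation (z_ml - mean_pred) is zero.
    gain = var_pred / (var_pred + r)
    mean_new = mean_pred + gain * (z_ml - mean_pred)
    var_new = (1.0 - gain) * var_pred
    return mean_new, var_new


def rollout(mean, var, controls, q=0.01, r=0.25):
    """Propagate a belief (mean, variance) through a sequence of controls."""
    trajectory = [(mean, var)]
    for u in controls:
        mean, var = belief_step(mean, var, u, q, r)
        trajectory.append((mean, var))
    return trajectory
```

A planner could call `rollout` on candidate control sequences and score the resulting (mean, variance) pairs, since under this assumption the belief trajectory no longer depends on unknown future observations.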


Sunday, December 19, 2010

Lab Meeting December 20, 2010(Chung-Han): progress report

I will report my progress on ground-truth annotation.

Sunday, December 12, 2010

Lab Meeting December 13, 2010(ShaoChen): DDF-SAM: Fully Distributed SLAM using Constrained Factor Graphs(IROS2010)

Title: DDF-SAM: Fully Distributed SLAM using Constrained Factor Graphs

Authors: Alexander Cunningham, Manohar Paluri, and Frank Dellaert


We address the problem of multi-robot distributed SLAM with an extended Smoothing and Mapping (SAM) approach to implement Decentralized Data Fusion (DDF). We present DDF-SAM, a novel method for efficiently and robustly distributing map information across a team of robots, to achieve scalability in computational cost and in communication bandwidth and robustness to node failure and to changes in network topology. DDF-SAM consists of three modules: (1) a local optimization module to execute single-robot SAM and condense the local graph; (2) a communication module to collect and propagate condensed local graphs to other robots, and (3) a neighborhood graph optimizer module to combine local graphs into maps describing the neighborhood of a robot. We demonstrate scalability and robustness through a simulated example, in which inference is consistently faster than a comparable naive approach.


Monday, December 06, 2010

Lab Meeting December 6th, 2010(Nicole): Acoustic Source Localization and Tracking Using Track Before Detect

Title: Acoustic Source Localization and Tracking Using Track Before Detect

Authors: Maurice F. Fallon, Simon Godsill

Particle Filter-based Acoustic Source Localization algorithms attempt to track the position of a sound source (one or more people speaking in a room) based on the current data from a microphone array as well as all previous data up to that point. This paper first discusses some of the inherent behavioral traits of the steered beamformer localization function. Using conclusions drawn from that study, a multitarget methodology for acoustic source tracking based on the Track Before Detect (TBD) framework is introduced. The algorithm also implicitly evaluates source activity using a variable appended to the state vector. Using the TBD methodology avoids the need to identify a set of source measurements and also allows for a vast increase in the number of particles used for a comparative computational load, which results in increased tracking stability in challenging recording environments. An evaluation of tracking performance is given using a set of real speech recordings with two simultaneously active speech sources.


Lab Meeting December 6th, 2010(KuoHuei): progress report

I will present my progress on Neighboring Objects Interaction models.

Sunday, November 28, 2010

Lab Meeting November 29, 2010 (Wang Li): Adaptive Pose Priors for Pictorial Structures (CVPR 2010)

Adaptive Pose Priors for Pictorial Structures

Benjamin Sapp
Chris Jordan
Ben Taskar


The structure and parameterization of a pictorial structure model is often restricted by assuming tree dependency structure and unimodal, data-independent pairwise interactions, which fail to capture important patterns in the data. On the other hand, local methods such as kernel density estimation provide nonparametric flexibility but require large amounts of data to generalize well. We propose a simple semi-parametric approach that combines the tractability of pictorial structure inference with the flexibility of non-parametric methods by expressing a subset of model parameters as kernel regression estimates from a learned sparse set of exemplars. This yields query-specific, image-dependent pose priors. We develop an effective shape-based kernel for upper-body pose similarity and propose a leave-one-out loss function for learning a sparse subset of exemplars for kernel regression. We apply our techniques to two challenging datasets of human figure parsing and advance the state-of-the-art (from 80% to 86% on the Buffy dataset), while using only 15% of the training data as exemplars.
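
As a rough illustration of "kernel regression estimates from a set of exemplars", the sketch below computes a Nadaraya-Watson estimate: a query-specific parameter vector formed as a kernel-weighted average of the exemplars' parameters. The Gaussian kernel, the `bandwidth` value, and all function names are assumptions for this example; the paper uses a learned shape-based kernel and a learned sparse exemplar set, neither of which is reproduced here.

```python
import math


def gaussian_kernel(a, b, bandwidth=1.0):
    """Gaussian similarity between two feature vectors (a stand-in kernel)."""
    dist_sq = sum((x - y) ** 2 for x, y in zip(a, b))
    return math.exp(-dist_sq / (2.0 * bandwidth ** 2))


def kernel_regression(query, exemplars, bandwidth=1.0):
    """Nadaraya-Watson estimate of a parameter vector for `query`.

    exemplars: list of (feature_vector, parameter_vector) pairs.
    Returns the kernel-weighted average of the exemplar parameters.
    """
    weights = [gaussian_kernel(query, feat, bandwidth) for feat, _ in exemplars]
    total = sum(weights)
    if total == 0.0:
        raise ValueError("all kernel weights underflowed; increase bandwidth")
    dim = len(exemplars[0][1])
    return [
        sum(w * params[i] for w, (_, params) in zip(weights, exemplars)) / total
        for i in range(dim)
    ]
```

Exemplars that look similar to the query image dominate the average, which is what makes the resulting prior image-dependent.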

Paper Link

Saturday, November 27, 2010

Lab Meeting November 29th, 2010 (Jeff): Sub-Meter Indoor Localization in Unmodified Environments with Inexpensive Sensors

Title: Sub-Meter Indoor Localization in Unmodified Environments with Inexpensive Sensors

Authors: Morgan Quigley, David Stavens, Adam Coates, and Sebastian Thrun


The interpretation of uncertain sensor streams for localization is usually considered in the context of a robot. Increasingly, however, portable consumer electronic devices, such as smartphones, are equipped with sensors including WiFi radios, cameras, and inertial measurement units (IMUs). Many tasks typically associated with robots, such as localization, would be valuable to perform on such devices. In this paper, we present an approach for indoor localization exclusively using the low-cost sensors typically found on smartphones. Environment modification is not needed. We rigorously evaluate our method using ground truth acquired using a laser range scanner. Our evaluation includes overall accuracy and a comparison of the contribution of individual sensors. We find experimentally that fusion of multiple sensor modalities is necessary for optimal performance and demonstrate sub-meter localization accuracy.

IEEE/RSJ International Conference on Intelligent Robots and Systems(IROS), October 2010


Monday, November 22, 2010

Lab Meeting November 22, 2010 (Andi): Three-Dimensional Mapping with Time-of-Flight Cameras

Title: Three-Dimensional Mapping with Time-of-Flight Cameras

Authors: Stefan May, David Droeschel, Dirk Holz, Stefan Fuchs, Ezio Malis, Andreas Nuechter and Joachim Hertzberg

Journal of Field Robotics 2009

Abstract: This article investigates the use of time-of-flight (ToF) cameras in mapping tasks for autonomous mobile robots, in particular in simultaneous localization and mapping (SLAM) tasks. Although ToF cameras are in principle an attractive type of sensor for three-dimensional (3D) mapping owing to their high rate of frames of 3D data, two features make them difficult as mapping sensors, namely, their restricted field of view and influences on the quality of range measurements by high dynamics in object reflectivity; in addition, currently available models suffer from poor data quality in a number of aspects. The paper first summarizes calibration and filtering approaches for improving the accuracy, precision, and robustness of ToF cameras independent of their intended usage. Then, several ego motion estimation approaches are applied or adapted, respectively, in order to provide a performance benchmark for registering ToF camera data. As a part of this, an extension to the iterative closest point algorithm has been developed that increases the robustness under restricted field of view and under larger displacements. Using an indoor environment, the paper provides results from SLAM experiments using these approaches in comparison. It turns out that the application of ToF cameras is feasible to SLAM tasks, although this type of sensor has a complex error characteristic.

Sunday, November 21, 2010

Lab Meeting November 22, 2010 (Alan): Temporary Maps for Robust Localization in Semi-static Environments (IROS 2010)

Title: Temporary Maps for Robust Localization in Semi-static Environments (IROS 2010)
Authors: Daniel Meyer-Delius, Jurgen Hess, Giorgio Grisetti, Wolfram Burgard

Abstract—Accurate and robust localization is essential for the successful navigation of autonomous mobile robots. The majority of existing localization approaches, however, is based on the assumption that the environment is static which does not hold for most practical application domains. In this paper, we present a localization framework that can robustly track a robot’s pose even in non-static environments. Our approach keeps track of the observations caused by unexpected objects in the environment using temporary local maps. It relies both on these temporary local maps and on a reference map of the environment for estimating the pose of the robot. Experimental results demonstrate that by exploiting the observations caused by unexpected objects our approach outperforms standard localization methods for static environments.

Link: pdf

Monday, November 15, 2010

Lab Meeting November 15 (KuenHan): 3D Reconstruction of a Moving Point from a Series of 2D Projections (ECCV 2010)

Title: 3D Reconstruction of a Moving Point from a Series of 2D Projections
Author: Hyun Soo Park, Takaaki Shiratori, Iain Matthews, and Yaser Sheikh


This paper presents a linear solution for reconstructing the 3D trajectory of a moving point from its correspondence in a collection of 2D perspective images, given the 3D spatial pose and time of capture of the cameras that produced each image. Triangulation-based solutions do not apply, as multiple views of the point may not exist at each instant in time. A geometric analysis of the problem is presented and a criterion, called reconstructibility, is defined to precisely characterize the cases when reconstruction is possible, and how accurate it can be. We apply the linear reconstruction algorithm to reconstruct the time evolving 3D structure of several real-world scenes, given a collection of non-coincidental 2D images.


Sunday, November 14, 2010

Lab Meeting November 15, 2010 (fish60): Unfreezing the Robot: Navigation in Dense, Interacting Crowds

Title: Unfreezing the Robot: Navigation in Dense, Interacting Crowds(IROS 2010)
Author: Peter Trautman and Andreas Krause

Abstract: In this paper, we study the safe navigation of a mobile robot through crowds of dynamic agents with uncertain trajectories. Existing algorithms suffer from the “freezing robot” problem: once the environment surpasses a certain level of complexity, the planner decides that all forward paths are unsafe, and the robot freezes in place (or performs unnecessary maneuvers) to avoid collisions. ... In this work, we demonstrate that both the individual prediction and the predictive uncertainty have little to do with the frozen robot problem. Our key insight is that dynamic agents solve the frozen robot problem by engaging in “joint collision avoidance”: They cooperatively make room to create feasible trajectories. We develop IGP, a nonparametric statistical model based on Dependent Output Gaussian Processes that can estimate crowd interaction from data. Our model naturally captures the non-Markov nature of agent trajectories, as well as their goal-driven navigation. We then show how planning in this model can be efficiently implemented using particle based inference.


Monday, November 01, 2010

CMU PhD Thesis Defense: Geolocation with Range: Robustness, Efficiency and Scalability

CMU RI PhD Thesis Defense
Joseph A. Djugash
Geolocation with Range: Robustness, Efficiency and Scalability
November 05, 2010, 10:00 a.m., NSH 1507


This thesis explores the topic of geolocation with range. A robust method for localization and SLAM (Simultaneous Localization and Mapping) is proposed. This method uses a polar parameterization of the state to achieve accurate estimates of the nonlinear and multi-modal distributions in range-only systems. Several experimental evaluations on real robots reveal the reliability of this method.

Scaling such a system to a large network of nodes increases the computational load on the system due to the increased state vector. To alleviate this problem, we propose the use of a distributed estimation algorithm based on the belief propagation framework. This method distributes the estimation task, such that each node only estimates its local network, greatly reducing the computation performed by any individual node. However, the method does not provide any guarantees on the convergence of its solution in general graphs. Convergence is only guaranteed for non-cyclic graphs (i.e., trees). Thus, an extension of this approach which reduces any arbitrary graph to a spanning tree is presented. This enables the proposed decentralized localization method to provide guarantees on its convergence.
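
The idea of reducing a cyclic graph to a spanning tree can be sketched with a breadth-first search. This is only an illustrative, centralized construction; the announcement does not say how the thesis selects its tree or builds it in a distributed way, and the adjacency-dictionary format and function name below are assumptions.

```python
from collections import deque


def bfs_spanning_tree(adjacency, root):
    """Return the (parent, child) edges of a BFS spanning tree.

    adjacency: dict mapping each node to a list of its neighbors.
    Only the connected component containing `root` is covered.
    """
    visited = {root}
    tree_edges = []
    queue = deque([root])
    while queue:
        node = queue.popleft()
        for neighbor in adjacency[node]:
            if neighbor not in visited:
                # First time we reach this neighbor: keep the edge.
                visited.add(neighbor)
                tree_edges.append((node, neighbor))
                queue.append(neighbor)
            # Edges to already-visited nodes would close a cycle, so skip them.
    return tree_edges
```

For a connected graph with n nodes the result has n - 1 edges and no cycles, which is the setting in which the abstract says convergence is guaranteed.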

Scaling in the traditional sense involves extensions to deal with growth in the size of the operating environment. In large, feature-less environments, maintaining a globally consistent estimate of a group of mobile agents is difficult. In this thesis, a novel multi-robot coordination strategy is proposed. Based on the observability analysis of the system, the proposed controller achieves the tight coordination necessary to obtain an accurate global estimate. The proposed approach is demonstrated using both simulation and experimental testing with real robots.


Thesis Committee
Sanjiv Singh, Chair
George Kantor
Howie Choset
Wolfram Burgard, University of Freiburg

Sunday, October 31, 2010

Lab Meeting November 1, 2010 (Will): Visual Event Recognition in Videos by Learning from Web Data (CVPR 2010)

Title: Visual Event Recognition in Videos by Learning from Web Data (CVPR 2010)
Author: Lixin Duan, Dong Xu, Ivor W. Tsang, Jiebo Luo

We propose a visual event recognition framework for consumer domain videos by leveraging a large amount of loosely labeled web videos (e.g., from YouTube). First, we propose a new aligned space-time pyramid matching method to measure the distances between two video clips, where each video clip is divided into space-time volumes over multiple levels. We calculate the pairwise distances between any two volumes and further integrate the information from different volumes with Integer-flow Earth Mover’s Distance (EMD) to explicitly align the volumes. Second, we propose a new cross-domain learning method in order to 1) fuse the information from multiple pyramid levels and features (i.e., space-time feature and static SIFT feature) and 2) cope with the considerable variation in feature distributions between videos from two domains (i.e., web domain and consumer domain). For each pyramid level and each type of local features, we train a set of SVM classifiers based on the combined training set from two domains using multiple base kernels of different kernel types and parameters, which are fused with equal weights to obtain an average classifier. Finally, we propose a cross-domain learning method, referred to as Adaptive Multiple Kernel Learning (A-MKL), to learn an adapted classifier based on multiple base kernels and the prelearned average classifiers by minimizing both the structural risk functional and the mismatch between data distributions from two domains. Extensive experiments demonstrate the effectiveness of our proposed framework that requires only a small number of labeled consumer videos by leveraging web data.

Friday, October 29, 2010

Lab meeting Nov. 01 2010, (Chih-Chung) POMDPs for robotic tasks with mixed observability (RSS 2009)

Title: POMDPs for robotic tasks with mixed observability
Author: Sylvie C.W. Ong, Shao Wei Png, David Hsu and Wee Sun Lee.

Partially observable Markov decision processes
(POMDPs) provide a principled mathematical framework for
motion planning of autonomous robots in uncertain and dynamic
environments. They have been successfully applied to
various robotic tasks, but a major challenge is to scale up
POMDP algorithms for more complex robotic systems. Robotic
systems often have mixed observability: even when a robot’s
state is not fully observable, some components of the state
may still be fully observable. Exploiting this, we use a factored
model to represent separately the fully and partially observable
components of a robot’s state and derive a compact lower-dimensional
representation of its belief space. We then use this
factored representation in conjunction with a point-based algorithm
to compute approximate POMDP solutions. Separating
fully and partially observable state components using a factored
model opens up several opportunities to improve the efficiency
of point-based POMDP algorithms. Experiments show that on
standard test problems, our new algorithm is many times faster
than a leading point-based POMDP algorithm.

Thursday, October 28, 2010

News: University of Chicago, Cornell Researchers Develop Universal Robotic Gripper

Robotic hands are usually just that -- hands -- but some researchers from the University of Chicago and Cornell University (with a little help from iRobot) have taken a decidedly different approach for their so-called universal robotic gripper. As you can see above, the gripper is actually a balloon that can conform to and grip just about any small object, and hang onto it firmly enough to pick it up. What's the secret? After much testing, the researchers found that ground coffee was the best substance to fill the balloon with -- to grab an object, the gripper simply creates a vacuum in the balloon (much like a vacuum-sealed bag of coffee), and it's then able to let go of the object just by releasing the vacuum. Simple, but it works. Head on past the break to check it out in action. [via engadget]

Monday, October 25, 2010

Lab meeting Oct. 25 2010, (David) Threat-aware Path Planning in Uncertain Urban Environments (IROS 2010)

Title: Threat-aware Path Planning in Uncertain Urban Environments

Authors: Georges S. Aoude, Brandon D. Luders, Daniel S. Levine, and Jonathan P. How

This paper considers the path planning problem
for an autonomous vehicle in an urban environment populated
with static obstacles and moving vehicles with uncertain intents.
We propose a novel threat assessment module, consisting of
an intention predictor and a threat assessor, which augments
the host vehicle’s path planner with a real-time threat value
representing the risks posed by the estimated intentions of
other vehicles. This new threat-aware planning approach is
applied to the CL-RRT path planning framework, used by the
MIT team in the 2007 DARPA Grand Challenge. The strengths
of this approach are demonstrated through simulation and
experiments performed in the RAVEN testbed facilities.

[local copy]

[link]

[local video]


Monday, October 11, 2010

Lab meeting Oct. 11 2010, (Shao-Chen) Consistent data association in multi-robot systems with limited communications(RSS 2010)

Title: Consistent data association in multi-robot systems with limited communications

Authors: Rosario Aragues, Eduardo Montijano, and Carlos Sagues


In this paper we address the data association
problem of features observed by a robot team with limited communications.
At every time instant, each robot can only exchange
data with a subset of the robots, its neighbors. Initially, each
robot solves a local data association with each of its neighbors.
After that, the robots execute the proposed algorithm to agree
on a data association between all their local observations which
is globally consistent. One inconsistency appears when chains of
local associations give rise to two features from one robot being
associated among them. The contribution of this work is the
decentralized detection and resolution of these inconsistencies.
We provide a fully decentralized solution to the problem. This
solution does not rely on any particular communication topology.
Every robot plays the same role, making the system robust to
individual failures. Information is exchanged exclusively between
neighbors. In a finite number of iterations, the algorithm finishes
with a data association which is free of inconsistent associations.
In the experiments, we show the performance of the algorithm
under two scenarios. In the first one, we apply the resolution
and detection algorithm for a set of stochastic visual maps. In
the second, we solve the feature matching between a set of images
taken by a robotic team.
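
The kind of inconsistency described above (chains of local associations that end up linking two features of the same robot) can be sketched with a union-find structure. Note that this is a centralized illustration of the consistency condition only; the paper's contribution is a decentralized detection and resolution algorithm, which is not reproduced here. The `(robot, feature_id)` representation and the function names are assumptions for this example.

```python
def find(parent, x):
    """Return the representative of x's set, compressing the path on the way."""
    while parent[x] != x:
        parent[x] = parent[parent[x]]
        x = parent[x]
    return x


def inconsistent_sets(matches):
    """Find association sets that contain two features from the same robot.

    matches: iterable of ((robot, feature_id), (robot, feature_id)) pairs,
    each one a local association between two neighboring robots.
    Returns a list of sorted feature lists, one per inconsistent set.
    """
    parent = {}
    for a, b in matches:
        parent.setdefault(a, a)
        parent.setdefault(b, b)
        root_a, root_b = find(parent, a), find(parent, b)
        if root_a != root_b:
            parent[root_a] = root_b  # merge the two association chains

    groups = {}
    for feature in list(parent):
        groups.setdefault(find(parent, feature), []).append(feature)

    bad = []
    for members in groups.values():
        robots = [robot for robot, _ in members]
        if len(robots) != len(set(robots)):  # some robot appears twice
            bad.append(sorted(members))
    return bad
```

For example, the chain A1-B1, B1-C1, C1-A2 places features 1 and 2 of robot A in the same set, so that set is reported as inconsistent.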


Lab meeting Oct. 11th 2010, (Nicole) Improvement in Listening Capability for Humanoid Robot HRP-2(ICRA 2010)

Title: Improvement in Listening Capability for Humanoid Robot HRP-2 (ICRA2010)

Authors: Toru Takahashi, Kazuhiro Nakadai, Kazunori Komatani, Tetsuya Ogata and Hiroshi G. Okuno.


This paper describes improvement of sound source separation for a simultaneous automatic speech recognition (ASR) system of a humanoid robot. A recognition error in the system is caused by a separation error and interferences of other sources. In separability, an original geometric source separation (GSS) is improved. Our GSS uses a measured robot’s head related transfer function (HRTF) to estimate a separation matrix. As an original GSS uses a simulated HRTF calculated based on a distance between microphone and sound source, there is a large mismatch between the simulated and the measured transfer functions. The mismatch causes a severe degradation of recognition performance.

Faster convergence speed of separation matrix reduces separation error. Our approach gives a nearer initial separation matrix based on a measured transfer function from an optimal separation matrix than a simulated one. As a result, we expect that our GSS improves the convergence speed. Our GSS is also able to handle an adaptive step-size parameter.

These new features are added into open source robot audition software (OSS) called “HARK”, which is newly updated as version 1.0.0. HARK has been installed on an HRP-2 humanoid with an 8-element microphone array. The listening capability of HRP-2 is evaluated by recognizing a target speech signal which is separated from a simultaneous speech signal by three talkers. The word correct rate (WCR) of ASR improves by 5 points under normal acoustic environments and by 10 points under noisy environments. Experimental results show that HARK 1.0.0 improves the robustness against noises.


Lab meeting Oct. 11th 2010, (Andi) Dynamic 3D Scene Analysis for Acquiring Articulated Scene Models

Dynamic 3D Scene Analysis for Acquiring Articulated Scene Models

ICRA 2010

Agnes Swadzba, Niklas Beuter, Sven Wachsmuth, and Franz Kummert

In this paper we present a new system for a mobile robot to generate an articulated scene model by analyzing complex dynamic 3D scenes. The system extracts essential knowledge about the foreground, like moving persons, and the background, which consists of all visible static scene parts. In contrast to other 3D reconstruction approaches, we suggest additionally distinguishing between static parts, like walls, and movable objects like chairs or doors. The discrimination supports the reconstruction process and additionally delivers important information about interaction objects. Here, the movable object detection is realized in an object-independent way by analyzing changes in the scenery. Furthermore, in the proposed system the background scene is fed back to the tracking part, yielding a much better tracking and detection result, which in turn improves the 3D reconstruction. We show in our experiments that we are able to provide a sound background model and to simultaneously extract persons and object regions representing chairs, doors, and even smaller movable objects.

Sunday, October 10, 2010

News: Google's Self-Driving Cars

Google: We've Been Secretly Building And Testing Robot Cars That Drive Themselves
By Sebastian Thrun, Google
Read the full article.

Google Cars Drive Themselves, in Traffic
By John Markoff, The New York Times 
Read the full article.

Friday, October 08, 2010

News: MIT Media Lab Medical Mirror

MIT Medical Media Lab Mirror tells your pulse with a webcam

Mirror mirror on the wall, who has the highest arterial palpation of them all? If you went to MIT you might be able to answer that question thanks to the work of grad student Ming-Zher Poh, who has found a way to tell your pulse with just a simple webcam and some software. By looking at minute changes in the brightness of the face, the system can find the beating of your heart even at a low resolution, comparable to the results of a traditional FDA-approved pulse monitor. Right now the mirror above is just a proof of concept, but the idea is that the hospital beds or surgery rooms of tomorrow might be able to monitor a patient's pulse without requiring any wires or physical contact, encouraging news for anyone who has ever tried to sleep whilst wearing a heart monitor. [via Engadget]

Sunday, October 03, 2010

Lab Meeting October 4th, 2010 (Jeff): Progress Report

I will present my progress on RFID SLAM, with some index values to show the performance.

Lab Meeting October 4th, 2010(KuoHuei): progress report

I will present the Neighboring Object Interacting Tracking, including modeling, learning, inference, and some results.

Monday, September 27, 2010

Lab Meeting September 27, 2010 (Wang Li): Monocular 3D Pose Estimation and Tracking by Detection (CVPR 2010)

Monocular 3D Pose Estimation and Tracking by Detection

Mykhaylo Andriluka
Stefan Roth
Bernt Schiele

Automatic recovery of 3D human pose from monocular
image sequences is a challenging and important research
topic with numerous applications. Although current methods
are able to recover 3D pose for a single person in controlled
environments, they are severely challenged by real-world
scenarios, such as crowded street scenes. To address
this problem, we propose a three-stage process building on
a number of recent advances. The first stage obtains an initial
estimate of the 2D articulation and viewpoint of the person
from single frames. The second stage allows early data
association across frames based on tracking-by-detection. The third and
final stage uses those tracklet-based estimates as robust image
observations to reliably recover 3D pose. We demonstrate
state-of-the-art performance on the HumanEva II
benchmark, and also show the applicability of our approach
to articulated 3D tracking in realistic street conditions.

Paper Link

Sunday, September 19, 2010

Lab Meeting September 20, 2010 (Kuen-Han): Scale Drift-Aware Large Scale Monocular SLAM (RSS 2010)

Title: Scale Drift-Aware Large Scale Monocular SLAM

Author: Hauke Strasdat, J.M.M. Montiel, Andrew J. Davison

Abstract—State of the art visual SLAM systems have recently
been presented which are capable of accurate, large-scale and
real-time performance, but most of these require stereo vision.
Important application areas in robotics and beyond open up
if similar performance can be demonstrated using monocular
vision, since a single camera will always be cheaper, more
compact and easier to calibrate than a multi-camera rig.
With high quality estimation, a single camera moving through
a static scene of course effectively provides its own stereo
geometry via frames distributed over time. However, a classic
issue with monocular visual SLAM is that due to the purely
projective nature of a single camera, motion estimates and map
structure can only be recovered up to scale. Without the known
inter-camera distance of a stereo rig to serve as an anchor, the
scale of locally constructed map portions and the corresponding
motion estimates is therefore liable to drift over time.
In this paper we describe a new near real-time visual SLAM
system which adopts the continuous keyframe optimisation approach
of the best current stereo systems, but accounts for
the additional challenges presented by monocular input. In
particular, we present a new pose-graph optimisation technique
which allows for the efficient correction of rotation, translation
and scale drift at loop closures. Especially, we describe the
Lie group of similarity transformations and its relation to the
corresponding Lie algebra. We also present in detail the system’s
new image processing front-end which is able accurately to track
hundreds of features per frame, and a filter-based approach
for feature initialisation within keyframe-based SLAM. Our
approach is proven via large-scale simulation and real-world
experiments where a camera completes large looped trajectories.


Lab Meeting September 20, 2010 (Alan): Probabilistic Surveillance with Multiple Active Cameras (ICRA 2010)

Title: Probabilistic Surveillance with Multiple Active Cameras (ICRA 2010)
Authors: Eric Sommerlade and Ian Reid

In this work we present a consistent probabilistic approach to control multiple, but diverse pan-tilt-zoom cameras concertedly observing a scene. There are disparate goals to this control: the cameras are not only to react to objects moving about, arbitrating conflicting interests of target resolution and trajectory accuracy, they are also to anticipate the appearance of new targets.
We base our control function on maximisation of expected mutual information gain, which to our knowledge is novel to the field of computer vision in the context of multiple pan-tilt-zoom camera control. This information theoretic measure yields a utility for each goal and parameter setting, making the use of physical or computational resources comparable. Weighting this utility allows to prioritise certain objectives or targets in the control.
The resulting behaviours in typical situations for multicamera systems, such as camera hand-off, acquisition of closeups and scene exploration, are emergent but intuitive. We quantitatively show that without the need for hand crafted rules they address the given objectives.

Monday, September 13, 2010

Lab Meeting September 13th, 2010(fish60): progress report

I will briefly present my recent progress, including a review of the LEARCH algorithm.

Saturday, September 11, 2010

Lab Meeting September 13th, 2010(Gary): AAM based Face Tracking with Temporal Matching and Face Segmentation(CVPR 2010)

AAM based Face Tracking with Temporal Matching and Face Segmentation

Mingcai Zhou, Lin Liang, Jian Sun, Yangsheng Wang


Active Appearance Model (AAM) based face tracking has
advantages of accurate alignment, high efficiency, and
effectiveness for handling face deformation. However, AAM
suffers from the generalization problem and has difficulties
in images with cluttered backgrounds. In this paper, we introduce
two novel constraints into AAM fitting to address
the above problems. We first introduce a temporal matching
constraint in AAM fitting. In the proposed fitting scheme,
the temporal matching enforces an inter-frame local appearance
constraint between frames. The resulting model
takes advantage of temporal matching's good generalizability,
but does not suffer from the mismatched points. To make
AAM more stable for cluttered backgrounds, we introduce a
color-based face segmentation as a soft constraint. Both
constraints effectively improve the AAM tracker's performance,
as demonstrated with experiments on various challenging
real-world videos.


Wednesday, September 08, 2010

PhD Thesis Defense: David Silver [Learning Preference Models for Autonomous Mobile Robots in Complex Domains]

PhD Thesis Defense: David Silver
Learning Preference Models for Autonomous Mobile Robots in Complex Domains
Carnegie Mellon University
September 13, 2010, 12:30 p.m., NSH 1507

Achieving robust and reliable autonomous operation even in complex unstructured environments is a central goal of field robotics. ...
This thesis presents the development and application of machine learning techniques that automate the construction and tuning of preference models within complex mobile robotic systems. Utilizing the framework of inverse optimal control, expert examples of robot behavior can be used to construct models that generalize demonstrated preferences and reproduce similar behavior. Novel learning from demonstration approaches are developed that offer the possibility of significantly reducing the amount of human interaction necessary to tune a system, while also improving its final performance. Techniques to account for the inevitability of noisy and imperfect demonstration are presented, along with additional methods for improving the efficiency of expert demonstration and feedback.

The effectiveness of these approaches is confirmed through application to several real world domains, such as the interpretation of static and dynamic perceptual data in unstructured environments and the learning of human driving styles and maneuver preferences. ... These experiments validate the potential applicability of the developed algorithms to a large variety of future mobile robotic systems.


Monday, September 06, 2010

Lab Meeting September 7th, 2010 (Jimmy): Learning to Recognize Objects from Unseen Modalities

Title: Learning to Recognize Objects from Unseen Modalities
In ECCV2010

Authors: C. Mario Christoudias, Raquel Urtasun, Mathieu Salzmann and Trevor Darrell

In this paper we investigate the problem of exploiting multiple sources of information for object recognition tasks when additional modalities that are not present in the labeled training set are available for inference. This scenario is common to many robotics sensing applications and is in contrast with the assumption made by existing approaches that require at least some labeled examples for each modality. To leverage the previously unseen features, we make use of the unlabeled data to learn a mapping from the existing modalities to the new ones. This allows us to predict the missing data for the labeled examples and exploit all modalities using multiple kernel learning. We demonstrate the effectiveness of our approach on several multi-modal tasks including object recognition from multi-resolution imagery, grayscale and color images, as well as images and text. Our approach outperforms multiple kernel learning on the original modalities, as well as nearest-neighbor and bootstrapping schemes.


Sunday, September 05, 2010

Lab Meeting September 7th, 2010 (Will(柏崴)): Efficient Computation of Robust Low-Rank Matrix Approximations in the Presence of Missing Data using the L1 Norm (CVPR2010)

Title: Efficient Computation of Robust Low-Rank Matrix Approximations in the Presence of Missing Data using the L1 Norm

Authors: Anders Eriksson and Anton van den Hengel

The calculation of a low-rank approximation of a matrix is a fundamental operation in many computer vision applications. The workhorse of this class of problems has long been the Singular Value Decomposition. However, in the presence of missing data and outliers this method is not applicable, and unfortunately, this is often the case in practice.
In this paper we present a method for calculating the low-rank factorization of a matrix which minimizes the L1 norm in the presence of missing data. Our approach represents a generalization of the Wiberg algorithm, one of the more convincing methods for factorization under the L2 norm. By utilizing the differentiability of linear programs, we can extend the underlying ideas behind this approach to include this class of L1 problems as well. We show that the proposed algorithm can be efficiently implemented using existing optimization software. We also provide preliminary experiments on synthetic as well as real world data with very convincing results.
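
To give a feel for the problem being solved, the sketch below minimises the L1 error of a rank-1 factorization over the observed entries only. Note that it swaps in a much simpler technique than the paper's: plain alternating minimisation, where each sub-problem reduces to a weighted median, rather than the L1-Wiberg algorithm. Missing entries are marked with `None`, and all function names are made up for this example.

```python
def weighted_median(values, weights):
    """Return a minimiser m of sum_k weights[k] * |values[k] - m|."""
    pairs = sorted(zip(values, weights))
    half = 0.5 * sum(w for _, w in pairs)
    running = 0.0
    for value, weight in pairs:
        running += weight
        if running >= half:
            return value
    return pairs[-1][0]

def l1_cost(Y, u, v):
    """Sum of absolute residuals |Y[i][j] - u[i] * v[j]| over observed entries."""
    return sum(abs(y - u[i] * v[j])
               for i, row in enumerate(Y)
               for j, y in enumerate(row) if y is not None)

def update_left(Y, u, v):
    """With v fixed, each u[i] is a weighted median of Y[i][j] / v[j]
    with weights |v[j]|, taken over the observed entries of row i."""
    new_u = []
    for i, row in enumerate(Y):
        obs = [(y / v[j], abs(v[j]))
               for j, y in enumerate(row) if y is not None and v[j] != 0.0]
        new_u.append(weighted_median(*zip(*obs)) if obs else u[i])
    return new_u

def transpose(Y):
    return [list(col) for col in zip(*Y)]

def l1_rank1(Y, iters=20):
    """Alternate between the left and right factors; the cost never increases."""
    u, v = [1.0] * len(Y), [1.0] * len(Y[0])
    Yt = transpose(Y)
    for _ in range(iters):
        u = update_left(Y, u, v)
        v = update_left(Yt, v, u)
    return u, v

# Rank-1 data with one corrupted entry (9.0) and one missing entry (None).
Y = [[9.0, 2.0, 1.0, 2.0],
     [2.0, 4.0, 2.0, 4.0],
     [3.0, None, 3.0, 6.0],
     [4.0, 8.0, 4.0, 8.0]]
u, v = l1_rank1(Y)
final_cost = l1_cost(Y, u, v)
```

Alternating schemes like this one can stall in poor local minima, which is part of the motivation for Wiberg-style methods that eliminate one factor and optimise over the other.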

Saturday, August 28, 2010

Lab Meeting August 31st, 2010 (zhi-zhong(執中)): Efficient Planning under Uncertainty for a Target-Tracking Micro-Aerial Vehicle (ICRA'10)

Title: Efficient Planning under Uncertainty for a Target-Tracking Micro-Aerial Vehicle

Authors: Ruijie He, Abraham Bachrach and Nicholas Roy

A helicopter agent has to plan trajectories to track multiple ground targets from the air. The agent has partial information of each target’s pose, and must reason about its uncertainty of the targets’ poses when planning subsequent actions.
We present an online, forward-search algorithm for planning under uncertainty by representing the agent’s belief of each target’s pose as a multi-modal Gaussian belief. We exploit this parametric belief representation to directly compute the distribution of posterior beliefs after actions are taken. This analytic computation not only enables us to plan in problems with continuous observation spaces, but also allows the agent to search deeper by considering policies composed of multistep action sequences; deeper searches better enable the agent to keep the targets well-localized. We present experimental results in simulation, as well as demonstrate the algorithm on an actual quadrotor helicopter tracking multiple vehicles on a road network constructed indoors.
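
The point about computing posterior beliefs analytically can be illustrated with a toy problem. In the sketch below, each target has a scalar Gaussian belief, so the posterior variance after an observation does not depend on the measured value and whole action sequences can be scored without sampling observations. This is a deliberate simplification (single-mode, one-dimensional, made-up noise values), not the paper's multi-modal planner.

```python
from itertools import product

def predict(var, process_noise):
    """Uncertainty grows while a target moves unobserved."""
    return var + process_noise

def update(var, obs_noise):
    """Kalman variance update; independent of the actual measurement."""
    return var * obs_noise / (var + obs_noise)

def rollout(variances, actions, process_noise=1.0, obs_noise=0.5):
    """Propagate all target variances through a sequence of
    'observe target a' actions and return the final variances."""
    current = list(variances)
    for target in actions:
        current = [predict(v, process_noise) for v in current]
        current[target] = update(current[target], obs_noise)
    return current

def plan(variances, depth, process_noise=1.0, obs_noise=0.5):
    """Exhaustive forward search: pick the action sequence that
    minimises the worst final target variance."""
    n = len(variances)
    return min(product(range(n), repeat=depth),
               key=lambda seq: max(rollout(variances, seq,
                                           process_noise, obs_noise)))

beliefs = [10.0, 1.0]          # hypothetical initial variances for two targets
greedy = plan(beliefs, depth=1)
deeper = plan(beliefs, depth=2)
```

With these assumed numbers the one-step plan looks at the most uncertain target first, while the two-step plan orders its observations differently, a small example of why searching over multistep sequences can change the chosen action.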

local copy : [link]

Lab Meeting August 31st, 2010 (David): Scene Understanding in a Large Dynamic Environment through a Laser-based Sensing (ICRA'10)

Scene Understanding in a Large Dynamic Environment through a Laser-based Sensing

Huijing Zhao, Yiming Liu, Xiaolong Zhu, Yipu Zhao, Hongbin Zha

It became a well known technology that a map of complex environment containing low-level geometric primitives (such as laser points) can be generated using a robot with laser scanners. This research is motivated by the need of obtaining semantic knowledge of a large urban outdoor environment after the robot explores and generates a low-level sensing data set. An algorithm is developed with the data represented in a range image, while each pixel can be converted into a 3D coordinate. Using an existing segmentation method that models only geometric homogeneities, the data of a single object of complex geometry, such as people, cars, trees etc., is partitioned into different segments. Such a segmentation result will greatly restrict the capability of object recognition. This research proposes a framework of simultaneous segmentation and classification of range image, where the classification of each segment is conducted based on its geometric properties, and homogeneity of each segment is evaluated conditioned on each object class. Experiments are presented using the data of a large dynamic urban outdoor environment, and performance of the algorithm is evaluated.

local copy : [link]

Monday, August 23, 2010

Lab Meeting August 23rd, 2010 (Nicole): Evaluating Real-time Audio Localization Algorithms for Artificial Audition in Robotics (IROS'09)

Title: Evaluating Real-time Audio Localization Algorithms for Artificial Audition in Robotics

Authors: Anthony Badali, Jean-Marc Valin, Francois Michaud, and Parham Aarabi

Although research on localization of sound sources using microphone arrays has been carried out for years, providing such capabilities on robots is rather new. Artificial audition systems on robots currently exist, but no evaluation of the methods used to localize sound sources has yet been conducted. This paper presents an evaluation of various real-time audio localization algorithms using a medium-sized microphone array which is suitable for applications in robotics. The techniques studied here are implementations and enhancements of steered response power - phase transform beamformers, which represent the most popular methods for time difference of arrival audio localization. In addition, two different grid topologies for implementing source direction search are also compared. Results show that a direction refinement procedure can be used to improve localization accuracy and that more efficient and accurate direction searches can be performed using a uniform triangular element grid rather than the typical rectangular element grid.
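
For readers unfamiliar with the phase transform, here is a minimal sketch of its pairwise building block: estimating the time difference of arrival between two microphones with a PHAT-weighted cross-correlation. It is not the steered-response-power search evaluated in the paper. It uses a naive O(n²) DFT to stay dependency-free, assumes a circular delay, and the far-field bearing formula and default speed of sound are assumptions of this example.

```python
import cmath
import math
import random

def dft(x):
    """Naive discrete Fourier transform (fine for short illustrative signals)."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * cmath.pi * f * t / n) for t in range(n))
            for f in range(n)]

def gcc_phat_delay(x1, x2, eps=1e-12):
    """Estimate the (circular) delay of x2 relative to x1, in samples."""
    n = len(x1)
    X1, X2 = dft(x1), dft(x2)
    cross = [b * a.conjugate() for a, b in zip(X1, X2)]
    # Phase transform: keep only the phase of the cross-spectrum.
    whitened = [c / (abs(c) + eps) for c in cross]
    corr = [sum(whitened[f] * cmath.exp(2j * cmath.pi * f * k / n)
                for f in range(n)).real for k in range(n)]
    peak = max(range(n), key=lambda k: corr[k])
    return peak if peak <= n // 2 else peak - n   # map to a signed lag

def bearing_from_delay(delay_samples, sample_rate_hz, mic_distance_m,
                       speed_of_sound=343.0):
    """Far-field bearing (radians) for a two-microphone pair."""
    ratio = speed_of_sound * delay_samples / (sample_rate_hz * mic_distance_m)
    return math.asin(max(-1.0, min(1.0, ratio)))

random.seed(0)
mic1 = [random.gauss(0.0, 1.0) for _ in range(64)]
mic2 = mic1[-5:] + mic1[:-5]          # mic2 hears the same signal 5 samples later
delay = gcc_phat_delay(mic1, mic2)
```

Whitening the cross-spectrum sharpens the correlation peak, which is why PHAT-style weighting is popular for time-difference-of-arrival estimation in reverberant rooms.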

local copy : [link]

Lab Meeting August 23rd, 2010 (ShaoChen): Distributed Nonlinear Estimation for Robot Localization using Weighted Consensus (ICRA'10)

Title: Distributed Nonlinear Estimation for Robot Localization using Weighted Consensus

Authors: Andrea Simonetto, Tamás Keviczky and Robert Babuška


Distributed linear estimation theory has received increased
attention in recent years due to several promising
industrial applications. Distributed nonlinear estimation, however,
is still a relatively unexplored field despite the need in
numerous practical situations for techniques that can handle
nonlinearities. This paper presents a unified way of describing
distributed implementations of three commonly used nonlinear
estimators: the Extended Kalman Filter, the Unscented Kalman
Filter and the Particle Filter. Leveraging on the presented
framework, we propose new distributed versions of these
methods, in which the nonlinearities are locally managed by
the various sensors whereas the different estimates are merged
based on a weighted average consensus process. The proposed
versions are shown to outperform the few published ones in
two robot localization test cases.
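
The merging step mentioned above can be sketched in a few lines. In this illustrative version (not the authors' algorithm), every sensor node runs two ordinary average-consensus iterations in parallel, one on `weight * estimate` and one on `weight`, and their ratio converges to the weighted mean using only neighbour-to-neighbour exchanges. The step size, graph, and weights are assumptions for the example.

```python
def weighted_consensus(estimates, weights, neighbors, steps=200, step_size=0.2):
    """Distributed weighted averaging over an undirected graph.

    neighbors[i] lists the nodes that node i can talk to. For the iteration
    to converge, step_size should be below 1 / (maximum node degree).
    """
    num = [w * x for x, w in zip(estimates, weights)]
    den = list(weights)
    n = len(num)
    for _ in range(steps):
        # Each node nudges its value toward those of its neighbours.
        num = [num[i] + step_size * sum(num[j] - num[i] for j in neighbors[i])
               for i in range(n)]
        den = [den[i] + step_size * sum(den[j] - den[i] for j in neighbors[i])
               for i in range(n)]
    return [a / b for a, b in zip(num, den)]

# Four sensors on a line; weights play the role of inverse variances.
local_estimates = [1.0, 2.0, 4.0, 3.0]
confidence = [1.0, 2.0, 1.0, 4.0]
line_graph = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
merged = weighted_consensus(local_estimates, confidence, line_graph)
```

After enough iterations every node holds approximately the same value, the confidence-weighted mean of all local estimates, even though no node ever saw the full set of measurements.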


Tuesday, August 10, 2010

Lab Meeting August 10th, 2010 (KuoHuel): An Online Approach: Learning-Semantic-Scene-by-Tracking and Tracking-by-Learning-Semantic-Scene (CVPR'10)

Title: An Online Approach: Learning-Semantic-Scene-by-Tracking and Tracking-by-Learning-Semantic-Scene

Authors: Xuan Song, Xiaowei Shao, Huijing Zhao, Jinshi Cui, Ryosuke Shibasaki and Hongbin Zha

Learning the knowledge of scene structure and tracking
a large number of targets are both active topics of computer
vision in recent years, which plays a crucial role in surveillance,
activity analysis, object classification and etc. In
this paper, we propose a novel system which simultaneously
performs the Learning-Semantic-Scene and Tracking, and
makes them supplement each other in one framework. The
trajectories obtained by the tracking are utilized to continually
learn and update the scene knowledge via an online unsupervised
learning. On the other hand, the learned knowledge
of scene in turn is utilized to supervise and improve
the tracking results. Therefore, this “adaptive learning-tracking
loop” can not only perform the robust tracking in
high density crowd scene, dynamically update the knowledge
of scene structure and output semantic words, but also
ensures that the entire process is completely automatic and
online. We successfully applied the proposed system into the
JR subway station of Tokyo, which can dynamically obtain
the semantic scene structure and robustly track more than
150 targets at the same time.


Monday, August 09, 2010

Lab Meeting August 10th, 2010 (Jeff): FAB-MAP + RatSLAM: Appearance-based SLAM for Multiple Times of Day

Title: FAB-MAP + RatSLAM: Appearance-based SLAM for Multiple Times of Day

Authors: Arren J. Glover, William P. Maddern, Michael J. Milford, and Gordon F. Wyeth


Appearance-based mapping and localisation is especially challenging when separate processes of mapping and localisation occur at different times of day. The problem is exacerbated in the outdoors where continuous change in sun angle can drastically affect the appearance of a scene. We confront this challenge by fusing the probabilistic local feature based data association method of FAB-MAP with the pose cell filtering and experience mapping of RatSLAM. We evaluate the effectiveness of our amalgamation of methods using five datasets captured throughout the day from a single camera driven through a network of suburban streets. We show further results when the streets are re-visited three weeks later, and draw conclusions on the value of the system for lifelong mapping.

IEEE International Conference on Robotics and Automation(ICRA), May 2010

Wednesday, August 04, 2010

CVPR 2010 Awards

This post is to provide links to the best paper awards in CVPR 2010.

Best Student Paper

Best Paper Honorable Mention

Best Paper

Longuet-Higgins Prize

  • Efficient Matching of Pictorial Structures: Pedro F. Felzenszwalb and Daniel P. Huttenlocher
  • Real-Time Tracking of Non-Rigid Objects Using Mean Shift: Dorin Comaniciu, Visvanathan Ramesh, and Peter Meer

Monday, August 02, 2010

Lab Meeting August 3rd, 2010 (Wang Li): Modeling Mutual Context of Object and Human Pose in Human-Object Interaction Activities (CVPR 2010)

Modeling Mutual Context of Object and Human Pose in Human-Object Interaction Activities

Bangpeng Yao
Li Fei-Fei

Detecting objects in cluttered scenes and estimating articulated human body parts are two challenging problems in computer vision. We observe, however, that objects and human poses can serve as mutual context to each other – recognizing one facilitates the recognition of the other.
In this paper, we propose a new random field model to encode the mutual context of objects and human poses in human-object interaction activities. We then cast the model learning task as a structure learning problem, of which the structural connectivity between the object, the overall human pose and different body parts are estimated through a structure search approach, and the parameters of the model are estimated by a new max-margin algorithm.
On a sports data set of six classes of human-object interactions, we show that our mutual context model significantly outperforms state-of-the-art in detecting very difficult objects and human poses.

Paper Link

Thursday, July 29, 2010

Lab Meeting 8 / 3, 2010 (Alan) - Mapping Indoor Environments Based on Human Activity (ICRA 2010)

Title: Mapping Indoor Environments Based on Human Activity (ICRA 2010)
Authors: Slawomir Grzonka, Frederic Dijoux, Andreas Karwath, Wolfram Burgard

We present a novel approach to build approximate maps of structured environments utilizing human motion and activity. Our approach uses data recorded with a data suit which is equipped with several IMUs to detect movements of a person and door opening and closing events. In our approach we interpret the movements as motion constraints and door handling events as landmark detections in a graph-based SLAM framework. As we cannot distinguish between individual doors, we employ a multi-hypothesis approach on top of the SLAM system to deal with the high data-association uncertainty. As a result, our approach is able to accurately and robustly recover the trajectory of the person. We additionally take advantage of the fact that people traverse free space and that doors separate rooms to recover the geometric structure of the environment after the graph optimization. We evaluate our approach in several experiments carried out with different users and in environments of different types.

Link: pdf

Monday, July 26, 2010

Lab Meeting 07/27, 2010(Kuen-Han) Non-Rigid Structure from Locally-Rigid Motion (CVPR,2010)

Title: Non-Rigid Structure from Locally-Rigid Motion
Authors: Jonathan Taylor, Allan D. Jepson, Kiriakos N. Kutulakos

We introduce locally-rigid motion, a general framework for
solving the M-point, N-view structure-from-motion problem
for unknown bodies deforming under orthography. The
key idea is to first solve many local 3-point, N-view rigid
problems independently, providing a “soup” of specific,
plausibly rigid, 3D triangles. The main advantage here is
that the extraction of 3D triangles requires only very weak
assumptions: (1) deformations can be locally approximated
by near-rigid motion of three points (i.e., stretching not
dominant) and (2) local motions involve some generic rotation
in depth. Triangles from this soup are then grouped
into bodies, and their depth flips and instantaneous relative
depths are determined. Results on several sequences,
both our own and from related work, suggest these conditions
apply in diverse settings—including very challenging
ones (e.g., multiple deforming bodies). Our starting point
is a novel linear solution to 3-point structure from motion,
a problem for which no general algorithms currently exist.


Saturday, July 24, 2010

Lab Meeting July 20, 2010 (fish60): What if the Irresponsible Teachers Are Dominating? A Method of Training on Samples and Clustering on Teachers

Sorry for the previous blank post.
Here's the content:

What if the Irresponsible Teachers Are Dominating? A Method of Training on Samples and Clustering on Teachers

Shuo Chen, Jianwen Zhang, Guangyun Chen, Changshui Zhang
State Key Laboratory on Intelligent Technology and Systems
Tsinghua National Laboratory for Information Science and Technology (TNList)
Department of Automation, Tsinghua University, Beijing 100084, China

Learning from multiple teachers or sources
has received more attention of the researchers in the machine
learning area. In this setting, the learning system is dealing
with samples and labels provided by multiple teachers, who
in common cases, are non-expert. Their labeling styles and
behaviors are usually diverse, some of which are even detrimental
to the learning system. Thus, simply putting them
together and utilizing the algorithms designed for single-teacher
scenario would be not only improper, but also damaging.
Our work focuses on a case where the teachers are composed of good
ones and irresponsible ones. By irresponsible, we mean the
teacher who takes the labeling task not seriously and label
the sample at random without inspecting the sample itself.
If we do not take out their effects, our learning system would be ruined with no
doubt. In this paper, we propose a method for picking out the
good teachers with promising experimental results. It works
even when the irresponsible teachers are dominating in numbers.


Wednesday, July 21, 2010

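Every Generation Brings Forth New Talent: Pursuing a PhD Without Giving Up Lightly

Author: 王榮騰, Visiting Professor, National Taiwan University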


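Student A was awarded a full Research Assistantship (RA) by an advisor at one of the very top universities in the United States. A year and a half later, Student A abandoned the program and is now looking for a job!
Student B was awarded a one-year full Graduate Fellowship by another top US university and secured an RA in time a year later, but has long felt that the research is disconnected from reality and is deeply troubled by worries about future job prospects!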




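Advisors often run several research projects at the same time. You can speak with your advisor in person, explain your reasons, and ask whether the originally assigned research topic can be changed. Unless the request is unreasonable, most professors will accept it. Keep in mind that a PhD dissertation is usually assembled from several research projects. It is therefore best to raise the request after a paper has been accepted by a journal or conference: first, this brings the current project to a proper conclusion (so the professor's research funding is not wasted); second, it also advances your own dissertation. In addition, you can use this period to become familiar with the new research topic.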




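In other words, whether or not your personal interests are close to your advisor's research area, stay with your advisor and do not give up lightly. Also, do not raise the request before you have passed the PhD Qualifying Examination, so as to avoid the serious consequence of having to leave the program.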

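After lengthy discussions, Student A finally accepted the advisor's suggestion to take a leave of absence and work for a while before deciding whether to complete the PhD. Student B joined another professor's research group at the same time and does not rule out an academic career after graduation; Student B is no longer troubled by the research topic and has already published a paper at a top conference in the newly chosen research field. Because the situations were handled well, both of these outstanding students still maintain good relationships with their original advisors. After all, a good mentor is hard to find, so one should recognize and cherish that good fortune; the bond between teacher and student is hard to build and is worth treasuring for a lifetime! (王榮騰, Visiting Professor, Department of Electrical Engineering and Graduate Institute of Electronics Engineering, National Taiwan University; June 6, 2010)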

Sunday, July 18, 2010

Lab Meeting July 20, 2010 (Gary): Robust Unified Stereo-Based 3D Head Tracking and Its Application to Face Recognition (ICRA2010)

Robust Unified Stereo-Based 3D Head Tracking and Its Application
to Face Recognition

Authors: Kwang Ho An and Myung Jin Chung

This paper investigates the estimation of 3D head poses and its identity authentication with a partial ellipsoid model. To cope with large out-of-plane rotations and translation in-depth, we extend conventional head tracking with a single camera to a stereo-based framework. To achieve more robust motion estimation even under time-varying lighting conditions, we incorporate illumination correction into the aforementioned framework. We approximate the face image variations due to illumination changes as a linear combination of illumination bases. Also, by computing the illumination bases online from the registered face images, after estimating the 3D head poses, user-specific illumination bases can be obtained, and therefore illumination-robust tracking without a prior learning process can be possible. Furthermore, our unified stereo-based tracking is approximated as a linear least-squares problem; a closed-form solution is then provided. After recovering the full-motions of the head, we can register face images with pose variations into stabilized-view images, which are suitable for pose-robust face recognition. To verify the feasibility and applicability of our approach, we performed extensive experiments with three sets of challenging image sequences.


Thursday, July 15, 2010

Lab Meeting July 20, 2010 (Jimmy): Group-Sensitive Multiple Kernel Learning for Object Categorization

Title: Group-Sensitive Multiple Kernel Learning for Object Categorization
Authors: Jingjing Yang, Yuanning Li, Yonghong Tian, Lingyu Duan, Wen Gao
In: ICCV 2009

In this paper, we propose a group-sensitive multiple kernel learning (GS-MKL) method to accommodate the intra-class diversity and the inter-class correlation for object categorization. By introducing an intermediate representation “group” between images and object categories, GS-MKL attempts to find appropriate kernel combination for each group to get a finer depiction of object categories. For each category, images within a group share a set of kernel weights while images from different groups may employ distinct sets of kernel weights. In GS-MKL, such group-sensitive kernel combinations together with the multi-kernels based classifier are optimized in a joint manner to seek a trade-off between capturing the diversity and keeping the invariance for each category. Extensive experiments show that our proposed GS-MKL method has achieved encouraging performance over three challenging datasets.


Monday, July 12, 2010

Lab Meeting July 13, 2010(ShaoChen):Rao-Blackwellized Particle Filters Multi Robot SLAM with Unknown Initial Correspondences and Limited Communication(ICRA 2010)

Title: Rao-Blackwellized Particle Filters Multi Robot SLAM with Unknown Initial Correspondences and Limited Communication

Authors: Luca Carlone, Miguel Kaouk Ng, Jingjing Du, Basilio Bona, and Marina Indri


Multi robot systems are envisioned to play an important role in many robotic applications. A main prerequisite for a team deployed in a wide unknown area is the capability of navigating autonomously, exploiting the information acquired through the on-line estimation of both robot poses and surrounding environment model, according to the Simultaneous Localization And Mapping (SLAM) framework. As team coordination is improved, distributed techniques for filtering are required in order to enhance autonomous exploration and large scale SLAM, increasing both efficiency and robustness of operation. Although Rao-Blackwellized Particle Filters (RBPF) have been demonstrated to be an effective solution to the problem of single robot SLAM, few extensions to teams of robots exist, and these approaches are characterized by strict assumptions on both communication bandwidth and prior knowledge on relative poses of the teammates. In the present paper we address the problem of multi robot SLAM in the case of limited communication and unknown relative initial poses. Starting from the well established single robot RBPF-SLAM, we propose a simple technique which jointly estimates the SLAM posterior of the robots by fusing the proprioceptive and the exteroceptive information acquired by each teammate. The approach intrinsically reduces the amount of data to be exchanged among the robots, while taking into account the uncertainty in relative pose measurements. Moreover it can be naturally extended to different communication technologies (bluetooth, RFId, wifi, etc.) regardless of their sensing range. The proposed approach is validated through experimental tests.


Lab Meeting July 13,2010(Nicole):Mutual Localization in a Team of Autonomous Robots using Acoustic Robot Detection

Title: Mutual Localization in a Team of Autonomous Robots using Acoustic Robot Detection

Authors: David Becker and Max Risler

In: RoboCup 2008: Robot Soccer World Cup XII, Volume 5399/2009

In order to improve self-localization accuracy we are exploring ways of mutual localization in a team of autonomous robots. Detecting team mates visually usually leads to inaccurate bearings and only rough distance estimates. Also, visually identifying teammates is not possible. Therefore we are investigating methods of gaining relative position information acoustically in a team of robots.
The technique introduced in this paper is a variant of code-multiplexed communication (CDMA, code division multiple access). In a CDMA system, several receivers and senders can communicate at the same time, using the same carrier frequency. Well-known examples of CDMA systems include wireless computer networks and the Global Positioning System, GPS. While these systems use electro-magnetic waves, we will try to adopt the CDMA principle towards using acoustic pattern recognition, enabling robots to calculate distances and bearings to each other.
First, we explain the general idea of cross-correlation functions and appropriate signal pattern generation. We will further explain the importance of synchronized clocks and discuss the problems arising from clock drifts.
Finally, we describe an implementation using the Aibo ERS-7 as platform and briefly state basic results, including measurement accuracy and a runtime estimate. We will briefly discuss acoustic localization in the specific scenario of a RoboCup soccer game.
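
The cross-correlation idea can be sketched as follows: a robot that knows a teammate's code pattern slides it along the recorded audio, takes the best-matching lag as the arrival time, and converts that lag to a distance. This is only an illustration under stated assumptions: clocks are taken to be perfectly synchronised (so the lag is measured from the known emission time), and the ±1 code, the 8 kHz sample rate, and the noise level are made up for the example rather than taken from the paper.

```python
import random

def cross_correlate(signal, pattern):
    """Sliding dot product of the pattern against the signal for every valid lag."""
    n, m = len(signal), len(pattern)
    return [sum(signal[lag + k] * pattern[k] for k in range(m))
            for lag in range(n - m + 1)]

def detect_lag(signal, pattern):
    """Lag (in samples) at which the pattern matches the signal best."""
    corr = cross_correlate(signal, pattern)
    return max(range(len(corr)), key=lambda lag: corr[lag])

def lag_to_distance(lag, sample_rate_hz, speed_of_sound=343.0):
    """Time of flight times the speed of sound, assuming synchronised clocks."""
    return lag / sample_rate_hz * speed_of_sound

random.seed(1)
code = [random.choice((-1.0, 1.0)) for _ in range(63)]        # hypothetical sender code
recording = [random.gauss(0.0, 0.3) for _ in range(300)]      # background noise
true_lag = 120
for k, chip in enumerate(code):
    recording[true_lag + k] += chip                           # embed the delayed code

lag = detect_lag(recording, code)
distance_m = lag_to_distance(lag, sample_rate_hz=8000)
```

Any clock offset between sender and receiver adds directly to the measured lag, so at this assumed sample rate an error of a single sample already corresponds to roughly four centimetres, which is why clock drift matters so much here.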


Tuesday, July 06, 2010

Lab Meeting July 6th (Casey): Live Dense Reconstruction with a Single Moving Camera (CVPR 2010)

Authors: Richard A. Newcombe and Andrew J. Davison


We present a method which enables rapid and dense reconstruction of scenes browsed by a single live camera. We take point-based real-time structure from motion (SFM) as our starting point, generating accurate 3D camera pose estimates and a sparse point cloud. Our main novel contribution is to use an approximate but smooth base mesh generated from the SFM to predict the view at a bundle of poses around automatically selected reference frames spanning the scene, and then warp the base mesh into highly accurate depth maps based on view-predictive optical flow and a constrained scene flow update. The quality of the resulting depth maps means that a convincing global scene model can be obtained simply by placing them side by side and removing overlapping regions. We show that a cluttered indoor environment can be reconstructed from a live hand-held camera in a few seconds, with all processing performed by current desktop hardware. Real-time monocular dense reconstruction opens up many application areas, and we demonstrate both real-time novel view synthesis and advanced augmented reality where augmentations interact physically with the 3D scene and are correctly clipped by occlusions.

Monday, July 05, 2010

Lab Meeting July 6th 2010 (Andi): Upsampling Range Data in Dynamic Environments (CVPR 2010)


Jennifer Dolson, Jongmin Baek, Christian Plagemann and Sebastian Thrun (Stanford University)


We present a flexible method for fusing information from optical and range sensors based on an accelerated high-dimensional filtering approach. Our system takes as input a sequence of monocular camera images as well as a stream of sparse range measurements as obtained from a laser or other sensor system. In contrast with existing approaches, we do not assume that the depth and color data streams have the same data rates or that the observed scene is fully static. Our method produces a dense, high-resolution depth map of the scene, automatically generating confidence values for every interpolated depth point. We describe how to integrate priors on object shape, motion and appearance and how to achieve an efficient implementation using parallel processing hardware such as GPUs.
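
A much-reduced version of the idea is a joint bilateral filter: each pixel's depth is a weighted average of sparse range samples, with weights that fall off with both image distance and colour difference. The one-dimensional sketch below is an assumption-laden illustration, not the paper's accelerated high-dimensional filter; in particular, it ignores time, and using the sum of weights as a confidence value is a choice made for this example.

```python
import math

def upsample_depth(colors, samples, sigma_space=2.0, sigma_color=0.1):
    """Interpolate sparse depth samples over a 1-D row of pixels.

    colors  : per-pixel intensity in [0, 1]
    samples : {pixel_index: depth} for the pixels that have a range reading
    Returns (depths, confidences), one value per pixel.
    """
    depths, confidences = [], []
    for i, color in enumerate(colors):
        weight_sum = depth_sum = 0.0
        for j, depth in samples.items():
            spatial = math.exp(-((i - j) ** 2) / (2.0 * sigma_space ** 2))
            photometric = math.exp(-((color - colors[j]) ** 2)
                                   / (2.0 * sigma_color ** 2))
            weight = spatial * photometric
            weight_sum += weight
            depth_sum += weight * depth
        depths.append(depth_sum / weight_sum if weight_sum > 0.0 else None)
        confidences.append(weight_sum)   # low when no similar sample is nearby
    return depths, confidences

# A dark object (intensity 0) next to a bright one (intensity 1), two range readings.
row = [0.0] * 5 + [1.0] * 5
readings = {1: 2.0, 8: 5.0}
depths, confidences = upsample_depth(row, readings)
```

Because the colour term suppresses samples from the other side of the intensity edge, the interpolated depth stays sharp at the object boundary instead of blending the two surfaces.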


Monday, June 28, 2010

Lab Meeting June 29th, 2010 (KuoHuel): People Tracking with Human Motion Predictions from Social Forces (ICRA'10)

Title: People Tracking with Human Motion Predictions from Social Forces

Authors: Matthias Luber, Johannes A. Stork, Gian Diego Tipaldi, and Kai O. Arras

For many tasks in populated environments, robots need to keep track of present and future motion states of people. Most approaches to people tracking make weak assumptions on human motion such as constant velocity and direction. But even over a short period, human motion behavior is more complex and influenced by factors such as an intended goal, other people, objects in the environment, or social rules. Therefore, more sophisticated motion models are highly desirable especially since people frequently undergo lengthy occlusion events.
For the study of crowd behavior or evacuation dynamics, computational models that describe individual and collective pedestrian dynamics have been developed in e.g. the social psychology community. In this paper, we make use of such a model for the purpose of people tracking. Concretely, we integrate a pedestrian dynamics model based on social forces into a multi-hypothesis target tracker. We show how the refined motion predictions translate into more informed probability distributions over hypotheses and finally into a more robust tracking behavior and better occlusion handling. In experiments in indoor and outdoor environments with data from a laser range finder, the social force model leads to more accurate tracking with up to two times fewer data association errors.
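
As a rough picture of what a social-force motion prediction looks like, the sketch below advances one pedestrian by a single time step under a goal-directed driving force plus exponential repulsion from nearby people. It is a generic, simplified social force model rather than the exact formulation used in the paper, and every parameter value (desired speed, relaxation time, repulsion strength and range) is an illustrative assumption.

```python
import math

def social_force_step(pos, vel, goal, others, dt=0.1, desired_speed=1.3,
                      relax_time=0.5, strength=2.0, range_m=0.3):
    """Predict a pedestrian's next position and velocity (2-D, unit mass)."""
    # Driving force: relax toward the desired velocity pointing at the goal.
    gx, gy = goal[0] - pos[0], goal[1] - pos[1]
    goal_dist = math.hypot(gx, gy) or 1.0
    fx = (desired_speed * gx / goal_dist - vel[0]) / relax_time
    fy = (desired_speed * gy / goal_dist - vel[1]) / relax_time

    # Repulsive forces: pushed away from each other person, decaying with distance.
    for ox, oy in others:
        dx, dy = pos[0] - ox, pos[1] - oy
        dist = math.hypot(dx, dy)
        if dist > 0.0:
            magnitude = strength * math.exp(-dist / range_m)
            fx += magnitude * dx / dist
            fy += magnitude * dy / dist

    new_vel = (vel[0] + fx * dt, vel[1] + fy * dt)
    new_pos = (pos[0] + new_vel[0] * dt, pos[1] + new_vel[1] * dt)
    return new_pos, new_vel

# One pedestrian heading right, alone and then with someone just ahead and to the side.
alone_pos, alone_vel = social_force_step((0.0, 0.0), (0.0, 0.0), (10.0, 0.0), [])
crowd_pos, crowd_vel = social_force_step((0.0, 0.0), (0.0, 0.0), (10.0, 0.0),
                                         [(0.5, -0.2)])
```

In a tracker, a prediction like this would replace the constant-velocity step of the motion model, so that a hypothesis during an occlusion bends around other people instead of passing straight through them.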

Lab Meeting June 29th, 2010 (Jeff): Fully Autonomous Trajectory Estimation with Long-Range Passive RFID

Title: Fully Autonomous Trajectory Estimation with Long-Range Passive RFID

Authors: Philipp Vorst and Andreas Zell


We present a novel approach which enables a mobile robot to estimate its trajectory in an unknown environment with long-range passive radio-frequency identification (RFID). The estimation is based only on odometry and RFID measurements. The technique requires no prior observation model and makes no assumptions on the RFID setup. In particular, it is adaptive to the power level, the way the RFID antennas are mounted on the robot, and environmental characteristics, which have major impact on long-range RFID measurements. Tag positions need not be known in advance, and only the arbitrary, given infrastructure of RFID tags in the environment is utilized. By a series of experiments with a mobile robot, we show that trajectory estimation is achieved accurately and robustly.

IEEE International Conference on Robotics and Automation(ICRA), May 2010