Title:
Deformable Graph Matching
Author:
Feng Zhou and Fernando De la Torre
Robotics Institute, Carnegie Mellon University, Pittsburgh, PA 15213
Abstract:
Graph matching (GM) is a fundamental problem in computer science, and it has been successfully applied to many problems in computer vision. Although widely used, existing GM algorithms cannot incorporate global consistence among nodes, which is a natural constraint in computer vision problems. This paper proposes deformable graph matching (DGM), an extension of GM for matching graphs subject to global rigid and non-rigid geometric constraints. The key idea of this work is a new factorization of the pair-wise affinity matrix. This factorization decouples the affinity matrix into the local structure of each graph and the pair-wise affinity edges. Besides the ability to incorporate global geometric transformations, this factorization offers three more benefits. First, there is no need to compute the costly (in space and time) pair-wise affinity matrix. Second, it provides a unified view of many GM methods and extends the standard iterative closest point algorithm. Third, it allows to use the path-following optimization algorithm that leads to improved optimization strategies and matching performance. Experimental results on synthetic and real databases illustrate how DGM outperforms state-of-the-art algorithms for GM. The code is available at http://humansensing.cs.cmu.edu/fgm.
From:
IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2013
Link
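A note for the reading group: the abstract says the pair-wise affinity matrix never has to be built explicitly. The sketch below is not the authors' factorization. It only illustrates that a matching score can be evaluated directly from a node-affinity table and an edge-affinity table. All names (`matching_score`, `node_aff`, `edge_aff`) and the toy values are assumptions made for illustration.

```python
def matching_score(match, edges1, edges2, node_aff, edge_aff):
    """Score a node matching without assembling a full pair-wise affinity matrix.

    match    : dict mapping a node of graph 1 to a node of graph 2
    edges1/2 : lists of directed edges (i, j) for each graph
    node_aff : node_aff[i][a] = affinity between node i (graph 1) and node a (graph 2)
    edge_aff : edge_aff[c1][c2] = affinity between edge c1 (graph 1) and edge c2 (graph 2)
    """
    # Unary part: sum the affinities of the matched node pairs.
    score = sum(node_aff[i][a] for i, a in match.items())

    # Pair-wise part: an edge of graph 1 contributes only if its image is an edge of graph 2.
    edge_index2 = {edge: c2 for c2, edge in enumerate(edges2)}
    for c1, (i, j) in enumerate(edges1):
        if i not in match or j not in match:
            continue
        c2 = edge_index2.get((match[i], match[j]))
        if c2 is not None:
            score += edge_aff[c1][c2]
    return score


# Toy example (hypothetical values): two identical 3-node chains.
edges1 = [(0, 1), (1, 2)]
edges2 = [(0, 1), (1, 2)]
node_aff = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]
edge_aff = [[2, 0], [0, 2]]

identity_score = matching_score({0: 0, 1: 1, 2: 2}, edges1, edges2, node_aff, edge_aff)
swapped_score = matching_score({0: 1, 1: 0, 2: 2}, edges1, edges2, node_aff, edge_aff)
```

The identity matching keeps both edges aligned and scores higher than the matching that swaps the first two nodes. Storage grows with the number of nodes and edges rather than with the square of the number of candidate correspondences.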
This Blog is maintained by the Robot Perception and Learning lab at CSIE, NTU, Taiwan. Our scientific interests are driven by the desire to build intelligent robots and computers, which are capable of servicing people more efficiently than equivalent manned systems in a wide variety of dynamic and unstructured environments.
Sunday, December 29, 2013
Thursday, December 26, 2013
Lab Meeting, December 26, 2013 (Tom Hsu): An Efficient Motion Segmentation Algorithm for Multibody RGB-D SLAM, (Proceedings of Australasian Conference on Robotics and Automation, 2-4 Dec 2013)
Title:
An Efficient Motion Segmentation Algorithm for Multibody RGB-D SLAM
Author:
Youbing Wang, Shoudong Huang
Faculty of Engineering and IT, University of Technology, Sydney, Australia
Abstract:
A simple motion segmentation algorithm using only two frames of RGB-D data is proposed, and both simulational and experimental segmentation results show its efficiency and reliability. To further verify its usability in multi-body SLAM scenarios, we firstly apply it to a simulated typical multi-body SLAM problem with only a RGB-D camera, and then utilize it to segment a real RGB-D dataset collected by ourselves. Based on the good results of our motion segmentation algorithm, we can get satisfactory SLAM results for the simulated problem and the segmentation results using real data also enable us to get visual odometry for each motion group thus facilitate the following steps to solve the practical multi-body RGB-D SLAM problems.
From:
Proceedings of Australasian Conference on Robotics and Automation, 2-4 Dec 2013, University of New South Wales, Sydney Australia
Link:
paper
Monday, December 16, 2013
Lab Meeting, December 19, 2013 (Yen-Ting): Deformable Spatial Pyramid Matching for Fast Dense Correspondences
Title: Deformable Spatial Pyramid Matching for Fast Dense Correspondences
Authors: Jaechul Kim, Ce Liu, Fei Sha and Kristen Grauman
Abstract: We introduce a fast deformable spatial pyramid (DSP) matching algorithm for computing dense pixel correspondences. Dense matching methods typically enforce both appearance agreement between matched pixels as well as geometric smoothness between neighboring pixels. Whereas the prevailing approaches operate at the pixel level, we propose a pyramid graph model that simultaneously regularizes match consistency at multiple spatial extents—ranging from an entire image, to coarse grid cells, to every single pixel. This novel regularization substantially improves pixel-level matching in the face of challenging image variations, while the “deformable” aspect of our model overcomes the strict rigidity of traditional spatial pyramids. Results on LabelMe and Caltech show our approach outperforms state-of-the-art methods (SIFT Flow [15] and PatchMatch [2]), both in terms of accuracy and run time.
P.S.
[2] C. Barnes, E. Shechtman, D. Goldman, and A. Finkelstein. The Generalized PatchMatch Correspondence Algorithm. In ECCV, 2010.
[15] C. Liu, J. Yuen, and A. Torralba. SIFT Flow: Dense Correspondence across Different Scenes and Its Applications. PAMI, 33(5), 2011.
From: IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2013
Link: Click here
Wednesday, November 27, 2013
Lab Meeting Nov. 28, (Yi) Coherent Motion Segmentation in Moving Camera Videos using Optical Flow Orientations
Authors: Manjunath Narayana, Allen Hanson, Erik Learned-Miller
Abstract
In moving camera videos, motion segmentation is commonly performed using the image plane motion of pixels, or optical flow. However, objects that are at different depths from the camera can exhibit different optical flows even if they share the same real-world motion. This can cause a depth-dependent segmentation of the scene. Our goal is to develop a segmentation algorithm that clusters pixels that have similar real-world motion irrespective of their depth in the scene. Our solution uses optical flow orientations instead of the complete vectors and exploits the well-known property that under camera translation, optical flow orientations are independent of object depth. We introduce a probabilistic model that automatically estimates the number of observed independent motions and results in a labeling that is consistent with real-world motion in the scene. The result of our system is that static objects are correctly identified as one segment, even if they are at different depths. Color features and information from previous frames in the video sequence are used to correct occasional errors due to the orientation-based segmentation. We present results on more than thirty videos from different benchmarks. The system is particularly robust on complex background scenes containing objects at significantly different depths.
link
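The depth-independence property mentioned in the abstract can be checked numerically. The sketch below assumes a pinhole camera with focal length `focal`, a static scene point, and pure camera translation, and uses the standard motion-field equations for that case. The function names and the numbers are illustrative assumptions, not taken from the paper.

```python
import math


def translational_flow(x, y, depth, translation, focal=1.0):
    """Optical flow (u, v) at image point (x, y) for a static scene point
    at the given depth, when the camera undergoes pure translation."""
    tx, ty, tz = translation
    u = (-focal * tx + x * tz) / depth
    v = (-focal * ty + y * tz) / depth
    return u, v


def orientation(flow):
    """Direction of a flow vector in radians."""
    u, v = flow
    return math.atan2(v, u)


# Same image location and camera translation, two very different depths.
translation = (0.2, -0.1, 0.5)
near = translational_flow(0.3, -0.4, depth=2.0, translation=translation)
far = translational_flow(0.3, -0.4, depth=20.0, translation=translation)

same_direction = abs(orientation(near) - orientation(far)) < 1e-9
magnitude_ratio = math.hypot(*near) / math.hypot(*far)
```

Depth only scales the flow vector (here by a factor of about ten), so the orientation is unchanged. This is why clustering on orientation does not split the static background by depth.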
Wednesday, November 20, 2013
Lab Meeting Nov. 21, (Channing) Where Do I Look Now? Gaze Allocation During Visually Guided Manipulation (ICRA 2012)
Title: Where Do I Look Now? Gaze Allocation During Visually Guided Manipulation (ICRA 2012)
Authors: Jose Nunez-Varela, B. Ravindran, Jeremy L. Wyatt
ABSTRACT - In this work we present principled methods for the coordination of a robot's oculomotor system with the rest of its body motor systems. The problem is to decide which physical actions to perform next and where the robot's gaze should be directed in order to gain information that is relevant to the success of its physical actions. Previous work on this problem has shown that a reward-based coordination mechanism provides an efficient solution. However, that approach does not allow the robot to move its gaze to different parts of the scene, it considers the robot to have only one motor system, and assumes that the actions have the same duration. The main contributions of our work are to extend that previous reward-based approach by making decisions about where to fixate the robot's gaze, handling multiple motor systems, and handling actions of variable duration. We compare our approach against two common baselines: random and round robin gaze allocation. We show how our method provides a more effective strategy to allocate gaze where it is needed the most.
Link
The Extension of the above work:
Title: Gaze Allocation Analysis for a Visually Guided Manipulation Task (SAB 2012)
Authors: Jose Nunez-Varela, B. Ravindran, Jeremy L. Wyatt
ABSTRACT - Findings from eye movement research in humans have demonstrated that the task determines where to look. One hypothesis is that the purpose of looking is to reduce uncertainty about properties relevant to the task. Following this hypothesis, we define a model that poses the problem of where to look as one of maximising task performance by reducing task relevant uncertainty. We implement and test our model on a simulated humanoid robot which has to move objects from a table into containers. Our model outperforms and is more robust than two other baseline schemes in terms of task performance whilst varying three environmental conditions, reach/grasp sensitivity, observation noise and the camera's field of view.
Thursday, November 14, 2013
Lab meeting Nov. 14, (Benny) Detection- and Trajectory-Level Exclusion in Multiple Object Tracking (CVPR 2013)
Authors: Anton Milan, Konrad Schindler, Stefan Roth
Abstract
When tracking multiple targets in crowded scenarios, modeling mutual exclusion between distinct targets becomes important at two levels: (1) in data association, each target observation should support at most one trajectory and each trajectory should be assigned at most one observation per frame; (2) in trajectory estimation, two trajectories should remain spatially separated at all times to avoid collisions. Yet, existing trackers often sidestep these important constraints. We address this using a mixed discrete-continuous conditional random field (CRF) that explicitly models both types of constraints: Exclusion between conflicting observations with supermodular pairwise terms, and exclusion between trajectories by generalizing global label costs to suppress the co-occurrence of incompatible labels (trajectories). We develop an expansion move-based MAP estimation scheme that handles both non-submodular constraints and pairwise global label costs. Furthermore, we perform a statistical analysis of ground-truth trajectories to derive appropriate CRF potentials for modeling data fidelity, target dynamics, and inter-target occlusion.
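To make the two exclusion levels concrete, here is a minimal sketch that checks them as hard constraints. The paper encodes them as terms of a CRF energy rather than as hard checks, so this is only an illustration. The data layout, the function names, and the `min_gap` threshold are assumptions.

```python
import math
from collections import Counter


def violates_detection_exclusion(assignments):
    """assignments: list of (frame, detection_id, trajectory_id).

    Returns True if some detection supports more than one trajectory, or some
    trajectory is assigned more than one detection in the same frame."""
    per_detection = Counter((frame, det) for frame, det, _ in assignments)
    per_trajectory = Counter((frame, traj) for frame, _, traj in assignments)
    return (any(n > 1 for n in per_detection.values())
            or any(n > 1 for n in per_trajectory.values()))


def violates_trajectory_exclusion(traj_a, traj_b, min_gap):
    """traj_a, traj_b: dicts mapping frame -> (x, y) position.

    Returns True if the two trajectories come closer than min_gap
    in any frame where both exist."""
    shared_frames = traj_a.keys() & traj_b.keys()
    return any(math.dist(traj_a[f], traj_b[f]) < min_gap for f in shared_frames)


valid = [(0, "d0", "A"), (0, "d1", "B"), (1, "d0", "A")]
shared_detection = [(0, "d0", "A"), (0, "d0", "B")]
double_assignment = [(0, "d0", "A"), (0, "d1", "A")]

walker_a = {0: (0.0, 0.0), 1: (1.0, 0.0)}
walker_b = {0: (0.0, 2.0), 1: (1.0, 0.1)}
```

In the toy data, `valid` passes the detection-level check while the other two lists fail it, and the two walkers collide in frame 1 for any `min_gap` above 0.1.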
Wednesday, November 06, 2013
Lab meeting Nov.7, (ChihChung) MAV Urban Localization from Google Street View Data (IROS2013)
Authors: András L. Majdik, Yves Albers-Schoenberg, Davide Scaramuzza
Abstract—We tackle the problem of globally localizing a camera-equipped micro aerial vehicle flying within urban environments for which a Google Street View image database exists. To avoid the caveats of current image-search algorithms in case of severe viewpoint changes between the query and the database images, we propose to generate virtual views of the scene, which exploit the air-ground geometry of the system. To limit the computational complexity of the algorithm, we rely on a histogram-voting scheme to select the best putative image correspondences. The proposed approach is tested on a 2km image dataset captured with a small quadrocopter flying in the streets of Zurich. The success of our approach shows that our new air-ground matching algorithm can robustly handle extreme changes in viewpoint, illumination, perceptual aliasing, and over-season variations, thus, outperforming conventional visual place-recognition approaches.
[link]
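The abstract does not spell out the histogram-voting scheme, so the following is only a guess at its general shape. It assumes each query feature votes for the database image that holds its nearest-neighbour descriptor, and the images with the most votes are kept for the expensive matching step. The function name and the image IDs are hypothetical.

```python
from collections import Counter


def vote_for_candidates(nearest_image_ids, top_k=2):
    """nearest_image_ids: for each query feature, the ID of the database image
    that contains its nearest-neighbour descriptor (an assumed input format).

    Returns the top_k database images ranked by number of votes."""
    histogram = Counter(nearest_image_ids)
    return [image_id for image_id, _ in histogram.most_common(top_k)]


# Hypothetical nearest-neighbour results for six query features.
votes = ["sv_12", "sv_07", "sv_12", "sv_30", "sv_12", "sv_07"]
candidates = vote_for_candidates(votes, top_k=2)
```

Only the shortlisted images would then go through the costly verification, which is how a scheme like this keeps the computation bounded.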
Wednesday, October 30, 2013
Lab meeting Oct.31, (Andi) Non-rigid metric reconstruction from perspective cameras (IVCJ 2010)
Title: Non-rigid metric reconstruction from perspective cameras
Authors: Xavier Lladó, Alessio Del Bue, Lourdes Agapito
Abstract: The metric reconstruction of a non-rigid object viewed by a generic camera poses new challenges since current approaches for Structure from Motion assume the rigidity constraint of a shape as an essential condition. In this work, we focus on the estimation of the 3-D Euclidean shape and motion of a non-rigid shape observed by a perspective camera. In such case deformation and perspective effects are difficult to decouple – the parametrization of the 3-D non-rigid body may mistakenly account for the perspective distortion. Our method relies on the fact that it is often a reasonable assumption that some of the points on the object’s surface are deforming throughout the sequence while others remain rigid. Thus, relying on the rigidity constraints of a subset of rigid points, we estimate the perspective to metric upgrade transformation. First, we use an automatic segmentation algorithm to identify the set of rigid points. These are then used to estimate the internal camera calibration parameters and the overall rigid motion. Finally, we formulate the problem of non-rigid shape and motion estimation as a non-linear optimization where the objective function to be minimized is the image reprojection error. The prior information that some of the points in the object are rigid can also be added as a constraint to the non-linear minimization scheme in order to avoid ambiguous configurations. We perform experiments on different synthetic and real data sets which show that even when using a minimal set of rigid points and when varying the intrinsic camera parameters it is possible to obtain reliable metric information.
Link
Wednesday, October 23, 2013
Lab Meeting Oct. 24, 2013 (Alan): Optimal Metric Projections for Deformable and Articulated Structure-from-Motion (IJCV 2012)
Title: Optimal Metric Projections for Deformable and Articulated Structure-from-Motion (IJCV 2012)
Authors: Marco Paladini, Alessio Del Bue, João Xavier, Lourdes Agapito, Marko Stošić, Marija Dodig
Abstract:
This paper describes novel algorithms for recovering the 3D shape and motion of deformable and articulated objects purely from uncalibrated 2D image measurements using a factorisation approach. Most approaches to deformable and articulated structure from motion require to upgrade an initial affine solution to Euclidean space by imposing metric constraints on the motion matrix. While in the case of rigid structure the metric upgrade step is simple since the constraints can be formulated as linear, deformability in the shape introduces non-linearities. In this paper we propose an alternating bilinear approach to solve for non-rigid 3D shape and motion, associated with a globally optimal projection step of the motion matrices onto the manifold of metric constraints. Our novel optimal projection step combines into a single optimisation the computation of the orthographic projection matrix and the configuration weights that give the closest motion matrix that satisfies the correct block structure with the additional constraint that the projection matrix is guaranteed to have orthonormal rows (i.e. its transpose lies on the Stiefel manifold). This constraint turns out to be non-convex. The key contribution of this work is to introduce an efficient convex relaxation for the non-convex projection step. Efficient in the sense that, for both the cases of deformable and articulated motion, the proposed relaxations turned out to be exact (i.e. tight) in all our numerical experiments. The convex relaxations are semi-definite (SDP) or second-order cone (SOCP) programs which can be readily tackled by popular solvers. An important advantage of these new algorithms is their ability to handle missing data which becomes crucial when dealing with real video sequences with self-occlusions. We show successful results of our algorithms on synthetic and real sequences of both deformable and articulated data. 
We also show comparative results with state of the art algorithms which reveal that our new methods outperform existing ones.
Link
Wednesday, October 16, 2013
Lab Meeting October 17th, 2013 (Jeff): Temporally Scalable Visual SLAM using a Reduced Pose Graph
Title: Temporally Scalable Visual SLAM using a Reduced Pose Graph
Authors: Hordur Johannsson, Michael Kaess, Maurice Fallon, and John J. Leonard
Abstract:
In this paper, we demonstrate a system for temporally scalable visual SLAM using a reduced pose graph representation. Unlike previous visual SLAM approaches that maintain static keyframes, our approach uses new measurements to continually improve the map, yet achieves efficiency by avoiding adding redundant frames and not using marginalization to reduce the graph. To evaluate our approach, we present results using an online binocular visual SLAM system that uses place recognition for both robustness and multi-session operation. Additionally, to enable large-scale indoor mapping, our system automatically detects elevator rides based on accelerometer data. We demonstrate long-term mapping in a large multi-floor building, using approximately nine hours of data collected over the course of six months. Our results illustrate the capability of our visual SLAM system to map a large area over extended period of time.
IEEE International Conference on Robotics and Automation (ICRA), 2013
Link:
LocalLink
http://people.csail.mit.edu/kaess/pub/Johannsson13icra.pdf
Reference Link:
Another paper with the same title:
In RSS Workshop on Long-term Operation of Autonomous Robotic Systems in Changing Environments, 2012.
http://people.csail.mit.edu/kaess/pub/Johannsson12rssw.pdf
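One way to picture "avoiding adding redundant frames" is a graph that reuses an existing node whenever the robot re-enters a region it has already mapped, and only adds a new constraint. The sketch below is a heavily simplified 2D illustration under that assumption. The grid partition, the class name, and the `cell_size` parameter are illustrative and are not taken from the paper.

```python
import math


class ReducedPoseGraph:
    """Toy pose graph that keeps at most one node per grid cell."""

    def __init__(self, cell_size=1.0):
        self.cell_size = cell_size
        self.nodes = {}        # cell index -> representative pose (x, y)
        self.constraints = []  # list of (from_cell, to_cell) links

    def _cell(self, pose):
        x, y = pose
        return (math.floor(x / self.cell_size), math.floor(y / self.cell_size))

    def add_pose(self, pose, previous_cell=None):
        """Insert a pose; reuse the existing node if its cell is already mapped."""
        cell = self._cell(pose)
        if cell not in self.nodes:
            self.nodes[cell] = pose           # new area: add a node
        if previous_cell is not None and previous_cell != cell:
            self.constraints.append((previous_cell, cell))  # always keep the measurement
        return cell


# Drive the same square loop twice.
loop = [(0.2, 0.2), (1.3, 0.1), (1.4, 1.2), (0.3, 1.1)]
graph = ReducedPoseGraph(cell_size=1.0)
previous = None
for pose in loop * 2:
    previous = graph.add_pose(pose, previous)
```

After the second lap the node count is unchanged while the constraint list keeps growing. In this toy version the graph size follows the explored area rather than the operating time.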
Monday, September 30, 2013
Lab Meeting Oct. 3rd (Jim): Robot Navigation in Dense Human Crowds: the Case for Cooperation
Title: Robot Navigation in Dense Human Crowds: the Case for Cooperation
Authors: Pete Trautman, Jeremy Ma, Richard M. Murray and Andreas Krause
in Proceedings of the IEEE International Conference on Robotics and Automation (ICRA 2013)
Abstract:
... we explore two questions. Can we design a navigation algorithm that encourages humans to cooperate with a robot? Would such cooperation improve navigation performance? We address the first question by developing a probabilistic predictive model of cooperative collision avoidance and goal-oriented behavior. ... We answer the second question by empirically validating our model in a natural environment (a university cafeteria), and in the process, carry out the first extensive quantitative study of robot navigation in dense human crowds (completing 488 runs). The “multiple goal” interacting Gaussian processes algorithm performs comparably with human teleoperators in crowd densities near 1 person/m², while a state of the art noncooperative planner exhibits unsafe behavior more than 3 times as often as our planner. ... We conclude that a cooperation model is critical for safe and efficient robot navigation in dense human crowds.
Link
Tuesday, September 24, 2013
Lab Meeting September 26, 2013 (Gene): Track-to-Track Fusion With Asynchronous Sensors Using Information Matrix Fusion for Surround Environment Perception
Title: Track-to-Track Fusion With Asynchronous Sensors Using Information Matrix Fusion for Surround Environment Perception
Authors: Michael Aeberhard, Stefan Schlichthärle, Nico Kaempchen, and Torsten Bertram
In: IEEE TRANSACTIONS ON INTELLIGENT TRANSPORTATION SYSTEMS, 2012
Abstract—Driver-assistance systems and automated driving applications in the future will require reliable and flexible surround environment perception. Sensor data fusion is typically used to increase reliability and the observable field of view. In this paper, a novel approach to track-to-track fusion in a high-level sensor data fusion architecture for automotive surround environment perception using information matrix fusion (IMF) is presented. It is shown that IMF produces the same good accuracy in state estimation as a low-level centralized Kalman filter, which is widely known to be the most accurate method of fusion. Additionally, as opposed to state-of-the-art track-to-track fusion algorithms, the presented approach guarantees a globally maintained track over time as an object passes in and out of the field of view of several sensors, as required in surround environment perception. As opposed to the often-used cascaded Kalman filter for track-to-track fusion, it is shown that the IMF algorithm has a smaller error and maintains consistency in the state estimation. The proposed approach using IMF is compared with other track-to-track fusion algorithms in simulation and is shown to perform well using real sensor data in a prototype vehicle with a 12-sensor configuration for surround environment perception in highly automated driving applications.
link
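The claim that IMF matches a centralized Kalman filter can be checked in a toy scalar case. The sketch assumes a static one-dimensional state observed directly by two sensors, with each sensor-level track starting from the same prior as the global track. That is far simpler than the asynchronous automotive setting of the paper. IMF adds to the global prior only the new information of each sensor track, meaning its posterior information minus its predicted information. The function names and the numbers are illustrative.

```python
def kalman_update(x, p, z, r):
    """Scalar Kalman measurement update for a directly observed state."""
    gain = p / (p + r)
    return x + gain * (z - x), (1.0 - gain) * p


def imf_fuse(x_prior, p_prior, sensor_tracks):
    """Information matrix fusion in the scalar case.

    sensor_tracks: list of ((x_post, p_post), (x_pred, p_pred)) per sensor,
    i.e. each sensor-level track after and before its latest update."""
    info = 1.0 / p_prior
    info_state = x_prior / p_prior
    for (x_post, p_post), (x_pred, p_pred) in sensor_tracks:
        info += 1.0 / p_post - 1.0 / p_pred              # new information only
        info_state += x_post / p_post - x_pred / p_pred
    p_fused = 1.0 / info
    return p_fused * info_state, p_fused


# Shared prior, and two sensors with different measurement noise.
x0, p0 = 0.0, 4.0
z1, r1 = 1.0, 1.0
z2, r2 = 2.0, 2.0

# Sensor-level tracks: each updates the shared prior with its own measurement.
track1 = (kalman_update(x0, p0, z1, r1), (x0, p0))
track2 = (kalman_update(x0, p0, z2, r2), (x0, p0))
x_imf, p_imf = imf_fuse(x0, p0, [track1, track2])

# Centralized filter: process both raw measurements in one global filter.
x_c, p_c = kalman_update(*kalman_update(x0, p0, z1, r1), z2, r2)
```

Both routes give the same estimate and variance here because subtracting each track's predicted information removes the shared prior, so it is counted only once in the fused result.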
Wednesday, September 11, 2013
Lab Meeting September 12, 2013 (Jimmy): Indoor Tracking and Navigation Using Received Signal Strength and Compressive Sensing on a Mobile Device
Title: Indoor Tracking and Navigation Using Received Signal Strength and Compressive Sensing on a Mobile Device
Authors: Anthea Wain Sy Au, Chen Feng, Shahrokh Valaee, Sophia Reyes, Sameh Sorour, Samuel N. Markowitz, Deborah Gold, Keith Gordon, and Moshe Eizenman
In: IEEE Transactions on Mobile Computing 2013
Abstract
An indoor tracking and navigation system based on measurements of received signal strength (RSS) in wireless local area network (WLAN) is proposed. In the system, the location determination problem is solved by first applying a proximity constraint to limit the distance between a coarse estimate of the current position and a previous estimate. Then, a Compressive Sensing-based (CS-based) positioning scheme, proposed in our previous work [1], [2], is applied to obtain a refined position estimate. The refined estimate is used with a map-adaptive Kalman filter, which assumes a linear motion between intersections on a map that describes the user’s path, to obtain a more robust position estimate. Experimental results with the system that is implemented on a PDA with limited resources (HP iPAQ hx2750 PDA) show that the proposed tracking system outperforms the widely used traditional positioning and tracking systems. Meanwhile, the tracking system leads to 12.6 percent reduction in the mean position error compared to the CS-based stationary positioning system when three APs are used. A navigation module that is integrated with the tracking system provides users with instructions to guide them to predefined destinations. Thirty visually impaired subjects from the Canadian National Institute for the Blind (CNIB) were invited to further evaluate the performance of the navigation system. Testing results suggest that the proposed system can be used to guide visually impaired subjects to their desired destinations.
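The Kalman filtering stage mentioned in the abstract can be pictured with a small Python sketch. This is a plain constant-velocity Kalman filter along a single straight path segment, not the paper's map-adaptive filter; the function name `cv_kalman_step`, the noise values `q` and `r`, and the simplified diagonal process noise are all assumptions made for illustration. The measurement `z` stands in for a refined position estimate expressed as distance along the segment.

```python
def cv_kalman_step(state, cov, z, dt=1.0, q=0.01, r=0.25):
    """One predict/update cycle of a 1-D constant-velocity Kalman filter.

    state: (position, velocity) along the path segment
    cov:   2x2 covariance as nested tuples ((a, b), (c, d))
    z:     measured position (e.g. a refined position estimate)
    q, r:  assumed process and measurement noise variances
    """
    s, v = state
    (a, b), (c, d) = cov

    # Predict with F = [[1, dt], [0, 1]] and Q = q * I (simplified).
    s_pred = s + dt * v
    v_pred = v
    p00 = a + dt * (b + c) + dt * dt * d + q
    p01 = b + dt * d
    p10 = c + dt * d
    p11 = d + q

    # Update with H = [1, 0]: only position is measured.
    innovation = z - s_pred
    s_var = p00 + r
    k0 = p00 / s_var
    k1 = p10 / s_var
    new_state = (s_pred + k0 * innovation, v_pred + k1 * innovation)
    new_cov = (((1.0 - k0) * p00, (1.0 - k0) * p01),
               (p10 - k1 * p00, p11 - k1 * p01))
    return new_state, new_cov


# A user walking at one unit per step along the segment.
state, cov = (0.0, 0.0), ((10.0, 0.0), (0.0, 10.0))
for k in range(1, 21):
    state, cov = cv_kalman_step(state, cov, z=float(k))
```

After a few steps the velocity estimate settles near the walking speed, so later position estimates are smoothed by the linear-motion assumption rather than following each measurement independently.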
[Link]
Tuesday, September 03, 2013
Lab Meeting Sep 5th 2013 (Tom Hsu): Efficient Dense 3D Rigid-Body Motion Segmentation in RGB-D Video
Title: Efficient Dense 3D Rigid-Body Motion Segmentation in RGB-D Video
Authors: Jörg Stückler, Sven Behnke
From: British Machine Vision Conference (BMVC), Bristol, UK, 2013
Abstract:
Motion is a fundamental segmentation cue in video. Many current approaches segment 3D motion in monocular or stereo image sequences, mostly relying on sparse interest points or being dense but computationally demanding. We propose an efficient expectation-maximization (EM) framework for dense 3D segmentation of moving rigid parts in RGB-D video. Our approach segments two images into pixel regions that undergo coherent 3D rigid-body motion. Our formulation treats background and foreground objects equally and poses no further assumptions on the motion of the camera or the objects than rigidness. While our EM-formulation is not restricted to a specific image representation, we supplement it with efficient image representation and registration for rapid segmentation of RGB-D video. In experiments we demonstrate that our approach recovers segmentation and 3D motion at good precision.
Tuesday, August 27, 2013
Lab Meeting, August 29, 2013 (Chiang Yi): Pose Estimation using Local Structure-Specific Shape and Appearance Context
Title: Pose Estimation using Local Structure-Specific Shape and Appearance Context
Authors: Anders Glent Buch, Dirk Kraft, Joni-Kristian Kamarainen, Henrik Gordon Petersen and Norbert Krüger
Abstract: We address the problem of estimating the alignment pose between two models using structure-specific local descriptors. Our descriptors are generated using a combination of 2D image data and 3D contextual shape data, resulting in a set of semi-local descriptors containing rich appearance and shape information for both edge and texture structures. This is achieved by defining feature space relations which describe the neighborhood of a descriptor. By quantitative evaluations, we show that our descriptors provide high discriminative power compared to state of the art approaches. In addition, we show how to utilize this for the estimation of the alignment pose between two point sets. We present experiments both in controlled and real-life scenarios to validate our approach.
From: ICRA 2013
Link: http://covil.sdu.dk/publications/paper1099.pdf
Monday, August 19, 2013
Lab Meeting, August 22, 2013 (Yen-Ting): Joint Optimization for Object Class Segmentation and Dense Stereo Reconstruction
Title: Joint Optimization for Object Class Segmentation and Dense Stereo Reconstruction
Authors: Lubor Ladický, Paul Sturgess, Chris Russell, Sunando Sengupta, Yalin Bastanlar, William Clocksin and Philip H.S. Torr
Abstract: The problems of dense stereo reconstruction and object class segmentation can both be formulated as Random Field labeling problems, in which every pixel in the image is assigned a label corresponding to either its disparity, or an object class such as road or building. While these two problems are mutually informative, no attempt has been made to jointly optimize their labelings. In this work we provide a flexible framework configured via cross-validation that unifies the two problems and demonstrate that, by resolving ambiguities, which would be present in real world data if the two problems were considered separately, joint optimization of the two problems substantially improves performance. To evaluate our method, we augment the Leuven data set (http://cms.brookes.ac.uk/research/visiongroup/files/Leuven.zip), which is a stereo video shot from a car driving around the streets of Leuven, with 70 hand labeled object class and disparity maps. We hope that the release of these annotations will stimulate further work in the challenging domain of street view analysis. Complete source code is publicly available (http://cms.brookes.ac.uk/staff/Philip-Torr/ale.htm).
From: International Journal of Computer Vision (IJCV), 2012
Link: Click here
Wednesday, August 07, 2013
Lab Meeting, August 8, 2013 (Channing): Multi-Robot System for Artistic Pattern Formation
Title: Multi-Robot System for Artistic Pattern Formation (ICRA 2011)
Authors: Javier Alonso-Mora, Andreas Breitenmoser, Martin Rufli, Roland Siegwart and Paul Beardsley
Abstract: This paper describes work on multi-robot pattern formation. Arbitrary target patterns are represented with an optimal robot deployment, using a method that is independent of the number of robots. Furthermore, the trajectories are visually appealing in the sense of being smooth, oscillation free, and showing fast convergence. A distributed controller guarantees collision free trajectories while taking into account the kinematics of differentially driven robots. Experimental results are provided for a representative set of patterns, for a swarm of up to ten physical robots, and for fifty virtual robots in simulation.
Paper Link: click here.
Wednesday, July 24, 2013
Lab meeting July 25th 2013 (Benny): Learning to segment and track in RGBD
Presented by: Benny
From: IEEE Transactions on Automation Science and Engineering 2013
Authors: Alex Teichman, Jake Lussier, and Sebastian Thrun
Link: Paper
Abstract: We consider the problem of segmenting and tracking deformable objects in color video with depth (RGBD) data available from commodity sensors such as the Asus Xtion Pro Live or Microsoft Kinect. We frame this problem with very few assumptions - no prior object model, no stationary sensor, no prior 3D map - thus making a solution potentially useful for a large number of applications, including semi-supervised learning, 3D model capture, and object recognition.
Our approach makes use of a rich feature set, including local image appearance, depth discontinuities, optical flow, and surface normals to inform the segmentation decision in a conditional random field model. In contrast to previous work in this field, the proposed method learns how to best make use of these features from ground-truth segmented sequences. We provide qualitative and quantitative analyses which demonstrate substantial improvement over the state of the art.
This paper is an extended version of our previous work [29]. Building on this, we show that it is possible to achieve an order of magnitude speedup and thus real-time performance (~20 FPS) on a laptop computer by applying simple algorithmic optimizations to the original work. This speedup comes at only a minor cost in overall accuracy and thus makes this approach applicable to a broader range of tasks. We demonstrate one such task: real-time, online, interactive segmentation to efficiently collect training data for an off-the-shelf object detector.
Tuesday, July 09, 2013
Lab Meeting July 10th, 2013 (Andi): Probabilistic Models for 3D Urban Scene Understanding from Movable Platforms
Title: Probabilistic Models for 3D Urban Scene Understanding from Movable Platforms
PhD Thesis, Andreas Geiger (KIT)
Abstract:
Visual 3D scene understanding is an important component in autonomous driving and robot navigation. Intelligent vehicles for example often base their decisions on observations obtained from video cameras as they are cheap and easy to employ. Inner-city intersections represent an interesting but also very challenging scenario in this context: The road layout may be very complex and observations are often noisy or even missing due to heavy occlusions. While highway navigation (e.g., Dickmanns et al. [49]) and autonomous driving on simple and annotated intersections (e.g., DARPA Urban Challenge [30]) have already been demonstrated successfully, understanding and navigating general inner-city crossings with little prior knowledge remains an unsolved problem. This thesis is a contribution to understanding multi-object traffic scenes from video sequences. All data is provided by a camera system which is mounted on top of the autonomous driving platform AnnieWAY [103]. The proposed probabilistic generative model reasons jointly about the 3D scene layout as well as the 3D location and orientation of objects in the scene. In particular, the scene topology, geometry as well as traffic activities are inferred from short video sequences. The model takes advantage of monocular information in the form of vehicle tracklets, vanishing lines and semantic labels. Additionally, the benefit of stereo features such as 3D scene flow and occupancy grids is investigated.
Motivated by the impressive driving capabilities of humans, no further information such as GPS, lidar, radar or map knowledge is required. Experiments conducted on 113 representative intersection sequences show that the developed approach successfully infers the correct layout in a variety of difficult scenarios. To evaluate the importance of each feature cue, experiments with different feature combinations are conducted. Additionally, the proposed method is shown to improve object detection and object orientation estimation performance.
Based primarily on the following two papers (CVPR and NIPS 2011):
http://ttic.uchicago.edu/~rurtasun/publications/geiger_etal_cvpr11.pdf
http://machinelearning.wustl.edu/mlpapers/paper_files/NIPS2011_0842.pdf
Thursday, June 27, 2013
Lab Meeting July 3rd, 2013 (Jeff): Switchable Constraints vs. Max-Mixture Models vs. RRR - A Comparison of Three Approaches to Robust Pose Graph SLAM
Title: Switchable Constraints vs. Max-Mixture Models vs. RRR - A Comparison of Three Approaches to Robust Pose Graph SLAM
Authors: Niko Sünderhauf and Peter Protzel
Abstract:
SLAM algorithms that can infer a correct map despite the presence of outliers have recently attracted increasing attention. In the context of SLAM, outlier constraints are typically caused by a failed place recognition due to perceptional aliasing. If not handled correctly, they can have catastrophic effects on the inferred map. Since robust robotic mapping and SLAM are among the key requirements for autonomous long-term operation, inference methods that can cope with such data association failures are a hot topic in current research. Our paper compares three very recently published approaches to robust pose graph SLAM, namely switchable constraints, max-mixture models and the RRR algorithm. All three methods were developed as extensions to existing factor graph-based SLAM back-ends and aim at improving the overall system’s robustness to false positive loop closure constraints. Due to the novelty of the three proposed algorithms, no direct comparison has been conducted so far.
IEEE International Conference on Robotics and Automation (ICRA), 2013
Link:
http://www.tu-chemnitz.de/etit/proaut/rsrc/ICRA12-comparisonRobustSLAM.pdf
Reference Link:
Switchable Constraints
http://www.tu-chemnitz.de/etit/proaut/mitarbeiter/rsrc/IROS12-switchableConstraints.pdf
Max-Mixture
http://www.roboticsproceedings.org/rss08/p40.pdf
RRR
http://www.roboticsproceedings.org/rss08/p30.pdf
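As a rough illustration of one of the three approaches compared in the paper, the following Python sketch shows the core idea behind a max-mixture model for a single loop closure constraint: the likelihood is the maximum over weighted Gaussian components rather than their sum, so an optimizer can work with whichever component currently explains the residual best. The function names, the scalar residual, and the weights and standard deviations are assumptions chosen for illustration and do not come from the paper or any SLAM back-end.

```python
import math


def gaussian_log_pdf(residual, sigma):
    """Log density of a zero-mean 1-D Gaussian evaluated at the residual."""
    return (-0.5 * (residual / sigma) ** 2
            - math.log(sigma * math.sqrt(2.0 * math.pi)))


def max_mixture_cost(residual, components):
    """Negative log-likelihood under a max-mixture of Gaussians.

    components: list of (weight, sigma) pairs, e.g. a narrow "inlier"
    component and a very wide "outlier" (null hypothesis) component.
    Returns (cost, index of the selected component).
    """
    scores = [math.log(w) + gaussian_log_pdf(residual, s)
              for w, s in components]
    best = max(range(len(scores)), key=lambda i: scores[i])
    return -scores[best], best


# Hypothetical mixture: a trusted loop closure versus a broad outlier mode.
components = [(0.9, 0.1), (0.1, 10.0)]

_, chosen_small = max_mixture_cost(0.05, components)  # consistent closure
_, chosen_large = max_mixture_cost(5.0, components)   # likely false positive
```

With a small residual the narrow inlier component is selected; with a large residual the wide component takes over, which bounds the influence a false positive loop closure can have on the map.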