Title: Coherent Motion Segmentation in Moving Camera Videos using Optical Flow Orientations (ICCV 2013)
Authors: Manjunath Narayana, Allen Hanson, Erik Learned-Miller
Abstract
In moving camera videos, motion segmentation is commonly performed using the image plane motion of pixels, or optical flow. However, objects that are at different depths from the camera can exhibit different optical flows even if they share the same real-world motion. This can cause a depth-dependent segmentation of the scene. Our goal is to develop a segmentation algorithm that clusters pixels that have similar real-world motion irrespective of their depth in the scene. Our solution uses optical flow orientations instead of the complete vectors and exploits the well-known property that under camera translation, optical flow orientations are independent of object depth. We introduce a probabilistic model that automatically estimates the number of observed independent motions and results in a labeling that is consistent with real-world motion in the scene. The result of our system is that static objects are correctly identified as one segment, even if they are at different depths. Color features and information from previous frames in the video sequence are used to correct occasional errors due to the orientation-based segmentation. We present results on more than thirty videos from different benchmarks. The system is particularly robust on complex background scenes containing objects at significantly different depths.
link
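The depth-independence property the abstract leans on is easy to check numerically. Below is a minimal sketch (not the authors' code) using the standard pinhole translational-flow model; the focal length, translation, and pixel coordinates are made-up values for illustration:

```python
import numpy as np

# Under pure camera translation t = (tx, ty, tz), a scene point at depth Z
# projecting to pixel (x, y) induces image motion
#     u = (x * tz - f * tx) / Z,   v = (y * tz - f * ty) / Z,
# so the magnitude scales with 1/Z but the orientation atan2(v, u) does not
# depend on Z at all.

f = 500.0                      # focal length in pixels (assumed)
t = np.array([0.2, 0.0, 1.0])  # camera translation (assumed)
x, y = 120.0, -80.0            # a fixed pixel location

for Z in [2.0, 5.0, 50.0]:     # same pixel, very different depths
    u = (x * t[2] - f * t[0]) / Z
    v = (y * t[2] - f * t[1]) / Z
    print(f"Z={Z:5.1f}  |flow|={np.hypot(u, v):7.3f}  "
          f"orientation={np.degrees(np.arctan2(v, u)):7.2f} deg")
# The printed magnitude differs across depths while the orientation stays
# fixed, which is why clustering orientations (rather than full flow vectors)
# avoids splitting a static background into depth-dependent segments.
```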
This Blog is maintained by the Robot Perception and Learning lab at CSIE, NTU, Taiwan. Our scientific interests are driven by the desire to build intelligent robots and computers, which are capable of servicing people more efficiently than equivalent manned systems in a wide variety of dynamic and unstructured environments.
Wednesday, November 27, 2013
Wednesday, November 20, 2013
Lab Meeting Nov. 21, (Channing) Where Do I Look Now? Gaze Allocation During Visually Guided Manipulation (ICRA 2012)
Title: Where Do I Look Now? Gaze Allocation During Visually Guided Manipulation (ICRA 2012)
Authors: Jose Nunez-Varela, B. Ravindran, Jeremy L. Wyatt
ABSTRACT - In this work we present principled methods for the coordination of a robot's oculomotor system with the rest of its body motor systems. The problem is to decide which physical actions to perform next and where the robot's gaze should be directed in order to gain information that is relevant to the success of its physical actions. Previous work on this problem has shown that a reward-based coordination mechanism provides an efficient solution. However, that approach does not allow the robot to move its gaze to different parts of the scene, it considers the robot to have only one motor system, and assumes that the actions have the same duration. The main contributions of our work are to extend that previous reward-based approach by making decisions about where to fixate the robot's gaze, handling multiple motor systems, and handling actions of variable duration. We compare our approach against two common baselines: random and round robin gaze allocation. We show how our method provides a more effective strategy to allocate gaze where it is needed the most.
Link
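To make the comparison in the abstract concrete, here is a toy simulation (my own construction, not the authors' model) of two motor systems whose targets lose positional certainty unless fixated; gaze is allocated either at random, round-robin, or greedily by expected gain in success probability:

```python
import random

# Toy illustration of the three gaze-allocation policies compared in the
# abstract. Two hypothetical motor systems each act on one target; a target's
# positional uncertainty halves when fixated and drifts upward otherwise, and
# the chance of a successful action falls as uncertainty grows.

def success_prob(sigma):
    return max(0.0, 1.0 - sigma)

def run(policy, steps=500, seed=1):
    rng = random.Random(seed)
    sigma = [0.5, 0.5]                 # per-target uncertainty
    drift = [0.02, 0.10]               # target 1 is harder to keep localized
    successes, rr = 0, 0
    for _ in range(steps):
        if policy == "random":
            k = rng.randrange(2)
        elif policy == "round_robin":
            k, rr = rr, (rr + 1) % 2
        else:                          # greedy reward-based allocation
            gains = [success_prob(s * 0.5) - success_prob(s) for s in sigma]
            k = max(range(2), key=gains.__getitem__)
        sigma[k] *= 0.5                              # fixation helps target k
        j = 1 - k
        sigma[j] = min(1.0, sigma[j] + drift[j])     # the other target drifts
        acting = rng.randrange(2)                    # one motor system acts
        successes += rng.random() < success_prob(sigma[acting])
    return successes

for policy in ("random", "round_robin", "reward_based"):
    print(policy, run(policy))
```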
An extension of the above work:
Title: Gaze Allocation Analysis for a Visually Guided Manipulation Task (SAB 2012)
Authors: Jose Nunez-Varela, B. Ravindran, Jeremy L. Wyatt
ABSTRACT - Findings from eye movement research in humans have demonstrated that the task determines where to look. One hypothesis is that the purpose of looking is to reduce uncertainty about properties relevant to the task. Following this hypothesis, we define a model that poses the problem of where to look as one of maximising task performance by reducing task relevant uncertainty. We implement and test our model on a simulated humanoid robot which has to move objects from a table into containers. Our model outperforms and is more robust than two other baseline schemes in terms of task performance whilst varying three environmental conditions, reach/grasp sensitivity, observation noise and the camera's field of view.
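A bare-bones version of the "look to reduce task-relevant uncertainty" selection rule might look like the following (a simplification of my own, not the authors' formulation); each candidate fixation target carries a positional variance and a weight for how relevant it is to the action currently being executed:

```python
# Gaze goes to the target whose expected variance reduction, weighted by its
# relevance to the current action, is largest. `reduction` is the assumed
# fraction of variance removed by one fixation.

def gaze_target(variances, relevance, reduction=0.7):
    """Return the index of the target with the largest expected weighted
    uncertainty reduction if fixated."""
    scores = [w * v * reduction for v, w in zip(variances, relevance)]
    return max(range(len(scores)), key=scores.__getitem__)

# Example: the object being reached for (index 0) is both the most uncertain
# and the most relevant, so it wins the fixation.
print(gaze_target(variances=[0.09, 0.04, 0.01], relevance=[1.0, 0.2, 0.5]))
```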
Thursday, November 14, 2013
Lab meeting Nov. 14, (Benny) Detection- and Trajectory-Level Exclusion in Multiple Object Tracking (CVPR 2013)
Authors: Anton Milan, Konrad Schindler, Stefan Roth
Abstract
When tracking multiple targets in crowded scenarios, modeling mutual exclusion between distinct targets becomes important at two levels: (1) in data association, each target observation should support at most one trajectory and each trajectory should be assigned at most one observation per frame; (2) in trajectory estimation, two trajectories should remain spatially separated at all times to avoid collisions. Yet, existing trackers often sidestep these important constraints. We address this using a mixed discrete-continuous conditional random field (CRF) that explicitly models both types of constraints: Exclusion between conflicting observations with supermodular pairwise terms, and exclusion between trajectories by generalizing global label costs to suppress the co-occurrence of incompatible labels (trajectories). We develop an expansion move-based MAP estimation scheme that handles both non-submodular constraints and pairwise global label costs. Furthermore, we perform a statistical analysis of ground-truth trajectories to derive appropriate CRF potentials for modeling data fidelity, target dynamics, and inter-target occlusion.
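The first, data-association level of exclusion can be illustrated in isolation with plain optimal assignment on a toy cost matrix (a standard Hungarian-style sketch, not the paper's discrete-continuous CRF, which additionally handles the trajectory-level exclusion and label costs):

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

# cost[i, j]: cost of assigning detection i in the current frame to
# trajectory j (e.g., a distance between the detection and the trajectory's
# predicted position). The values are made up.
cost = np.array([[0.2, 1.5, 3.0],
                 [1.4, 0.3, 2.2],
                 [2.8, 2.6, 0.4]])

det_idx, traj_idx = linear_sum_assignment(cost)
for d, t in zip(det_idx, traj_idx):
    print(f"detection {d} -> trajectory {t} (cost {cost[d, t]:.1f})")
# Each detection appears exactly once on the left and each trajectory exactly
# once on the right: the one-to-one exclusion that greedy matching can violate.
```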
Wednesday, November 06, 2013
Lab meeting Nov. 7, (ChihChung) MAV Urban Localization from Google Street View Data (IROS 2013)
Authors: András L. Majdik, Yves Albers-Schoenberg, Davide Scaramuzza
Abstract—We tackle the problem of globally localizing a camera-equipped micro aerial vehicle flying within urban environments for which a Google Street View image database exists. To avoid the caveats of current image-search algorithms in case of severe viewpoint changes between the query and the database images, we propose to generate virtual views of the scene, which exploit the air-ground geometry of the system. To limit the computational complexity of the algorithm, we rely on a histogram-voting scheme to select the best putative image correspondences. The proposed approach is tested on a 2 km image dataset captured with a small quadrocopter flying in the streets of Zurich. The success of our approach shows that our new air-ground matching algorithm can robustly handle extreme changes in viewpoint, illumination, perceptual aliasing, and over-season variations, thus outperforming conventional visual place-recognition approaches.
[link]
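As a rough illustration of histogram voting over putative correspondences (the paper may vote on a different quantity; here each match votes with its 2-D image displacement), a filter of this kind can be written in a few lines:

```python
import numpy as np

def histogram_vote_filter(pts_query, pts_db, bins=16):
    """Keep only matches whose displacement lies in the most-voted 2-D bin.

    pts_query, pts_db: (N, 2) arrays of matched keypoint coordinates.
    Matches outside the dominant bin are discarded as likely outliers before
    any more expensive geometric verification.
    """
    disp = pts_db - pts_query                        # per-match displacement
    hist, xedges, yedges = np.histogram2d(disp[:, 0], disp[:, 1], bins=bins)
    bx, by = np.unravel_index(np.argmax(hist), hist.shape)
    in_x = (disp[:, 0] >= xedges[bx]) & (disp[:, 0] <= xedges[bx + 1])
    in_y = (disp[:, 1] >= yedges[by]) & (disp[:, 1] <= yedges[by + 1])
    return in_x & in_y                               # boolean mask of inliers

# Example with synthetic matches: 80 consistent ones plus 20 random outliers.
rng = np.random.default_rng(0)
q = rng.uniform(0, 640, size=(100, 2))
d = q + np.array([12.0, -5.0])                       # consistent shift
d[80:] = rng.uniform(0, 640, size=(20, 2))           # outliers
mask = histogram_vote_filter(q, d)
print(f"kept {mask.sum()} of {len(mask)} putative matches")
```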