Abstract:
We address the problem of determining where a photo was taken by estimating a full 6-DOF-plus-intrinsics camera pose with respect to a large geo-registered 3D point cloud, bringing together research on image localization, landmark recognition, and 3D pose estimation. Our method scales to datasets with hundreds of thousands of images and tens of millions of 3D points through the use of two new techniques: a co-occurrence prior for RANSAC and bidirectional matching of image features with 3D points. We evaluate our method on several large datasets, and show state-of-the-art results on landmark recognition as well as the ability to locate cameras to within meters, requiring only seconds per query.
Link
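As background for this abstract, 2D-to-3D localization of this kind is usually solved by feeding matched image features and 3D points to a RANSAC-based PnP solver. The sketch below uses OpenCV's generic solvePnPRansac as a stand-in; it does not implement the paper's co-occurrence prior or bidirectional matching, and all parameter values are illustrative.

```python
# Minimal sketch: localize a query image against a geo-registered point cloud
# via 2D-3D correspondences and RANSAC-based PnP. This is a generic baseline,
# not the co-occurrence prior / bidirectional matching described in the paper.
import numpy as np
import cv2

def localize(points_3d, points_2d, K):
    """points_3d: Nx3 world points matched to points_2d: Nx2 pixels; K: 3x3 intrinsics."""
    ok, rvec, tvec, inliers = cv2.solvePnPRansac(
        points_3d.astype(np.float32), points_2d.astype(np.float32),
        K, distCoeffs=None, reprojectionError=4.0, iterationsCount=1000)
    if not ok:
        return None
    R, _ = cv2.Rodrigues(rvec)           # rotation matrix from Rodrigues vector
    camera_center = -R.T @ tvec          # camera position in world coordinates
    return R, tvec, camera_center, inliers
```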
This blog is maintained by the Robot Perception and Learning Lab at CSIE, NTU, Taiwan. Our scientific interests are driven by the desire to build intelligent robots and computers that are capable of serving people more efficiently than equivalent manned systems in a wide variety of dynamic and unstructured environments.
Wednesday, November 19, 2014
Thursday, November 06, 2014
Lab Meeting November 7th, 2014 (Jeff): Multiple Target Tracking using Recursive RANSAC
Title: Multiple Target Tracking using Recursive RANSAC
Authors: Peter C. Niedfeldt and Randal W. Beard
Abstract:
Estimating the states of multiple dynamic targets is difficult due to noisy and spurious measurements, missed detections, and the interaction between multiple maneuvering targets. In this paper a novel algorithm, which we call the recursive random sample consensus (R-RANSAC) algorithm, is presented to robustly estimate the states of an unknown number of dynamic targets. R-RANSAC was previously developed to estimate the parameters of multiple static signals when measurements are received sequentially in time. The R-RANSAC algorithm proposed in this paper reformulates our previous work to track dynamic targets using a Kalman filter. Simulation results using synthetic data are included to compare R-RANSAC to the GM-PHD filter.
American Control Conference (ACC), 2014
Link:
http://ieeexplore.ieee.org/xpls/abs_all.jsp?arnumber=6859273&tag=1
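For background, each track hypothesis in R-RANSAC is refined with an ordinary Kalman filter. A minimal constant-velocity filter step (my own illustration, not the authors' implementation) looks like this:

```python
# Minimal sketch of the per-track estimator R-RANSAC relies on: a 2D
# constant-velocity Kalman filter. State x = [px, py, vx, vy].
import numpy as np

def kalman_step(x, P, z, dt=1.0, q=0.1, r=1.0):
    F = np.array([[1, 0, dt, 0],
                  [0, 1, 0, dt],
                  [0, 0, 1,  0],
                  [0, 0, 0,  1]], dtype=float)   # constant-velocity motion model
    H = np.array([[1, 0, 0, 0],
                  [0, 1, 0, 0]], dtype=float)    # only position is observed
    Q = q * np.eye(4)                            # process noise (tuning parameter)
    R = r * np.eye(2)                            # measurement noise
    # Predict
    x = F @ x
    P = F @ P @ F.T + Q
    # Update with measurement z = [px, py]
    y = z - H @ x                                # innovation
    S = H @ P @ H.T + R
    K = P @ H.T @ np.linalg.inv(S)               # Kalman gain
    x = x + K @ y
    P = (np.eye(4) - K @ H) @ P
    return x, P
```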
Monday, October 20, 2014
Lab Meeting, October 23, 2014, Jim
I will present my previous work on imitation learning, describing what we have done and learned, and then present a proposed approach for addressing its remaining issues.
Wednesday, October 15, 2014
Lab Meeting, October 16, 2014 (Channing): Modeling and Learning Synergy for Team Formation with Heterogeneous Agents
Title:
Modeling and Learning Synergy for Team Formation with Heterogeneous Agents
AAMAS '12 Proceedings of the 11th International Conference on Autonomous Agents and Multiagent Systems - Volume 1 Pages 365-374
Authors:
Somchaya Liemhetcharat and Manuela Veloso
Abstract:
The performance of a team at a task depends critically on the composition of its members. There is a notion of synergy in human teams that represents how well teams work together, and we are interested in modeling synergy in multi-agent teams. We focus on the problem of team formation, i.e., selecting a subset of a group of agents in order to perform a task, where each agent has its own capabilities, and the performance of a team of agents depends on the individual agent capabilities as well as the synergistic effects among the agents. We formally define synergy and how it can be computed using a synergy graph, where the distance between two agents in the graph correlates with how well they work together. We contribute a learning algorithm that learns a synergy graph from observations of the performance of subsets of the agents, and show that our learning algorithm is capable of learning good synergy graphs without prior knowledge of the interactions of the agents or their capabilities. We also contribute an algorithm to solve the team formation problem using the learned synergy graph, and experimentally show that the team formed by our algorithm outperforms a competing algorithm.
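To make the synergy-graph idea concrete, here is a toy sketch assuming agents sit on an unweighted graph, each with a scalar capability, and that pairwise synergy decays with graph distance. The paper's actual model uses Gaussian-distributed capabilities and a learned graph; everything below is my own simplification.

```python
# Toy sketch of the synergy-graph idea: agents are nodes of an unweighted graph,
# each with a scalar capability; pairs that are closer in the graph have higher
# synergy, and a team's value is the mean pairwise synergy.
from collections import deque
from itertools import combinations

def graph_distances(adj, source):
    """BFS shortest-path distances from `source` in an adjacency-list graph."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist

def team_synergy(team, adj, capability):
    pair_scores = []
    for a, b in combinations(team, 2):
        d = graph_distances(adj, a)[b]
        pair_scores.append((capability[a] + capability[b]) / d)  # closer => more synergy
    return sum(pair_scores) / len(pair_scores)

# Example: pick the best 2-agent team out of four agents.
adj = {"A": ["B"], "B": ["A", "C"], "C": ["B", "D"], "D": ["C"]}
capability = {"A": 5.0, "B": 4.0, "C": 4.5, "D": 3.0}
best = max(combinations(adj, 2), key=lambda t: team_synergy(t, adj, capability))
print(best)
```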
Wednesday, October 01, 2014
Lab Meeting, October 2, 2014 (Yun-Jun Shen): Multi-modal and Multi-spectral Registration for Natural Images
Title: Multi-modal and Multi-spectral Registration for Natural Images
Authors: Xiaoyong Shen, Li Xu, Qi Zhang, and Jiaya Jia
Abstract:
Images now come in different forms – color, near-infrared, depth, etc. – due to the development of special and powerful cameras in computer vision and computational photography. Their cross-modal correspondence establishment is however left behind. We address this challenging dense matching problem considering structure variation possibly existing in these image sets and introduce a new model and solution. Our main contribution includes designing the descriptor named robust selective normalized cross correlation (RSNCC) to establish dense pixel correspondence in input images and proposing its mathematical parameterization to make optimization tractable. A computationally robust framework including global and local matching phases is also established. We build a multi-modal dataset including natural images with labeled sparse correspondence. Our method will benefit image and vision applications that require accurate image alignment.
In: Computer Vision–ECCV 2014
Link: http://www.cse.cuhk.edu.hk/leojia/projects/multimodal/papers/multispectral_registration.pdf
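For readers unfamiliar with the building block, RSNCC is a robust, selective variant of normalized cross correlation (NCC) computed over color and gradient channels. The snippet below shows only plain NCC plus a generic robust penalty, as a reference point rather than the paper's exact descriptor.

```python
# Plain NCC between two patches, plus a generic robust matching cost in the
# spirit of RSNCC. Not the paper's formulation; parameters are illustrative.
import numpy as np

def ncc(patch_a, patch_b, eps=1e-8):
    """Normalized cross correlation of two same-sized patches, in [-1, 1]."""
    a = patch_a.astype(float).ravel()
    b = patch_b.astype(float).ravel()
    a = (a - a.mean()) / (a.std() + eps)
    b = (b - b.mean()) / (b.std() + eps)
    return float(np.mean(a * b))

def robust_matching_cost(patch_a, patch_b, sigma=0.5):
    """Pass the NCC dissimilarity through a robust (bounded) penalty."""
    return 1.0 - np.exp(-(1.0 - ncc(patch_a, patch_b)) / sigma)
```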
Tuesday, September 23, 2014
Lab Meeting September 25th, 2014 (Bang-Cheng Wang): Strategies for Adjusting the ZMP Reference Trajectory for Maintaining Balance in Humanoid Walking
Title: Strategies for Adjusting the ZMP Reference Trajectory for Maintaining Balance in Humanoid Walking
Authors: Koichi Nishiwaki and Satoshi Kagami
Abstract:
The present paper addresses strategies of changing the reference trajectories of the future ZMP that are used for online repetitive walking pattern generation. Walking pattern generation operates with a cycle of 20 [ms], and the reference ZMP trajectory is adjusted according to the current actual motion status in order to maintain the current balance. Three different strategies are considered for adjusting the ZMP. The first strategy is to change the reference ZMP inside the sole area. The second strategy is to change the position of the next step, and the third strategy is to change the duration of the current step. The manner in which these changes affect the current balance and how to combine the three strategies are discussed. The proposed methods are implemented as part of an online walking control system with short cycle pattern generation and are evaluated using the HRP-2 full-sized humanoid robot.
2010 IEEE International Conference on Robotics and Automation
http://ieeexplore.ieee.org/xpl/abstractMultimedia.jsp?arnumber=5510002
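As background for the talk, the first strategy (moving the reference ZMP inside the sole) can be pictured with the standard cart-table model, where the ZMP induced by a CoM trajectory is p = c - (z_c / g) * c_ddot. The sketch below is my own illustration of that relation and a simple sole-area check, not the paper's controller.

```python
# Cart-table (linear inverted pendulum) ZMP from a CoM trajectory, plus a toy
# check that the ZMP stays inside an axis-aligned sole rectangle.
import numpy as np

G = 9.81  # gravity [m/s^2]

def zmp_from_com(com, dt, z_c):
    """com: Nx2 horizontal CoM positions sampled every dt seconds; z_c: CoM height [m]."""
    com = np.asarray(com, dtype=float)
    com_acc = np.gradient(np.gradient(com, dt, axis=0), dt, axis=0)
    return com - (z_c / G) * com_acc

def inside_sole(zmp_xy, sole_min, sole_max):
    """True where the ZMP lies within the sole rectangle [sole_min, sole_max]."""
    return np.all((zmp_xy >= sole_min) & (zmp_xy <= sole_max), axis=1)
```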
Wednesday, September 17, 2014
Lab Meeting September 18th, 2014 (Gene): A Survey on Clustering Algorithms for Wireless Sensor Networks
Title: A Survey on Clustering Algorithms for Wireless Sensor Networks
Authors: Boyinbode, Olutayo, Hanh Le, and Makoto Takizawa
Abstract:
A wireless sensor network (WSN) consisting of a large number of tiny sensors can be an effective tool for gathering data in diverse kinds of environments. The data collected by each sensor is communicated to the base station, which forwards the data to the end user. Clustering is introduced to WSNs because it has proven to be an effective approach to provide better data aggregation and scalability for large WSNs. Clustering also conserves the limited energy resources of the sensors. This paper synthesises existing clustering algorithms in WSNs and highlights the challenges in clustering.
2010 13th International Conference on Network-Based Information Systems
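For a concrete feel of what such surveys cover, LEACH is the classic example: each sensor elects itself cluster head with a rotating probability so that the energy cost of aggregation is shared. Below is a minimal sketch of the standard election rule (generic LEACH, not taken from this survey).

```python
# LEACH-style cluster head election: a node becomes head with probability
# threshold T(n) = p / (1 - p * (r mod 1/p)), and is barred from re-election
# until the current epoch of 1/p rounds ends.
import random

def leach_is_cluster_head(round_idx, p=0.05, was_head_this_epoch=False):
    """p: desired fraction of cluster heads per round."""
    if was_head_this_epoch:
        return False
    threshold = p / (1.0 - p * (round_idx % int(1.0 / p)))
    return random.random() < threshold
```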
Wednesday, September 10, 2014
Lab Meeting September 11th, 2014 (Zhi-Qiang): DeepFlow: Large displacement optical flow with deep matching
Title: DeepFlow: Large displacement optical flow with deep matching
Authors: Philippe Weinzaepfel, Jerome Revaud, Zaid Harchaoui, Cordelia Schmid
Abstract:
Optical flow computation is a key component in many computer vision systems designed for tasks such as action detection or activity recognition. However, despite several major advances over the last decade, handling large displacement in optical flow remains an open problem. Inspired by the large displacement optical flow of Brox and Malik, our approach, termed Deep Flow, blends a matching algorithm with a variational approach for optical flow. We propose a descriptor matching algorithm, tailored to the optical flow problem, that allows to boost performance on fast motions. The matching algorithm builds upon a multi-stage architecture with 6 layers, interleaving convolutions and max-pooling, a construction akin to deep convolutional nets. Using dense sampling, it allows to efficiently retrieve quasi-dense correspondences, and enjoys a built-in smoothing effect on descriptor matches, a valuable asset for integration into an energy minimization framework for optical flow estimation. Deep Flow efficiently handles large displacements occurring in realistic videos, and shows competitive performance on optical flow benchmarks. Furthermore, it sets a new state-of-the-art on the MPI-Sintel dataset.
Computer Vision (ICCV), 2013 IEEE International Conference on
Tuesday, August 26, 2014
Lab Meeting August 28th, 2014 (Hung Chih Lu): Dynamic Scene Deblurring
Title: Dynamic Scene Deblurring
Authors: Tae Hyun Kim, Byeongjoo Ahn, and Kyoung Mu Lee
Abstract:
Most conventional single image deblurring methods assume that the underlying scene is static and the blur is caused by only camera shake. In this paper, in contrast to this restrictive assumption, we address the deblurring problem of general dynamic scenes which contain multiple moving objects as well as camera shake. In case of dynamic scenes, moving objects and background have different blur motions, so the segmentation of the motion blur is required for deblurring each distinct blur motion accurately. Thus, we propose a novel energy model designed with the weighted sum of multiple blur data models, which estimates different motion blurs and their associated pixelwise weights, and resulting sharp image. In this framework, the local weights are determined adaptively and get high values when the corresponding data models have high data fidelity. And, the weight information is used for the segmentation of the motion blur. Non-local regularization of weights are also incorporated to produce more reliable segmentation results. A convex optimization-based method is used for the solution of the proposed energy model. Experimental results demonstrate that our method outperforms conventional approaches in deblurring both dynamic scenes and static scenes.
ICCV 2013
Link: http://personal.ie.cuhk.edu.hk/~ccloy/files/iccv_2013_synopsis.pdf
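To give a flavor of the energy model described above, the sketch below scores several candidate blur models per pixel and turns the normalized data-fidelity weights into a motion-blur segmentation. It is a toy illustration with made-up parameters, not the paper's convex formulation.

```python
# Per-pixel weights over candidate blur (data) models: the model with the best
# data fidelity gets the highest weight, and the argmax over models gives a
# rough motion-blur segmentation.
import numpy as np
from scipy.ndimage import convolve

def per_model_weights(latent, blurred, kernels, beta=10.0):
    """latent, blurred: 2D arrays; kernels: list of 2D blur kernels."""
    residuals = np.stack([(convolve(latent, k) - blurred) ** 2 for k in kernels])
    weights = np.exp(-beta * residuals)            # high fidelity -> high weight
    weights /= weights.sum(axis=0, keepdims=True)  # normalize across models
    segmentation = weights.argmax(axis=0)          # label map over blur models
    return weights, segmentation
```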
Wednesday, August 20, 2014
Lab Meeting August 21st, 2014 (Henry): Model Globally, Match Locally: Efficient and Robust 3D Object Recognition
Title: Model Globally, Match Locally: Efficient and Robust 3D Object Recognition
Authors: Bertram Drost, Markus Ulrich, Nassir Navab, Slobodan Ilic
Abstract:
This paper addresses the problem of recognizing free-form 3D objects in point clouds. Compared to traditional approaches based on point descriptors, which depend on local information around points, we propose a novel method that creates a global model description based on oriented point pair features and matches that model locally using a fast voting scheme. The global model description consists of all model point pair features and represents a mapping from the point pair feature space to the model, where similar features on the model are grouped together. Such representation allows using much sparser object and scene point clouds, resulting in very fast performance. Recognition is done locally using an efficient voting scheme on a reduced two-dimensional search space. We demonstrate the efficiency of our approach and show its high recognition performance in the case of noise, clutter and partial occlusions. Compared to state of the art approaches we achieve better recognition rates, and demonstrate that with a slight or even no sacrifice of the recognition performance our method is much faster than the current state of the art approaches.
Computer Vision and Pattern Recognition (CVPR), 2010 IEEE Conference on
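The core of the method is the oriented point pair feature F(m1, m2) = (||d||, angle(n1, d), angle(n2, d), angle(n1, n2)) with d = m2 - m1, quantized into a hash table that maps back to model point pairs. A small sketch of that feature follows; the voting stage is omitted.

```python
# Point pair feature (PPF) and its quantization into hash-table keys.
# Step sizes are illustrative, not the paper's exact settings.
import numpy as np

def angle(u, v, eps=1e-9):
    c = np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v) + eps)
    return float(np.arccos(np.clip(c, -1.0, 1.0)))

def point_pair_feature(p1, n1, p2, n2):
    """p1, p2: 3D points; n1, n2: their surface normals."""
    d = p2 - p1
    return (np.linalg.norm(d), angle(n1, d), angle(n2, d), angle(n1, n2))

def quantize(feature, dist_step=0.01, angle_step=np.deg2rad(12)):
    """Discretize a PPF so similar pairs hash to the same table bucket."""
    dist, a1, a2, a3 = feature
    return (int(dist / dist_step), int(a1 / angle_step),
            int(a2 / angle_step), int(a3 / angle_step))
```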
Wednesday, July 23, 2014
Lab Meeting July 24th, 2014 (Jeff): Combining 3D Shape, Color, and Motion for Robust Anytime Tracking
Title: Combining 3D Shape, Color, and Motion for Robust Anytime Tracking
Authors: David Held, Jesse Levinson, Sebastian Thrun, and Silvio Savarese
Abstract:
Although object tracking has been studied for decades, real-time tracking algorithms often suffer from low accuracy and poor robustness when confronted with difficult, real world data. We present a tracker that combines 3D shape, color (when available), and motion cues to accurately track moving objects in real-time. Our tracker allocates computational effort based on the shape of the posterior distribution. Starting with a coarse approximation to the posterior, the tracker successively refines this distribution, increasing in tracking accuracy over time. The tracker can thus be run for any amount of time, after which the current approximation to the posterior is returned. Even at a minimum runtime of 0.7 milliseconds, our method outperforms all of the baseline methods of similar speed by at least 10%. If our tracker is allowed to run for longer, the accuracy continues to improve, and it continues to outperform all baseline methods. Our tracker is thus anytime, allowing the speed or accuracy to be optimized based on the needs of the application.
Robotics: Science and Systems (RSS), 2014
Link:
http://www.roboticsproceedings.org/rss10/p14.html
http://www.roboticsproceedings.org/rss10/p14.pdf
Wednesday, June 25, 2014
Lab Meeting, Jun 25, 2014 (Jimmy): Belief Propagation Based Localization and Mapping Using Sparsely Sampled GNSS SNR Measurements
Title: Belief Propagation Based Localization and Mapping Using Sparsely Sampled GNSS SNR Measurements
In: ICRA 2014
Authors: Andrew T. Irish, Jason T. Isaacs, Francois Quitin, Joao P. Hespanha, and Upamanyu Madhow
Abstract
A novel approach is proposed to achieve simultaneous localization and mapping (SLAM) based on the signal-to-noise ratio (SNR) of global navigation satellite system (GNSS) signals. It is assumed that the environment is unknown and that the receiver location measurements (provided by a GNSS receiver) are noisy. The 3D environment map is decomposed into a grid of binary-state cells (occupancy grid) and the receiver locations are approximated by sets of particles. Using a large number of sparsely sampled GNSS SNR measurements and receiver/satellite coordinates (all available from off-the-shelf GNSS receivers), likelihoods of blockage are associated with every receiver-to-satellite beam. The posterior distribution of the map and poses is shown to represent a factor graph, on which Loopy Belief Propagation is used to efficiently estimate the probabilities of each cell being occupied or empty, along with the probability of the particles for each receiver location. Experimental results demonstrate our algorithm’s ability to coarsely map (in three dimensions) a corner of a university campus, while also correcting for uncertainties in the location of the GNSS receiver.
Link
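The intuition behind the per-beam likelihoods can be sketched as follows: each SNR reading couples all grid cells on the receiver-to-satellite ray, with high SNR favoring a free line of sight and low SNR favoring at least one occupied cell. The code below is my own simplified illustration, not the authors' factor-graph construction.

```python
# Toy per-beam blockage likelihood over a voxel occupancy map: rasterize the
# receiver-to-satellite ray and treat the beam as blocked iff any traversed
# cell is occupied. All parameters are illustrative.
import numpy as np

def cells_on_beam(receiver, satellite_dir, grid_res=1.0, max_range=50.0, step=0.5):
    """Voxel indices traversed by a ray from `receiver` along unit vector `satellite_dir`."""
    ts = np.arange(0.0, max_range, step)
    pts = receiver[None, :] + ts[:, None] * satellite_dir[None, :]
    return {tuple(idx) for idx in np.floor(pts / grid_res).astype(int)}

def beam_likelihood(snr_is_high, beam_cells, occupancy, p_hit=0.9):
    """P(SNR observation | map); `occupancy` maps voxel index -> 0/1."""
    blocked = any(occupancy.get(c, 0) for c in beam_cells)
    p_high = 1.0 - p_hit if blocked else p_hit   # high SNR is unlikely when blocked
    return p_high if snr_is_high else 1.0 - p_high
```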
Tuesday, June 17, 2014
Lab Meeting, Jun 19, 2014 (Jim): Bayesian Exploration and Interactive Demonstration in Continuous State MAXQ-Learning
Title:
Bayesian Exploration and Interactive Demonstration in Continuous State MAXQ-Learning
IEEE International Conference on Robotics and Automation, May, 2014.
Author:
Kathrin Gräve and Sven Behnke
Abstract:
... Inspired by the way humans decompose complex tasks, hierarchical methods for robot learning have attracted significant interest. In this paper, we apply the MAXQ method for hierarchical reinforcement learning to continuous state spaces. By using Gaussian Process Regression for MAXQ value function decomposition, we obtain probabilistic estimates of primitive and completion values for every subtask within the MAXQ hierarchy. ... Based on the expected deviation of these estimates, we devise a Bayesian exploration strategy that balances optimization of expected values and risk from exploring unknown actions. To further reduce risk and to accelerate learning, we complement MAXQ with learning from demonstrations in an interactive way. In every situation and subtask, the system may ask for a demonstration if there is not enough knowledge available to determine a safe action for exploration. We demonstrate the ability of the proposed system to efficiently learn solutions to complex tasks on a box stacking scenario.
Link
Wednesday, June 11, 2014
Lab Meeting, Jun 12, 2014 (Zhi-qiang): Simon Hadfield and Richard Bowden. "Scene Particles: Unregularized Particle Based Scene Flow Estimation" IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE (PAMI), 2014
Title:
Scene Particles: Unregularized Particle Based Scene Flow Estimation
Author:
Simon Hadfield; Richard Bowden
IEEE Transactions on Pattern Analysis and Machine Intelligence (PAMI), 2014
Link: http://personal.ee.surrey.ac.uk/Personal/S.Hadfield/papers/Scene%20particles.pdf
Abstract
In this paper, an algorithm is presented for estimating scene flow, which is a richer, 3D analogue of Optical Flow. The approach operates orders of magnitude faster than alternative techniques, and is well suited to further performance gains through parallelized implementation. The algorithm employs multiple hypotheses to deal with motion ambiguities, rather than the traditional smoothness constraints, removing oversmoothing errors and providing significant performance improvements on benchmark data, over the previous state of the art. The approach is flexible, and capable of operating with any combination of appearance and/or depth sensors, in any setup, simultaneously estimating the structure and motion if necessary. Additionally, the algorithm propagates information over time to resolve ambiguities, rather than performing an isolated estimation at each frame, as in contemporary approaches. Approaches to smoothing the motion field without sacrificing the benefits of multiple hypotheses are explored, and a probabilistic approach to Occlusion estimation is demonstrated, leading to 10% and 15% improved performance respectively. Finally, a data driven tracking approach is described, and used to estimate the 3D trajectories of hands during sign language, without the need to model complex appearance variations at each viewpoint.
Wednesday, June 04, 2014
Lab meeting Jun 5, 2014 (Hung-Chih Lu): Robust Online Multi-Object Tracking based on Tracklet Confidence and Online Discriminative Appearance Learning
Title: Robust Online Multi-Object Tracking based on Tracklet Confidence and Online Discriminative Appearance Learning
Authors: Seung-Hwan Bae and Kuk-Jin Yoon
Abstract:
Online multi-object tracking aims at producing complete tracks of multiple objects using the information accumulated up to the present moment. It still remains a difficult problem in complex scenes, because of frequent occlusion by clutter or other objects, similar appearances of different objects, and other factors. In this paper, we propose a robust online multi-object tracking method that can handle these difficulties effectively. We first propose the tracklet confidence using the detectability and continuity of a tracklet, and formulate a multi-object tracking problem based on the tracklet confidence. The multi-object tracking problem is then solved by associating tracklets in different ways according to their confidence values. Based on this strategy, tracklets sequentially grow with online-provided detections, and fragmented tracklets are linked up with others without any iterative and expensive associations. Here, for reliable association between tracklets and detections, we also propose a novel online learning method using an incremental linear discriminant analysis for discriminating the appearances of objects. By exploiting the proposed learning method, tracklet association can be successfully achieved even under severe occlusion. Experiments with challenging public datasets show distinct performance improvement over other batch and online tracking methods.
CVPR 2014
Link
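Below is a toy rendering of the two ingredients in the abstract: a tracklet confidence built from length, missed detections, and association affinity, and a two-stage association that serves high-confidence tracklets first. The formulas, field names, and thresholds are placeholders of my own, not the paper's definitions.

```python
# Toy tracklet confidence and confidence-ordered greedy association.
# `t["predict_dist"]` is a hypothetical per-tracklet distance function.
def tracklet_confidence(length, missed, mean_affinity, min_len=5):
    continuity = max(0.0, 1.0 - missed / max(length, 1))   # penalize missed detections
    maturity = min(1.0, length / min_len)                   # short tracklets are tentative
    return maturity * continuity * mean_affinity

def associate(tracklets, detections, conf_threshold=0.5):
    confident = [t for t in tracklets if t["conf"] >= conf_threshold]
    tentative = [t for t in tracklets if t["conf"] < conf_threshold]
    remaining = list(detections)
    for group in (confident, tentative):        # two-stage, high confidence first
        for t in group:
            if not remaining:
                break
            best = min(remaining, key=lambda d: t["predict_dist"](d))
            t["detections"].append(best)
            remaining.remove(best)
    return remaining                            # unmatched detections seed new tracklets
```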
Wednesday, May 28, 2014
Lab meeting May 29, 2014 (Kung-Hung Lu): Finding Group Interactions in Social Clutter
Title: Finding Group Interactions in Social Clutter
Authors: Ruonan Li, Parker Porfilio, Todd Zickler
Abstract:
We consider the problem of finding distinctive social interactions involving groups of agents embedded in larger social gatherings. Given a pre-defined gallery of short exemplar interaction videos, and a long input video of a large gathering (with approximately-tracked agents), we identify within the gathering small sub-groups of agents exhibiting social interactions that resemble those in the exemplars. The participants of each detected group interaction are localized in space; the extent of their interaction is localized in time; and when the gallery of exemplars is annotated with group-interaction categories, each detected interaction is classified into one of the pre-defined categories. Our approach represents group behaviors by dichotomous collections of descriptors for (a) individual actions, and (b) pairwise interactions; and it includes efficient algorithms for optimally distinguishing participants from by-standers in every temporal unit and for temporally localizing the extent of the group interaction. Most importantly, the method is generic and can be applied whenever numerous interacting agents can be approximately tracked over time. We evaluate the approach using three different video collections, two that involve humans and one that involves mice.
In: Computer Vision and Pattern Recognition (CVPR), 2013 IEEE Conference on. IEEE, 2013
Link: http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=6619195
Wednesday, May 21, 2014
Lab meeting May 22, 2014 (Chun-Kai Chang): Communication Adaptive Multi-Robot Simultaneous Localization and Tracking via Hybrid Measurement and Belief Sharing
Title: Communication Adaptive Multi-Robot Simultaneous Localization and Tracking via Hybrid Measurement and Belief Sharing
Authors: Chun-Kai Chang, Chun-Hua Chang and Chieh-Chih Wang
Abstract:
Existing multi-robot cooperative perception solutions can be mainly classified into two categories, measurement-based and belief-based, according to the information shared among robots. With well-controlled communication, measurement-based approaches are expected to achieve theoretically optimal estimates while belief-based approaches are not because the cross-correlations between beliefs are hard to be perfectly estimated in practice. Nevertheless, belief-based approaches perform relatively stable under unstable communication as a belief contains the information of multiple previous measurements. Motivated by the observation that measurement sharing and belief sharing are respectively superior in different conditions, in this paper a hybrid algorithm, communication adaptive multi-robot simultaneous localization and tracking (ComAd MR-SLAT), is proposed to combine the advantages of both. To tackle the unknown or unstable communication conditions, the information to share is decided by maximizing the expected uncertainty reduction online, based on which the algorithm dynamically alternates between measurement sharing and belief-sharing without information loss or reuse. The proposed ComAd MR-SLAT is evaluated in communication conditions with different packet loss rates and bursty loss lengths. In our experiments, ComAd MR-SLAT outperforms measurement-based and belief-based MR-SLAT in accuracy. The experimental results demonstrate the effectiveness of the proposed hybrid algorithm and exhibit that ComAd MR-SLAT is robust under different communication conditions.
In: IEEE International Conference on Robotics and Automation, 2014.
Wednesday, May 14, 2014
Lab meeting May 15, 2014 (Yun-Jun Shen): Robust Monocular Epipolar Flow Estimation
Title: Robust Monocular Epipolar Flow Estimation
Authors: Koichiro Yamaguchi, David McAllester and Raquel Urtasun
Abstract:
We consider the problem of computing optical flow in monocular video taken from a moving vehicle. In this setting, the vast majority of image flow is due to the vehicle’s ego-motion. We propose to take advantage of this fact and estimate flow along the epipolar lines of the egomotion. Towards this goal, we derive a slanted-plane MRF model which explicitly reasons about the ordering of planes and their physical validity at junctions. Furthermore, we present a bottom-up grouping algorithm which produces over-segmentations that respect flow boundaries. We demonstrate the effectiveness of our approach in the challenging KITTI flow benchmark achieving half the error of the best competing general flow algorithm and one third of the error of the best epipolar flow algorithm.
In: Computer Vision and Pattern Recognition (CVPR), 2013 IEEE Conference on. IEEE, 2013
Link: http://ttic.uchicago.edu/~rurtasun/publications/yamaguchi_et_al_cvpr13.pdf
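The constraint the method builds on is standard epipolar geometry: under pure ego-motion, the flow of a pixel x must end on its epipolar line l' = F x in the next frame, reducing flow to a 1-D search along that line. A small sketch of that constraint follows; it is not the paper's slanted-plane MRF.

```python
# Epipolar-line utilities for flow restricted to ego-motion geometry.
# F is the 3x3 fundamental matrix between consecutive frames.
import numpy as np

def epipolar_line(F, x):
    """l' = F x for pixel x = (u, v); returns homogeneous line (a, b, c)."""
    return F @ np.array([x[0], x[1], 1.0])

def point_line_distance(line, x):
    a, b, c = line
    return abs(a * x[0] + b * x[1] + c) / np.hypot(a, b)

def snap_to_epipolar_line(F, x, x_next):
    """Project a tentative match x_next in frame t+1 onto the epipolar line of x."""
    a, b, c = epipolar_line(F, x)
    n = np.array([a, b])
    d = (a * x_next[0] + b * x_next[1] + c) / (a * a + b * b)
    return np.asarray(x_next, dtype=float) - d * n
```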
Tuesday, May 06, 2014
Lab meeting May 8, 2014 (Bang-Cheng Wang): Robust feedback control of ZMP-based gait for the humanoid robot Nao
Authors: J.J. Alcaraz-Jiménez, D. Herrero-Pérez and H. Martínez-Barberá
Abstract:
Numerous approaches have been proposed to generate well-balanced gaits in biped robots that show excellent performance in simulated environments. However, in general, the dynamic balance of the robots decreases dramatically when these methods are tested in physical platforms. Since humanoid robots are intended to collaborate with humans and operate in everyday environments, it is of paramount importance to test such approaches both in physical platforms and under severe conditions. In this work, the special characteristics of the Nao humanoid platform are analyzed and a control system that allows robust walking and disturbance rejection is proposed. This approach combines the zero moment point (ZMP) stability criterion with angular momentum suppression and step timing control. The proposed method is especially suitable for platforms with limited computational resources and sensory and sensory-motor capabilities.
In: The International Journal of Robotics Research (IJRR) August/September 2013 vol. 32
Link: http://ijr.sagepub.com/content/32/9-10/1074
Video: http://www.ijrr.org/ijrr_2013/487566.htm
Tuesday, April 15, 2014
Lab meeting April 17, 2014 (Yen-Ting): Dense correspondence and annotation systems
I will present several state-of-the-art annotation systems and their relationship with dense correspondences, and compare them with my current work.
Reference papers:
LabelMe video: Building a Video Database with Human Annotations (ICCV 2009)
link
Efficiently Scaling up Crowdsourced Video Annotation (IJCV 2013)
link
Human-Assisted Motion Annotation (CVPR 2008)
link
Annotation Propagation in Large Image Databases via Dense Image Correspondence (ECCV 2012)
link
Wednesday, April 09, 2014
Lab meeting April 10, 2014 (Channing): "CAPT: Concurrent assignment and planning of trajectories for multiple robots"
Title: CAPT: Concurrent assignment and planning of trajectories for multiple robots
Authors: Matthew Turpin, Nathan Michael, and Vijay Kumar
GRASP Laboratory, University of Pennsylvania, Philadelphia, USA
In: The International Journal of Robotics Research (IJRR), January 2014, 33: 98-112
Abstract:
In this paper, we consider the problem of concurrent assignment and planning of trajectories (which we denote CAPT) for a team of robots. This problem involves simultaneously addressing two challenges: (1) the combinatorially complex problem of finding a suitable assignment of robots to goal locations, and (2) the generation of collision-free, time parameterized trajectories for every robot. We consider the CAPT problem for unlabeled (interchangeable) robots and propose algorithmic solutions to two variations of the CAPT problem. The first algorithm, C-CAPT, is a provably correct, complete, centralized algorithm which guarantees collision-free optimal solutions to the CAPT problem in an obstacle-free environment. To achieve these strong claims, C-CAPT exploits the synergy obtained by combining the two subproblems of assignment and trajectory generation to provide computationally tractable solutions for large numbers of robots. We then propose a decentralized solution to the CAPT problem through d-CAPT, a decentralized algorithm that provides suboptimal results compared to C-CAPT. We illustrate the algorithms and resulting performance through simulation and experimentation.
Download link: http://ijr.sagepub.com/content/33/1/98.full.pdf
Related Media link: http://www.seas.upenn.edu/~mturpin/summary.html
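The flavor of the centralized variant can be sketched in a few lines: unlabeled robots are assigned to goals by minimizing the total squared travel distance, and each robot then follows a straight-line, constant-speed trajectory to its assigned goal. This simplified sketch omits the collision-avoidance guarantees that C-CAPT actually provides.

```python
# Unlabeled assignment (minimum total squared distance) plus straight-line
# trajectories, as a rough analogue of the centralized CAPT variant.
import numpy as np
from scipy.optimize import linear_sum_assignment

def assign_and_plan(starts, goals, T=1.0):
    """starts, goals: Nx2 arrays. Returns the goal index per robot and a trajectory function."""
    cost = ((starts[:, None, :] - goals[None, :, :]) ** 2).sum(axis=2)  # squared distances
    rows, cols = linear_sum_assignment(cost)          # optimal unlabeled assignment
    assigned_goals = goals[cols]

    def positions(t):
        """Robot positions at time t in [0, T] along straight-line trajectories."""
        s = np.clip(t / T, 0.0, 1.0)
        return (1 - s) * starts + s * assigned_goals

    return cols, positions
```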
Wednesday, March 19, 2014
Lab meeting Mar. 20, 2014 (Andi): Multiview Structure from Motion in Trajectory Space
Authors: Aamer Zaheer, Ijaz Akhter, Mohammad Haris Baig, Shabbir Marzban, Sohaib Khan
Abstract:
Most nonrigid objects exhibit temporal regularities in their deformations. Recently it was proposed that these regularities can be parameterized by assuming that the non-rigid structure lies in a small dimensional trajectory space. In this paper, we propose a factorization approach for 3D reconstruction from multiple static cameras under the compact trajectory subspace representation. Proposed factorization is analogous to rank-3 factorization of rigid structure from motion problem, in transformed space. The benefit of our approach is that the 3D trajectory basis can be directly learned from the image observations. This also allows us to impute missing observations and denoise tracking errors without explicit estimation of the 3D structure. In contrast to standard triangulation based methods which require points to be visible in at least two cameras, our approach can reconstruct points, which remain occluded even in all the cameras for quite a long time. This makes our solution especially suitable for occlusion handling in motion capture systems. We demonstrate robustness of our method on challenging real and synthetic scenarios.
In: Proceedings of the 13th International Conference on Computer Vision (ICCV), Barcelona, Spain, Nov 2011
download paper
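The trajectory-space idea can be illustrated with a DCT basis: a point's trajectory over F frames is approximated by a few low-frequency basis vectors, so frames where the point is occluded can be imputed by least-squares fitting on the visible frames. This is a generic sketch of the representation, not the paper's multiview factorization.

```python
# Low-frequency DCT trajectory basis and missing-frame imputation for one
# coordinate of one 3D point.
import numpy as np

def dct_basis(num_frames, k):
    """F x k matrix whose columns are the first k (low-frequency) DCT basis trajectories."""
    n = np.arange(num_frames)
    return np.stack([np.cos(np.pi * f * (2 * n + 1) / (2 * num_frames))
                     for f in range(k)], axis=1)

def impute_trajectory(observed, k=5):
    """observed: length-F array with np.nan at frames where the point was not visible."""
    B = dct_basis(len(observed), k)
    mask = ~np.isnan(observed)
    coeffs, *_ = np.linalg.lstsq(B[mask], observed[mask], rcond=None)
    return B @ coeffs          # trajectory reconstructed from k DCT coefficients
```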
Wednesday, March 12, 2014
Lab meeting Mar. 13, 2014 (ChihChung): Matching two scene images with large distance and view angle change
In this talk, I will present recent state-of-the-art approaches for scene image matching and then discuss several of my new ideas.
The references are:
Algorithms:
Affine-invariant SIFT:
link1
link2
link3
ORSA (Optimized RANSAC):
link
Virtual-line descriptor:
link
1-point RANSAC:
link
Implementation:
Using MAV and Google Street Map for visual localization:
link
Monday, March 03, 2014
Lab Meeting March 6th, 2014 (Jeff): Simultaneous Parameter Calibration, Localization, and Mapping
Title: Simultaneous Parameter Calibration, Localization, and Mapping
Authors: Rainer Kümmerle, Giorgio Grisetti, and Wolfram Burgard
Abstract:
The calibration parameters of a mobile robot play a substantial role in navigation tasks. Often these parameters are subject to variations that depend either on changes in the environment or on the load of the robot. In this paper, we propose an approach to simultaneously estimate a map of the environment, the position of the on-board sensors of the robot, and its kinematic parameters. Our method requires no prior knowledge about the environment and relies only on a rough initial guess of the parameters of the platform. The proposed approach estimates the parameters online and it is able to adapt to non-stationary changes of the configuration. We tested our approach in simulated environments and on a wide range of real-world data using different types of robotic platforms.
Advanced Robotics Vol.26, 2012
Link:
http://www.tandfonline.com/doi/full/10.1080/01691864.2012.728694
Reference Link:
Simultaneous Parameter Calibration, Localization, and Mapping for Robust Service Robotics.
ARSO2011.
http://europa.informatik.uni-freiburg.de/files/kuemmerle11arso.pdf
Simultaneous Calibration, Localization, and Mapping.
IROS2011.
http://ais.informatik.uni-freiburg.de/publications/papers/kuemmerle11iros.pdf?origin=publication_detail
Wednesday, February 26, 2014
Lab Meeting February 27, 2014 (Jimmy): System-Level Performance Analysis for Bayesian Cooperative Positioning: From Global to Local
Title: System-Level Performance Analysis for Bayesian Cooperative Positioning: From Global to Local
Authors: Zhang Siwei, Ronald Raulefs, Armin Dammann, and Stephan Sand
In: IEEE International Conference on Indoor Positioning and Indoor Navigation 2013
Abstract
Cooperative positioning (CP) can be used either to calibrate the accumulated error from inertial navigation or as a stand-alone navigation system. Though intensive research has been conducted on CP, there is a need to further investigate the joint impact from the system level on the accuracy. We derive a posterior Cramer-Rao bound (PCRB) considering both the physical layer (PHY) signal structure and the asynchronous latency from the multiple access control layer (MAC). The PCRB shows an immediate relationship between the theoretical accuracy limit and the effective factors, e.g. geometry, node dynamic, latency, signal structure, power, etc. which is useful to assess a cooperative system. However, for a large-scale decentralized cooperation network, calculating the PCRB becomes difficult due to the high state dimension and the absence of global information. We propose an equivalent ranging variance (ERV) scheme which projects the neighbor's positioning uncertainty to the distance measurement inaccuracy. With this, the effect from the interaction among the mobile terminals (MTs), e.g. measurement and communication can be decoupled. We use the ERV to derive a local PCRB (L-PCRB) which approximates the PCRB locally at each MT with low complexity. We further propose combining the ERV and L-PCRB together to improve the precision of the Bayesian localization algorithms. Simulation with an L-PCRB-aided distributed particle filter (DPF) in two typical cooperative positioning scenarios show a significant improvement comparing with the non-cooperative or standard DPF.
[Link]
Thursday, February 20, 2014
Lab Meeting, February 20, 2014 (Jim): Learning monocular reactive UAV control in cluttered natural environments
Title: Learning monocular reactive UAV control in cluttered natural environments
Authors:
Stephane Ross, Narek Melik-Barkhudarov, Kumar Shaurya Shankar, Andreas Wendel, Debadeepta Dey, J. Andrew (Drew) Bagnell, and Martial Hebert
IEEE International Conference on Robotics and Automation, March, 2013.
Abstract:
... Unlike large vehicles, MAVs can only carry very light sensors, such as cameras, making autonomous navigation through obstacles much more challenging. In this paper, we describe a system that navigates a small quadrotor helicopter autonomously at low altitude through natural forest environments. Using only a single cheap camera to perceive the environment, we are able to maintain a constant velocity of up to 1.5 m/s. Given a small set of human pilot demonstrations, we use recent state-of-the-art imitation learning techniques to train a controller that can avoid trees by adapting the MAV's heading. We demonstrate the performance of our system in a more controlled environment indoors, and in real natural forest environments outdoors.
Link
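The imitation-learning component follows the DAgger pattern of iteratively aggregating expert-labeled data from the learner's own trajectories. Below is a schematic of that loop; expert, policy.fit, and fly_one_run are placeholder names of my own, not a real API.

```python
# Rough shape of a DAgger-style imitation learning loop (schematic sketch).
def dagger(expert, policy, fly_one_run, n_iters=10):
    dataset = []             # (observed state, expert steering command) pairs
    for i in range(n_iters):
        # Fly with the current learned policy (or the expert on the first pass),
        # but record what the expert would have commanded in every visited state.
        states = fly_one_run(policy if i > 0 else expert)
        dataset += [(s, expert(s)) for s in states]
        policy.fit(dataset)  # retrain on all data aggregated so far
    return policy
```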
Tuesday, February 11, 2014
Lab Meeting, February 13, 2014 (Hung-Chih Lu): Zhaoyin Jia, Andrew Gallagher, Ashutosh Saxena. "3D-Based Reasoning with Blocks, Support, and Stability." CVPR 2013
Title:
3D-Based Reasoning with Blocks, Support, and Stability
Author:
Zhaoyin Jia, Andrew Gallagher, Ashutosh Saxena
Abstract:
3D volumetric reasoning is important for truly understanding a scene. Humans are able to both segment each object in an image, and perceive a rich 3D interpretation of the scene, e.g., the space an object occupies, which objects support other objects, and which objects would, if moved, cause other objects to fall. We propose a new approach for parsing RGB-D images using 3D block units for volumetric reasoning. The algorithm fits image segments with 3D blocks, and iteratively evaluates the scene based on block interaction properties. We produce a 3D representation of the scene based on jointly optimizing over segmentations, block fitting, supporting relations, and object stability. Our algorithm incorporates the intuition that a good 3D representation of the scene is the one that fits the data well, and is a stable, self-supporting (i.e., one that does not topple) arrangement of objects. We experiment on several datasets including controlled and real indoor scenarios. Results show that our stability-reasoning framework improves RGB-D segmentation and scene volumetric representation.
From
IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2013
Link
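A minimal sketch of the kind of joint scoring such a block-based interpretation implies is given below. This is not the authors' energy function; all terms, weights, and helper functions are illustrative. Each candidate interpretation is scored by how well blocks fit the observed segments, whether supporting relations are consistent, and whether the arrangement is physically stable.

def score_interpretation(blocks, segments, fit_err, supported, stable,
                         w_fit=1.0, w_sup=1.0, w_stab=1.0):
    """Illustrative scoring of one scene interpretation (hypothetical terms).

    fit_err(block, segment) -> reconstruction error of fitting a block to a segment
    supported(block)        -> True if the block rests on another block or the floor
    stable(blocks)          -> True if the whole arrangement would not topple
    """
    data_term = sum(fit_err(b, s) for b, s in zip(blocks, segments))
    support_penalty = sum(0.0 if supported(b) else 1.0 for b in blocks)
    stability_penalty = 0.0 if stable(blocks) else 1.0
    # Lower is better; a search over segmentations and block fits would minimize this.
    return w_fit * data_term + w_sup * support_penalty + w_stab * stability_penalty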
Wednesday, January 22, 2014
Lab Meeting, January 23, 2014 (Yun-Jun Shen): Delaitre, Vincent, et al. "Scene semantics from long-term observation of people." Computer Vision–ECCV 2012
Title:
Scene semantics from long-term observation of people
Authors:
Delaitre, Vincent, David F. Fouhey, Ivan Laptev, Josef Sivic, Abhinav Gupta, and Alexei A. Efros.
Abstract:
Our everyday objects support various tasks and can be used by people for different purposes. While object classification is a widely studied topic in computer vision, recognition of object function, i.e., what people can do with an object and how they do it, is rarely addressed. In this paper we construct a functional object description with the aim of recognizing objects by the way people interact with them. We describe scene objects (sofas, tables, chairs) by associated human poses and object appearance. Our model is learned discriminatively from automatically estimated body poses in many realistic scenes. In particular, we make use of time-lapse videos from YouTube, providing a rich source of common human-object interactions and minimizing the effort of manual object annotation. We show how the models learned from human observations significantly improve object recognition and enable prediction of characteristic human poses in new scenes. Results are shown on a dataset of more than 400,000 frames obtained from 146 time-lapse videos of challenging and realistic indoor scenes.
From:
12th European Conference on Computer Vision
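To make the "describe objects by the poses people take around them" idea concrete, here is a rough, hypothetical descriptor-plus-classifier pipeline: accumulate a histogram of quantized body poses observed near each object region over the time-lapse, concatenate it with an appearance descriptor, and train a discriminative classifier. The specific descriptors and learner used in the paper may differ.

import numpy as np
from sklearn.svm import LinearSVC

def functional_descriptor(pose_codes, appearance, n_pose_words=50):
    """Hypothetical functional descriptor: pose-word histogram + appearance."""
    pose_hist, _ = np.histogram(pose_codes, bins=n_pose_words,
                                range=(0, n_pose_words))
    pose_hist = pose_hist / max(pose_hist.sum(), 1)   # normalize over observations
    return np.concatenate([pose_hist, appearance])

def train_functional_model(X, y):
    """Discriminative object-class model from many observed regions.
    X: stacked descriptors, y: object labels (sofa, table, chair, ...)."""
    return LinearSVC(C=1.0).fit(X, y)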
Wednesday, January 15, 2014
Lab Meeting, January 16th, 2014 (Henry Lu): Jaeyong Sung, Colin Ponce, Bart Selman and Ashutosh Saxena. "Unstructured Human Activity Detection from RGBD Images" IEEE International Conference on Robotics and Automation (ICRA), 2012
Title:
Unstructured Human Activity Detection from RGBD Images
Authors:
Jaeyong Sung, Colin Ponce, Bart Selman, and Ashutosh Saxena
Dept. of Computer Science, Cornell University, Ithaca, NY, USA
Abstract:
Being able to detect and recognize human activities is essential for several applications, including personal assistive robotics. In this paper, we perform detection and recognition of unstructured human activity in unstructured environments. We use an RGBD sensor (Microsoft Kinect) as the input sensor, and compute a set of features based on human pose and motion, as well as on image and point-cloud information. Our algorithm is based on a hierarchical maximum entropy Markov model (MEMM), which considers a person's activity as composed of a set of sub-activities. We infer the two-layered graph structure using a dynamic programming approach. We test our algorithm on detecting and recognizing twelve different activities performed by four people in different environments, such as a kitchen, a living room, an office, etc., and achieve good performance even when the person was not seen before in the training set.
From:
2012 IEEE International Conference on Robotics and Automation (ICRA)
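The inference in such a two-layer MEMM is a dynamic-programming problem. The sketch below shows only the flavor of that DP: a generic Viterbi-style pass over per-frame sub-activity scores with transition scores. It is a simplification for illustration, not the paper's exact two-layer model, and all inputs are hypothetical.

import numpy as np

def viterbi_subactivities(frame_scores, trans_scores):
    """Generic Viterbi sketch for labeling frames with sub-activities.

    frame_scores : (T, K) log-score of each sub-activity label per frame
    trans_scores : (K, K) log-score of moving from label i to label j
    Returns the best label sequence of length T.
    """
    T, K = frame_scores.shape
    dp = np.full((T, K), -np.inf)
    back = np.zeros((T, K), dtype=int)
    dp[0] = frame_scores[0]
    for t in range(1, T):
        cand = dp[t - 1][:, None] + trans_scores   # cand[i, j]: prev label i -> cur label j
        back[t] = np.argmax(cand, axis=0)          # best predecessor for each current label
        dp[t] = cand[back[t], np.arange(K)] + frame_scores[t]
    labels = [int(np.argmax(dp[-1]))]
    for t in range(T - 1, 0, -1):
        labels.append(int(back[t][labels[-1]]))    # walk the backpointers
    return labels[::-1]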
Tuesday, January 07, 2014
Lab Meeting, January 9th, 2014 (Zhi-qiang): Jiang Wang, Zicheng Liu, Ying Wu, and Junsong Yuan. "Mining actionlet ensemble for action recognition with depth cameras." IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2012
Title:
Mining actionlet ensemble for action recognition with depth cameras
Authors:
Jiang Wang, Zicheng Liu, Ying Wu, and Junsong Yuan
Abstract:
Human action recognition is an important yet challenging task. The recently developed commodity depth sensors open up new possibilities of dealing with this problem but also present some unique challenges. The depth maps captured by the depth cameras are very noisy, and the 3D positions of the tracked joints may be completely wrong if serious occlusions occur, which increases the intra-class variations in the actions. In this paper, an actionlet ensemble model is learnt to represent each action and to capture the intra-class variance. In addition, novel features that are suitable for depth data are proposed. They are robust to noise, invariant to translational and temporal misalignments, and capable of characterizing both the human motion and the human-object interactions. The proposed approach is evaluated on two challenging action recognition datasets captured by commodity depth cameras, and another dataset captured by a MoCap system. The experimental evaluations show that the proposed approach achieves superior performance to state-of-the-art algorithms.
From:
IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2012
Link
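To make the actionlet-ensemble idea concrete, here is a small, hypothetical scoring sketch: each actionlet is a subset of skeleton joints with its own base classifier, and an action is scored by a weighted combination of the actionlet responses. The mining step that selects discriminative joint subsets, and the exact features and learner in the paper, are not reproduced here.

import numpy as np

def actionlet_ensemble_score(joint_features, actionlets, base_score, weights):
    """Hypothetical actionlet ensemble: weighted sum of per-subset classifier scores.

    joint_features : dict joint_id -> feature vector for this sequence
    actionlets     : list of joint-id tuples (discriminative subsets)
    base_score     : base_score(joints, features) -> classifier score (hypothetical)
    weights        : per-actionlet weights, e.g., learned by an ensemble method
    """
    scores = []
    for joints in actionlets:
        feat = np.concatenate([joint_features[j] for j in joints])
        scores.append(base_score(joints, feat))
    return float(np.dot(weights, scores))

# An action class would then be assigned by taking the argmax of this score
# over the per-class ensembles.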