Thursday, June 28, 2007
The ideal candidate will have a broad understanding of machine learning techniques along with their implementation. In particular, we are looking for a candidate with experience in statistical relational learning and/or reinforcement learning techniques. In addition to having a strong background in machine learning, a demonstrated ability to work within a group on large projects is desired. Experience with Linux and Java is a plus.
The ideal candidate will have a broad understanding of machine learning techniques along with their implementation. In particular, we are looking for a candidate with experience in Markov Logic Networks, Inductive Logic Programming and/or other symbolic learning techniques. In addition to having a strong background in machine learning, a demonstrated ability to work within a group on large projects is desired. Experience with Linux, Java and Prolog is a plus.
Both candidates should have a PhD or an MS with 1-3 years of research experience. Please send your CV and a research statement to firstname.lastname@example.org or apply on the SRI website https://sri.ats.hrsmart.com/index.html. Look for postings 3525 and 3526.
We would like to fill a number of postdoc positions immediately in support of new research programs.
3D Video: This area of research investigates algorithms to process range and appearance data in real-time in order to produce dynamic, highly realistic computer graphics models. Present applications include remote operation of robotic vehicles and the production of virtual models of extended environments. The ideal applicant would have demonstrated strengths in computer graphics and computer vision and be experienced in software development as a member of a small team.
Scene Understanding: This area of research investigates algorithms to process range and appearance data in real-time in order to produce terrain classifications of a complex outdoor environment that will be used to help guide an autonomous unmanned ground vehicle. Range and appearance data is available from two perspectives: 1) from lidar and image sensors on-board the robot and 2) from above the robot in the form of satellite imagery and fly-over lidar data. The ideal applicant would have demonstrated strengths in computer vision, 3D lidar data, scene understanding and/or obstacle classification, be experienced in software development as a member of a large team, and work with little supervision.
Terrain Characterization: This area of research investigates algorithms that will make use of proprioception sensor data, in real-time in an on-line learning manner, to improve the performance of an autonomous unmanned ground vehicle. Proprioception data is a measure of what the robot "feels" or "senses" as it drives over the terrain. Thus it is a natural source of feedback into an on-line learning mechanism to improve the terrain classification (e.g. vegetation or rock) and characterization (e.g. a measure of slip) that were predicted for the terrain ahead of the robot before the robot drove over it. The ideal applicant would have demonstrated strengths in machine learning and computer vision, be experienced in software development as a member of a large team, and work with little supervision.
Applicants should possess both a solid preparation for performing research and a strong interest in useful realization of the technology. Please address all applications and inquiries to Alonzo Kelly (email@example.com).
The area of research will be statistical mobility prediction for mobile robotic systems operating in natural terrain. The goal of the work will be to develop efficient, rigorous methods for predicting the ability of a robotic system to safely move through environments and surmount obstacles with poorly known physical characteristics. The research will be led by Dr. Karl Iagnemma and is sponsored by the U.S. Army Research Office.
The ideal candidate would possess strong analytical skills, and be familiar with stochastic simulation methods such as Monte Carlo (and similar) techniques. Fundamental knowledge of mobile robot kinematics and dynamics is required, as is expertise with common simulation (Matlab, Simulink) and programming (C, C++) tools. Familiarity with Bekker-type vehicle-terrain interaction models is desirable, but not required.
Applicants should submit 1) a CV, including a brief research statement, 2) 1-3 recent publications in electronic format, and 3) the names and contact information of three individuals who can serve as references. The total file size of applications should be < 3 MB. Only electronic applications will be considered.
Applicants should contact:
Karl Iagnemma, Ph.D.
Principal Research Scientist
Department of Mechanical Engineering
Massachusetts Institute of Technology
77 Massachusetts Ave., Room 3-435a
Cambridge, MA 02139 USA
Sunday, June 24, 2007
PleoWorld - [via] Link.
Friday, June 22, 2007
Thursday, June 21, 2007
Tuesday, June 19, 2007
By David Templeton, Pittsburgh Post-Gazette
Boss -- the robotic car that Carnegie Mellon University's Tartan Racing has developed -- cruised a test course like a 16-year-old with a driving permit, avoiding traffic, stopping at stop signs and swinging around cars parked in its lane.
See the full article.
Thursday, June 14, 2007
Enabling Lifelong Human-Robot Interaction
A Special Session of the
International Conference on Development and Learning
July 12, 2007
Imperial College London
What are the applications of robotics critical to society?
It is unclear how the capabilities of current and future robots will meet the needs of their human users over the course of time. As robots move beyond laboratories into real-world human environments, human-robot interaction will be increasingly longitudinal. The performance of personal robots, similar to personal computers, will be subject to the dynamic expectations of human users and evaluated over the span of years. Such long-term interaction poses distinct challenges for the scalability of autonomous robotic systems.
Scalability highlights the need for enhanced robot learning and development. Specifically, how can robots scale to perform unknown tasks across different environments and hardware platforms according to the preferences of individual users? To meet this challenge, we must revisit basic issues in developmental robotics about innate mechanisms and adapting behavior. Can innate robot capabilities be crafted to sufficiently encompass the space of relevant tasks, environments, and platforms? Can these innate capabilities be formalized mathematically and interfaced with human decision making? Should learning and adaptation be the central means for scalability? Can appropriate learning methods be performed tractably over large datasets online? How will users produce training data without programming or a prohibitive burden? How could innate mechanisms be structured to permit abstraction and generalization by learning? Are there common concepts used in existing approaches to these issues? (What learning methods and representations are needed to allow for assimilation of knowledge over extended periods of time and from different perceptual mechanisms?)
Further, scalability for lifelong HRI raises questions about how to evaluate across the uncontrolled factors. Is evaluation solely in a laboratory setting still sufficient? What combination of quantitative, qualitative, usability, and longitudinal aspects are needed for evaluation? Given the overhead for experimental infrastructure, how can we realize platforms that enable truly normalized evaluation across different algorithmic approaches? For longitudinal studies, could such robots be feasibly deployed to lay users? Is robotics at a point where normalized evaluation is realistic?
This special session of ICDL will address the issues facing lifelong HRI including but not limited to the questions raised above. We will assemble a diverse group of researchers in and beyond robotics to explore the convergence of theories about human development, human-machine interfaces, machine learning, and robot engineering towards developing scalable autonomous robots.
Sunday, June 10, 2007
Ali Farhadi, David Forsyth, and Ryan White
We build word models for American Sign Language (ASL) that transfer between different signers and different aspects. This is advantageous because one could use large amounts of labelled avatar data in combination with a smaller amount of labelled human data to spot a large number of words in human data. Transfer learning is possible because we represent blocks of video with novel intermediate discriminative features based on splits of the data. By constructing the same splits in avatar and human data and clustering appropriately, our features are both discriminative and semantically similar: across signers similar features imply similar words. We demonstrate transfer learning in two scenarios: from avatar to a frontally viewed human signer and from an avatar to human signer in a 3/4 view. (Project, PDF)
Adam O'Donovan, Ramani Duraiswami, and Jan Neumann
Combinations of microphones and cameras allow the joint audio visual sensing of a scene. Such arrangements of sensors are common in biological organisms and in applications such as meeting recording and surveillance where both modalities are necessary to provide scene understanding. Microphone arrays provide geometrical information on the source location, and allow the sound sources in the scene to be separated and the noise suppressed, while cameras allow the scene geometry and the location and motion of people and other objects to be estimated. In most previous work the fusion of the audio-visual information occurs at a relatively late stage. In contrast, we take the viewpoint that both cameras and microphone arrays are geometry sensors, and treat the microphone arrays as generalized cameras. We employ computer-vision inspired algorithms to treat the combined system of arrays and cameras. In particular, we consider the geometry introduced by a general microphone array and spherical microphone arrays. The latter show a geometry that is very close to central projection cameras, and we show how standard vision based calibration algorithms can be profitably applied to them. Experiments are presented that demonstrate the usefulness of the considered approach. (Slides, wmv, PDF)
Atul Kanaujia, Cristian Sminchisescu, and Dimitris Metaxas
Recent research in visual inference from monocular images has shown that discriminatively trained image-based predictors can provide fast, automatic qualitative 3D reconstructions of human body pose or scene structure in real-world environments. However, the stability of existing image representations tends to be perturbed by deformations and misalignments in the training set, which, in turn, degrade the quality of learning and generalization. In this paper we advocate the semi-supervised learning of hierarchical image descriptions in order to better tolerate variability at multiple levels of detail. We combine multilevel encodings with improved stability to geometric transformations, with metric learning and semi-supervised manifold regularization methods in order to further profile them for task-invariance – resistance to background clutter and within the same human pose class variance. We quantitatively analyze the effectiveness of both descriptors and learning methods and show that each one can contribute, sometimes substantially, to more reliable 3D human pose estimates in cluttered images. PDF
Alexandru Balan, Leonid Sigal, Michael Black, James Davis, and Horst Haussecker
Much of the research on video-based human motion capture assumes the body shape is known a priori and is represented coarsely (e.g. using cylinders or superquadrics to model limbs). These body models stand in sharp contrast to the richly detailed 3D body models used by the graphics community. Here we propose a method for recovering such models directly from images. Specifically, we represent the body using a recently proposed triangulated mesh model called SCAPE which employs a low-dimensional, but detailed, parametric model of shape and pose-dependent deformations that is learned from a database of range scans of human bodies. Previous work showed that the parameters of the SCAPE model could be estimated from marker-based motion capture data. Here we go further to estimate the parameters directly from image data. We define a cost function between image observations and a hypothesized mesh and formulate the problem as optimization over the body shape and pose parameters using stochastic search. Our results show that such rich generative models enable the automatic recovery of detailed human shape and pose from images. pdf
Tongbo Chen, Hendrik Lensch, Christian Fuchs, and Hans-Peter Seidel
Translucent objects pose a difficult problem for traditional structured light 3D scanning techniques. Subsurface scattering corrupts the range estimation in two ways: by drastically reducing the signal-to-noise ratio and by shifting the intensity peak beneath the surface to a point which does not coincide with the point of incidence. In this paper we analyze and compare two descattering methods in order to obtain reliable 3D coordinates for translucent objects. By using polarization-difference imaging, subsurface scattering can be filtered out because multiple scattering randomizes the polarization direction of light while the surface reflectance partially keeps the polarization direction of the illumination. The descattered reflectance can be used for reliable 3D reconstruction using traditional optical 3D scanning techniques, such as structured light. Phase-shifting is another effective descattering technique if the frequency of the projected pattern is sufficiently high. We demonstrate the performance of these two techniques and the combination of them on scanning real-world translucent objects. PDF
Jianke Zhu and Michael R. Lyu
Detecting nonrigid surfaces is an interesting research problem for computer vision and image analysis. One important challenge of nonrigid surface detection is how to register a nonrigid surface mesh having a large number of free deformation parameters. This is particularly significant for detecting nonrigid surfaces from noisy observations. Nonrigid surface detection is usually regarded as a robust parameter estimation problem, which is typically solved iteratively from a good initialization in order to avoid local minima. In this paper, we propose a novel progressive finite Newton optimization scheme for the nonrigid surface detection problem, which is reduced to only solving a set of linear equations. The key of our approach is to formulate the nonrigid surface detection as an unconstrained quadratic optimization problem which has a closed-form solution for a given set of observations. Moreover, we employ a progressive active-set selection scheme, which takes advantage of the rank information of detected correspondences. We have conducted extensive experiments for performance evaluation on various environments, whose promising results show that the proposed algorithm is more efficient and effective than the existing iterative methods. PDF
Saturday, June 09, 2007
CVPR07 oral: Tracking in Low Frame Rate Video: A Cascade Particle Filter with Discriminative Observers of Different Lifespans
Yuan Li, Haizhou Ai, Takayoshi Yamashita, Shihong Lao, and Masato Kawade
Tracking objects in low frame rate video or with abrupt motion poses two main difficulties which conventional tracking methods can barely handle:
1) poor motion continuity and increased search space;
2) fast appearance variation of target and more background clutter due to increased search space.
In this paper, we address the problem from a view which integrates conventional tracking and detection, and present a temporal probabilistic combination of discriminative observers of different lifespans. Each observer is learned from different ranges of samples, with different subsets of features, to achieve varying level of discriminative power at varying cost. An efficient fusion and temporal inference is then done by a cascade particle filter which consists of multiple stages of importance sampling. Experiments show significantly improved accuracy of the proposed approach in comparison with existing tracking methods, under the condition of low frame rate data and abrupt motion of both target and camera.
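The idea of passing particles through several stages of importance sampling, with a cheap observer first and a more discriminative one later, can be illustrated with a toy sketch. This is not the authors' cascade particle filter: the 1D state, the Gaussian "observers", their widths, and the function names are all assumptions made for illustration, and in a real tracker the likelihoods would come from learned classifiers rather than from a known target.

```python
import math
import random


def gaussian_likelihood(x, center, sigma):
    """Unnormalized Gaussian score standing in for an observer's response."""
    return math.exp(-0.5 * ((x - center) / sigma) ** 2)


def resample(particles, weights, rng):
    """Draw a new particle set with probability proportional to the weights."""
    return rng.choices(particles, weights=weights, k=len(particles))


def cascade_estimate(target, n_particles=2000, seed=0):
    """Two-stage importance sampling over a 1D position in [0, 100]."""
    rng = random.Random(seed)
    # Large search space: particles spread uniformly (poor motion continuity).
    particles = [rng.uniform(0.0, 100.0) for _ in range(n_particles)]

    # Stage 1: cheap, coarse observer (hypothetical sigma = 10) prunes the space.
    weights = [gaussian_likelihood(p, target, 10.0) for p in particles]
    particles = resample(particles, weights, rng)

    # Stage 2: costly, sharp observer (hypothetical sigma = 1) refines the
    # particles that survived stage 1.
    weights = [gaussian_likelihood(p, target, 1.0) for p in particles]
    total = sum(weights)
    return sum(p * w for p, w in zip(particles, weights)) / total
```

Calling `cascade_estimate(42.0)` returns a weighted mean close to 42. The point of the staging is that the expensive observer is mostly evaluated on particles the cheap observer already considered plausible.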
Thursday, June 07, 2007
Oncel Tuzel, Fatih Porikli, and Peter Meer
We present a new algorithm to detect humans in still images utilizing covariance matrices as object descriptors. Since these descriptors do not lie on a vector space, well known machine learning techniques are not adequate to learn the classifiers. The space of d-dimensional nonsingular covariance matrices can be represented as a connected Riemannian manifold. We present a novel approach for classifying points lying on a Riemannian manifold by incorporating the a priori information about the geometry of the space. The algorithm is tested on INRIA human database where superior detection rates are observed over the previous approaches. PDF
B. Leibe, N. Cornelis, K. Cornelis, L. Van Gool
In this paper, we present a system that integrates fully automatic scene geometry estimation, 2D object detection, 3D localization, trajectory estimation, and tracking for dynamic scene interpretation from a moving vehicle. Our sole input are two video streams from a calibrated stereo rig on top of a car. From these streams, we estimate Structure-from-Motion (SfM) and scene geometry in real-time. In parallel, we perform multi-view/multi-category object recognition to detect cars and pedestrians in both camera images. Using the SfM self-localization, 2D object detections are converted to 3D observations, which are accumulated in a world coordinate frame. A subsequent tracking module analyzes the resulting 3D observations to find physically plausible spacetime trajectories. Finally, a global optimization criterion takes object-object interactions into account to arrive at accurate 3D localization and trajectory estimates for both cars and pedestrians. We demonstrate the performance of our integrated system on challenging real-world data showing car passages through crowded city areas. Paper link.
Li Guan, Jean-Sebastien Franco, and Marc Pollefeys
We consider the problem of detecting and accounting for the presence of occluders in a 3D scene based on silhouette cues in video streams obtained from multiple, calibrated views. While well studied and robust in controlled environments, silhouette-based reconstruction of dynamic objects fails in general environments where uncontrolled occlusions are commonplace, due to inherent silhouette corruption by occluders. We show that occluders in the interaction space of dynamic objects can be detected and their 3D shape fully recovered as a byproduct of shape-from-silhouette analysis. We provide a Bayesian sensor fusion formulation to process all occlusion cues occurring in a multi-view sequence. Results show that the shape of static occluders can be robustly recovered from pure dynamic object motion, and that this information can be used for online self-correction and consolidation of dynamic object shape reconstruction. PDF
Wednesday, June 06, 2007
Nikos Komodakis, Georgios Tziritas, and Nikos Paragios
A new efficient MRF optimization algorithm, called Fast-PD, is proposed, which generalizes alpha-expansion. One of its main advantages is that it offers a substantial speedup over that method, e.g. it can be at least 3-9 times faster than alpha-expansion. Its efficiency is a result of the fact that Fast-PD exploits information coming not only from the original MRF problem, but also from a dual problem. Furthermore, besides static MRFs, it can also be used for boosting the performance of dynamic MRFs, i.e. MRFs varying over time. On top of that, Fast-PD makes no compromise about the optimality of its solutions: it can compute exactly the same answer as alpha-expansion, but, unlike that method, it can also guarantee an almost optimal solution for a much wider class of NP-hard MRF problems. Results on static and dynamic MRFs demonstrate the algorithm’s efficiency and power. E.g., Fast-PD has been able to compute disparity for stereoscopic sequences in real time, with the resulting disparity coinciding with that of alpha-expansion. PDF
Dylan F. Glas, Takahiro Miyashita, Hiroshi Ishiguro, and Norihiro Hagita
2007 IEEE International Conference on Robotics and Automation
We have developed a new communication robot, Robopal, which is an indoor/outdoor robot for use in human-robot interaction research in the context of daily life. Robopal’s intended applications involve leading and/or following a human to a destination. Preliminary experiments have been conducted to study nonverbal cues associated with leading and following behavior, and it has been observed that some behaviors, such as glancing towards the leader or follower, appear to be role-dependent. A system for representing these behaviors with a state transition model is described, based on four kinds of interaction roles: directive, responsive, collaborative, and independent. It is proposed that behavior modeling can be simplified by using this system to represent changes in the roles the robot and human play in an interaction, and by associating appropriate behaviors to each role.
Belief propagation over pairwise connected Markov Random Fields has become a widely used approach, and has been successfully applied to several important computer vision problems. However, pairwise interactions are often insufficient to capture the full statistics of the problem. Higher-order interactions are sometimes required. Unfortunately, the complexity of belief propagation is exponential in the size of the largest clique. In this paper, we introduce a new technique to compute belief propagation messages in time linear with respect to clique size for a large class of potential functions over real-valued variables. We demonstrate this technique in two applications. First, we perform efficient inference in graphical models where the spatial prior of natural images is captured by 2 × 2 cliques. This approach shows significant improvement over the commonly used pairwise-connected models, and may benefit a variety of applications using belief propagation to infer images or range images. Finally, we apply these techniques to shape-from-shading and demonstrate significant improvement over previous methods, both in quality and in flexibility. PDF
Murat Dundar and Jinbo Bi
The existing methods for offline training of cascade classifiers take a greedy search to optimize individual classifiers in the cascade, leading to inefficient overall performance. We propose a new design of the cascaded classifier where all classifiers are optimized for the final objective function. The key contribution of this paper is the AND-OR framework for learning the classifiers in the cascade. In earlier work each classifier is trained independently using the examples labeled as positive by the previous classifiers in the cascade, and optimized to have the best performance for that specific local stage. The proposed approach takes into account the fact that an example is classified as positive by the cascade if it is labeled as positive by all the stages and it is classified as negative if it is rejected at any stage in the cascade. An offline training scheme is introduced based on the joint optimization of the classifiers in the cascade to minimize an overall objective function. We apply the proposed approach to the problem of automatically detecting polyps from multi-slice CT images. Our approach significantly speeds up the execution of the Computer Aided Detection (CAD) system while yielding comparable performance with the current state-of-the-art, and also demonstrates favorable results over Cascade AdaBoost both in terms of performance and online execution speed. PDF
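The cascade decision rule described in the abstract (positive only if every stage accepts, negative as soon as any stage rejects) and the idea of scoring all stages against one overall objective can be sketched as follows. This is only an illustration under assumptions: each hypothetical stage is a simple threshold on one feature, and the brute-force grid search stands in for the paper's actual AND-OR training scheme, which is not reproduced here.

```python
from itertools import product


def cascade_predict(thresholds, features):
    """AND rule: accept only if every stage accepts; stop at first rejection."""
    for threshold, value in zip(thresholds, features):
        if value < threshold:
            return False  # OR rule for negatives: any single rejection suffices
    return True


def cascade_error(thresholds, samples):
    """Overall objective: error rate of the whole cascade, not of one stage."""
    wrong = sum(cascade_predict(thresholds, f) != label for f, label in samples)
    return wrong / len(samples)


def joint_search(samples, candidates, n_stages):
    """Pick all stage thresholds together by minimizing the cascade error."""
    return min(product(candidates, repeat=n_stages),
               key=lambda th: cascade_error(th, samples))


# Hypothetical toy data: (feature per stage, label).
samples = [
    ((0.9, 0.8), True),
    ((0.7, 0.9), True),
    ((0.8, 0.2), False),
    ((0.1, 0.9), False),
    ((0.2, 0.1), False),
]
best = joint_search(samples, candidates=[0.0, 0.5, 1.0], n_stages=2)
```

On this toy data the search settles on a threshold of 0.5 for both stages, which classifies every sample correctly. A greedy scheme would instead tune each threshold for its own stage without looking at the final cascade output.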
CVPR oral: Beyond Local Appearance: Category Recognition from Pairwise Interactions of Simple Features
Marius Leordeanu, Martial Hebert, and Rahul Sukthankar
We present a discriminative shape-based algorithm for object category localization and recognition. Our method learns object models in a weakly-supervised fashion, without requiring the specification of object locations or pixel masks in the training data. We represent object models as cliques of fully-interconnected parts, exploiting only the pairwise geometric relationships between them. The use of pairwise relationships enables our algorithm to successfully overcome several problems that are common to previously-published methods. Even though our algorithm can easily incorporate local appearance information from richer features, we purposefully do not use them in order to demonstrate that simple geometric relationships can match (or exceed) the performance of state-of-the-art object recognition algorithms. PDF
By Stefanie Olsen
Staff Writer, CNET News.com
Published: June 5, 2007, 12:01 PM PDT
People in downtown Ithaca, N.Y., got a glimpse this spring of the vehicular equivalent of a headless horseman--a Chevy Tahoe gutted and modified with computers, wire controls and sensors so that it can drive city streets by itself.
Leonard specifically wants to build a robot that can operate forever autonomously, capable of dealing with change in the world. From an algorithm perspective, he said, that means building sophisticated maps of a fluid world.
"We're tremendously advanced in mapping static environments, but there's been very little progress in mapping dynamic environments. To reason about them and make intelligent decisions about things that are moving in the world--that's the challenge."
See the full article.
Tuesday, June 05, 2007
Ting Li, Vinutha Kallem, Dheeraj Singaraju, and Rene Vidal
Given point correspondences in multiple perspective views of a scene containing multiple rigid-body motions, we present an algorithm for segmenting the correspondences according to the multiple motions. We exploit the fact that when the depths of the points are known, the point trajectories associated with a single motion live in a subspace of dimension at most four. Thus motion segmentation with known depths can be achieved by methods of subspace separation, such as GPCA or LSA. When the depths are unknown, we proceed iteratively. Given the segmentation, we compute the depths using standard techniques. Given the depths, we use GPCA or LSA to segment the scene into multiple motions. Experiments on the Hopkins155 motion segmentation database show that our method compares favorably against existing affine motion segmentation methods in terms of segmentation error and execution time. PDF
Lab Meeting 6 June (Any): A New Approach for Large-Scale Localization and Mapping: Hybrid Metric-Topological SLAM
Jose-Luis Blanco, Juan-Antonio Fernández, Javier Gonzalez
Dept. of System Engineering and Automation
University of Malaga
Title: Developing Landmark-Based Pedestrian-Navigation System
Authors: Alexandra Millonig, Katja Schechtner
Pedestrian-navigation services enable people to retrieve precise instructions to reach a specific location. However, the development of mobile spatial-information technologies for pedestrians is still at the beginning and faces several difficulties. As the spatial behavior of people on foot differs in many ways from the driver's performance, common concepts for car-navigation services are not suitable for pedestrian navigation. Particularly, the usage of landmarks is vitally important in human navigation. This contribution points out the main requirements for pedestrian-navigation technologies and presents an approach to identify pedestrian flows and to incorporate landmark information into navigation services for pedestrians. [Link]
Grant Schindler, Frank Dellaert, and Sing Bing Kang
In this paper, we describe a technique to temporally sort a collection of photos that span many years. By reasoning about persistence of visible structures, we show how this sorting task can be formulated as a constraint satisfaction problem (CSP). Casting this problem as a CSP allows us to efficiently find a suitable ordering of the images despite the large size of the solution space (factorial in the number of images) and the presence of occlusions. We present experimental results for photographs of a city acquired over a one hundred year period. PDF
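The persistence constraint can be made concrete with a small sketch. It assumes a simplified reading of the abstract: a structure exists over one contiguous span of time, so in a valid ordering its "visible" observations must form a single run, and occluded observations (`None`) impose no constraint. The data layout and function names are hypothetical, and the brute-force enumeration over all permutations is exactly the factorial search the paper's CSP formulation is meant to avoid.

```python
from itertools import permutations


def is_consistent(order, observations):
    """Check that every structure is visible in at most one contiguous run.

    observations[i][s] is True (structure s visible in image i),
    False (seen to be absent), or None (occluded, so unknown).
    """
    n_structures = len(observations[0])
    for s in range(n_structures):
        seen = [observations[i][s] for i in order
                if observations[i][s] is not None]
        # Count the starts of runs of True; more than one run means the
        # structure vanished and reappeared, which violates persistence.
        runs_of_true = sum(
            1 for k, v in enumerate(seen) if v and (k == 0 or not seen[k - 1])
        )
        if runs_of_true > 1:
            return False
    return True


def consistent_orderings(observations):
    """Enumerate all image orderings that satisfy the persistence constraint."""
    indices = range(len(observations))
    return [order for order in permutations(indices)
            if is_consistent(order, observations)]


# Hypothetical example: three photos, two buildings (A, B).
photos = [
    (True, False),   # photo 0: only A stands
    (True, True),    # photo 1: both stand
    (False, True),   # photo 2: only B stands
]
orderings = consistent_orderings(photos)
```

Here `orderings` is `[(0, 1, 2), (2, 1, 0)]`: the constraint fixes the sequence up to time reversal, so some additional cue is needed to decide which end is the earliest.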
Minh-Tri Pham and Tat-Jen Cham
We present an integrated framework for learning asymmetric boosted classifiers and online learning to address the problem of online learning asymmetric boosted classifiers, which is applicable to object detection problems. In particular, our method seeks to balance the skewness of the labels presented to the weak classifiers, allowing them to be trained more equally. In online learning, we introduce an extra constraint when propagating the weights of the data points from one weak classifier to another, allowing the algorithm to converge faster. Compared with the Online Boosting algorithm recently applied to object detection problems, we observed about a 0-10% increase in accuracy, and about a 5-30% gain in learning speed. PDF
Monday, June 04, 2007
Hua Yang, Marc Pollefeys, Greg Welch, Jan-Michael Frahm, and Adrian Ilie
The appearance of a scene is a function of the scene contents, the lighting, and the camera pose. A set of n-pixel images of a non-degenerate scene captured from different perspectives lie on a 6D nonlinear manifold in R^n. In general, this nonlinear manifold is complicated and numerous samples are required to learn it globally. In this paper, we present a novel method and some preliminary results for incrementally tracking camera motion through sampling and linearizing the local appearance manifold. At each frame time, we use a cluster of calibrated and synchronized small baseline cameras to capture scene appearance samples at different camera poses. We compute a first-order approximation of the appearance manifold around the current camera pose. Then, as new cluster samples are captured at the next frame time, we estimate the incremental camera motion using a linear solver. By using intensity measurements and directly sampling the appearance manifold, our method avoids the commonly-used feature extraction and matching processes, and does not require 3D correspondences across frames. Thus it can be used for scenes with complicated surface materials, geometries, and view-dependent appearance properties, situations where many other camera tracking methods would fail. PDF
Sunday, June 03, 2007
CVPR07 oral: A Lagrangian Particle Dynamics Approach for Crowd Flow Segmentation and Stability Analysis
Saad Ali and Mubarak Shah
This paper proposes a framework in which Lagrangian Particle Dynamics is used for the segmentation of high density crowd flows and detection of flow instabilities. For this purpose, a flow field generated by a moving crowd is treated as an aperiodic dynamical system. A grid of particles is overlaid on the flow field, and is advected using a numerical integration scheme. The evolution of particles through the flow is tracked using a Flow Map, whose spatial gradients are subsequently used to set up a Cauchy Green Deformation tensor for quantifying the amount by which the neighboring particles have diverged over the length of the integration. The maximum eigenvalue of the tensor is used to construct a Finite Time Lyapunov Exponent (FTLE) field, which reveals the Lagrangian Coherent Structures (LCS) present in the underlying flow. The LCS divide flow into regions of qualitatively different dynamics and are used to locate boundaries of the flow segments in a normalized cuts framework. Any change in the number of flow segments over time is regarded as an instability, which is detected by establishing correspondences between flow segments over time. The experiments are conducted on a challenging set of videos taken from Google Video and a National Geographic documentary. PDF
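The chain from flow map gradient to FTLE value can be sketched for a single grid point. The sketch uses the standard textbook definitions (Cauchy-Green tensor C = JᵀJ for the flow map Jacobian J, and FTLE = ln(sqrt(λmax(C))) / |T| for integration time T); the central-difference Jacobian, the function names, and the 2D setting are assumptions for illustration and are not taken from the paper's implementation.

```python
import math


def flow_map_jacobian(left, right, down, up, spacing):
    """Central-difference Jacobian of the flow map at one grid point.

    left/right/down/up are the advected (x, y) end positions of the four
    neighbors that started one grid spacing away from the point.
    """
    h2 = 2.0 * spacing
    return [
        [(right[0] - left[0]) / h2, (up[0] - down[0]) / h2],
        [(right[1] - left[1]) / h2, (up[1] - down[1]) / h2],
    ]


def cauchy_green(J):
    """Cauchy-Green deformation tensor C = J^T J for a 2x2 Jacobian."""
    (a, b), (c, d) = J
    off_diagonal = a * b + c * d
    return [[a * a + c * c, off_diagonal], [off_diagonal, b * b + d * d]]


def max_eigenvalue_symmetric(C):
    """Largest eigenvalue of a symmetric 2x2 matrix in closed form."""
    mean = 0.5 * (C[0][0] + C[1][1])
    half_diff = 0.5 * (C[0][0] - C[1][1])
    return mean + math.sqrt(half_diff ** 2 + C[0][1] ** 2)


def ftle(J, integration_time):
    """Finite Time Lyapunov Exponent from the flow map Jacobian."""
    lam = max_eigenvalue_symmetric(cauchy_green(J))
    return math.log(math.sqrt(lam)) / abs(integration_time)


# Neighbors advected by a flow that stretches x by 2 and squeezes y by 0.5.
J = flow_map_jacobian(left=(-2.0, 0.0), right=(2.0, 0.0),
                      down=(0.0, -0.5), up=(0.0, 0.5), spacing=1.0)
value = ftle(J, integration_time=1.0)
```

For this stretching flow `value` equals ln 2, about 0.693, while an undeformed neighborhood (identity Jacobian) gives 0. Ridges of large FTLE across the grid are what mark the Lagrangian Coherent Structures.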
Alexander Toshev, Jianbo Shi, and Kostas Daniilidis
We introduce the notion of co-saliency for image matching. Our matching algorithm combines the discriminative power of feature correspondences with the descriptive power of matching segments. Co-saliency matching score favors correspondences that are consistent with 'soft' image segmentation as well as with local point feature matching. We express the matching model via a joint image graph (JIG) whose edge weights represent intra- as well as inter-image relations. The dominant spectral components of this graph lead to simultaneous pixel-wise alignment of the images and saliency-based synchronization of 'soft' image segmentation. The co-saliency score function, which characterizes these spectral components, can be directly used as a similarity metric as well as a positive feedback for updating and establishing new point correspondences. We present experiments showing the extraction of matching regions and pointwise correspondences, and the utility of the global image similarity in the context of place recognition. PDF
Roberto Tron & René Vidal
Over the past few years, several methods for segmenting a scene containing multiple rigidly moving objects have been proposed. However, most existing methods have been tested on a handful of sequences only, and each method has been often tested on a different set of sequences. Therefore, the comparison of different methods has been fairly limited. In this paper, we compare four 3-D motion segmentation algorithms for affine cameras on a benchmark of 155 motion sequences of checkerboard, traffic, and articulated scenes. PDF
Simon Winder and Matthew Brown
In this paper we study interest point descriptors for image matching and 3D reconstruction. We examine the building blocks of descriptor algorithms and evaluate numerous combinations of components. Various published descriptors such as SIFT, GLOH, and Spin Images can be cast into our framework. For each candidate algorithm we learn good choices for parameters using a training set consisting of patches from a multi-image 3D reconstruction where accurate ground-truth matches are known. The best descriptors were those with log polar histogramming regions and feature vectors constructed from rectified outputs of steerable quadrature filters. At a 95% detection rate these gave one third of the incorrect matches produced by SIFT. PDF
Herve Jegou, Hedi Harzallah, and Cordelia Schmid
In this paper we present two contributions to improve accuracy and speed of an image search system based on bag-of-features: a contextual dissimilarity measure (CDM) and an efficient search structure for visual word vectors.
Our measure (CDM) takes into account the local distribution of the vectors and iteratively estimates distance correcting terms. These terms are subsequently used to update an existing distance, thereby modifying the neighborhood structure. Experimental results on the Nistér-Stewénius dataset show that our approach significantly outperforms the state-of-the-art in terms of accuracy.
Our efficient search structure for visual word vectors is a two-level scheme using inverted files. The first level partitions the image set into clusters of images. At query time, only a subset of clusters of the second level has to be searched. This method allows fast querying in large sets of images. We evaluate the gain in speed and the loss in accuracy on large datasets (up to 1 million images).
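One way to picture a two-level search of this kind is sketched below. This is not the structure from the paper, whose details the abstract does not give. The sketch assumes that each cluster is summarized by the summed visual-word histogram of its images, that each cluster keeps its own inverted file, and that similarity is plain histogram intersection. The class name TwoLevelIndex and the parameter n_clusters are hypothetical.

```python
from collections import Counter, defaultdict


class TwoLevelIndex:
    """Toy two-level index: cluster summaries first, inverted files second."""

    def __init__(self, clusters):
        # clusters: {cluster_id: {image_id: Counter mapping visual word -> count}}
        self.centroids = {}
        self.inverted = {}
        for cluster_id, images in clusters.items():
            centroid = Counter()
            postings = defaultdict(list)
            for image_id, words in images.items():
                centroid.update(words)
                for word, count in words.items():
                    # Inverted file: visual word -> images containing it.
                    postings[word].append((image_id, count))
            self.centroids[cluster_id] = centroid
            self.inverted[cluster_id] = postings

    def query(self, words, n_clusters=1):
        """Return (image_id, score) pairs from the n_clusters best clusters."""

        def overlap(cluster_id):
            centroid = self.centroids[cluster_id]
            return sum(min(n, centroid[w]) for w, n in words.items())

        # First level: rank clusters and keep only the most promising ones.
        selected = sorted(self.centroids, key=overlap, reverse=True)[:n_clusters]

        # Second level: scan only the inverted files of the selected clusters.
        scores = Counter()
        for cluster_id in selected:
            postings = self.inverted[cluster_id]
            for word, n in words.items():
                for image_id, count in postings.get(word, ()):
                    scores[image_id] += min(n, count)
        return scores.most_common()


# Tiny example: three images in two clusters, visual words are integer ids.
example_clusters = {
    "a": {"img1": Counter({1: 2, 2: 1}), "img2": Counter({2: 3})},
    "b": {"img3": Counter({7: 1, 8: 2})},
}
example_index = TwoLevelIndex(example_clusters)
```

Raising n_clusters searches more of the collection and trades speed for accuracy, which is the kind of gain-versus-loss trade-off the abstract says was evaluated.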
Eric Nowak and Frederic Jurie
In this paper we propose and evaluate an algorithm that learns a similarity measure for comparing never seen objects. The measure is learned from pairs of training images labeled same or different. This is far less informative than the commonly used individual image labels (e.g. car model X), but it is cheaper to obtain. The proposed algorithm learns the characteristic differences between local descriptors sampled from pairs of same and different images. These differences are vector quantized by an ensemble of extremely randomized binary trees, and the similarity measure is computed from the quantized differences. The extremely randomized trees are fast to learn, robust due to the redundant information they carry, and have been proved to be very good clusterers. Furthermore, the trees efficiently combine different feature types (SIFT and geometry). We evaluate our innovative similarity measure on four very different datasets and consistently outperform the state-of-the-art competitive approaches. PDF
Siniša Šegvić, Anthony Remazeilles, Albert Diosi and François Chaumette
Autonomous cars will likely play an important role in the future. A vision system designed to support outdoor navigation for such vehicles has to deal with large dynamic environments, changing imaging conditions, and temporary occlusions by other moving objects. This paper presents a novel appearance-based navigation framework relying on a single perspective vision sensor, which is aimed at resolving the above issues. The solution is based on a hierarchical environment representation created during a teaching stage, when the robot is controlled by a human operator. At the top level, the representation contains a graph of key-images with extracted 2D features enabling robust navigation by visual servoing. The information stored at the bottom level makes it possible to efficiently predict the locations of features that are currently not visible, and eventually (re-)start their tracking. The outstanding property of the proposed framework is that it enables robust and scalable navigation without requiring a globally consistent map, even in interconnected environments. This result has been confirmed by realistic off-line experiments and successful real-time navigation trials in public urban areas. PDF
Hao Wu, Aswin C Sankaranarayanan and Rama Chellappa
Automatic evaluation of visual tracking algorithms in the absence of ground truth is a very challenging and important problem. In the context of online appearance modeling, there is an additional ambiguity involving the correctness of the appearance model. In this paper, we propose a novel performance evaluation strategy for tracking systems based on particle filters using a time reversed Markov chain. Starting from the latest observation, the time reversed chain is propagated back to the starting time t = 0 of the tracking algorithm. The posterior density of the time reversed chain is also computed. The distance between the posterior density of the time reversed chain (at t = 0) and the prior density used to initialize the tracking algorithm forms the decision statistic for evaluation. It is postulated that when the data is generated true to the underlying models, the decision statistic takes a low value. We empirically demonstrate the performance of the algorithm against various common failure modes in the generic visual tracking problem. Finally, we derive a small frame approximation that allows for very efficient computation of the decision statistic. PDF
Andriy Myronenko, Xubo Song and Miguel Á. Carreira-Perpiñán
We introduce a novel probabilistic approach for nonparametric nonrigid image registration using generalized elastic nets, a model previously used for topographic maps. The idea of the algorithm is to adapt an elastic net (a constrained Gaussian mixture) in the spatial-intensity space of one image to fit the second image. The resulting net directly represents the correspondence between image pixels in a probabilistic way and recovers the underlying image deformation. We regularize the net with a differential prior and develop an efficient optimization algorithm using linear conjugate gradients. The nonparametric formulation allows for complex transformations having local deformation. The method is generally applicable to registering point sets of arbitrary features. The accuracy and effectiveness of the method are demonstrated on different medical image and point set registration examples with locally nonlinear underlying deformations. PDF
Speaker: Ramesh Raskar http://www.merl.com/people/raskar/raskar.html
Abstract: Using a combination of techniques in optical and radio frequency domains, one can significantly improve the functionality of cameras for sensing, projectors for augmentation and RFIDs for location sensing services. We have recently developed a technique to capture a light field and improve the depth of field of a camera using heterodyning methods common in radio frequency modulation (http://www.merl.com/people/raskar/Mask/ and http://www.merl.com/people/raskar/deblur/). We have also shown that sensor-enhanced wireless tags can be precisely located at 500 times/second by exploiting the epipolar geometry of projectors (http://www.merl.com/people/raskar/LumiNetra/). The talk will explore the implication of this fusion in Computational Photography, Motion Capture, Augmented Reality and Displays.
Ramesh Raskar is a Senior Research Scientist at MERL. His work spans a range of topics in computer vision and graphics including computational photography, projective geometry, non-photorealistic rendering and intelligent user interfaces. Current projects include optical heterodyning photography, flutter shutter camera, composite RFID (RFIG), multi-flash non-photorealistic camera for depth edge detection, locale-aware mobile projectors, high dynamic range video, image fusion for context enhancement and quadric transfer methods for multi-projector curved screen displays.
Dr. Raskar received the TR100 Award, Technology Review's 100 Top Young Innovators Under 35 worldwide, 2004 and Global Indus Technovator Award 2003, instituted at MIT to recognize the top 20 Indian technology innovators on the globe. He holds 25 US patents and has received Mitsubishi Electric Invention Awards in 2003, 2004 and 2006. He is a member of the ACM and IEEE. http://www.merl.com/people/raskar/raskar.html