Saturday, February 28, 2009

Lab Meeting March 2, 2009 (Casey): Nonrigid Structure from Motion in Trajectory Space

Title: Nonrigid Structure from Motion in Trajectory Space
Authors: Ijaz Akhter, Yaser Sheikh, Sohaib Khan and Takeo Kanade

Abstract:
Existing approaches to nonrigid structure from motion assume that the instantaneous 3D shape of a deforming object is a linear combination of basis shapes, which have to be estimated anew for each video sequence. In contrast, we propose that the evolving 3D structure be described by a linear combination of basis trajectories. The principal advantage of this approach is that we do not need to estimate any basis vectors during computation. We show that generic bases over trajectories, such as the Discrete Cosine Transform (DCT) basis, can be used to compactly describe most real motions. This results in a significant reduction in unknowns, and corresponding stability in estimation. We report empirical performance, quantitatively using motion capture data and qualitatively on several video sequences exhibiting nonrigid motion, including piece-wise rigid motion, partially nonrigid motion (such as a facial expression), and highly nonrigid motion (such as a person dancing).
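
The core representational idea is compact enough to sketch: rather than solving for per-sequence shape bases, each point's 3D trajectory over F frames is written as a linear combination of a few fixed DCT basis vectors, so the unknowns shrink from F values per coordinate to K << F coefficients. A minimal numpy illustration of this step (the trajectory here is synthetic, and this is only the representation, not the authors' full reconstruction pipeline):

import numpy as np

def dct_basis(F, K):
    """First K DCT-II basis vectors over F frames, as orthonormal columns (F x K)."""
    f = np.arange(F)
    B = np.stack([np.cos(np.pi * (2 * f + 1) * k / (2 * F)) for k in range(K)], axis=1)
    return B / np.linalg.norm(B, axis=0)

F, K = 100, 10                       # frames, basis size (K << F)
t = np.linspace(0.0, 1.0, F)
traj = np.stack([np.sin(2 * np.pi * t), t ** 2, 0.3 * t], axis=1)  # one point's 3D path (F x 3)

B = dct_basis(F, K)
coeffs = B.T @ traj                  # K x 3 unknowns instead of F x 3
recon = B @ coeffs
print(np.abs(recon - traj).max())    # small: smooth motion lives in low DCT frequencies

Because smooth physical motion concentrates its energy in the low DCT frequencies, a small K already reconstructs the trajectory almost exactly, which is where the reduction in unknowns comes from.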

Lab Meeting March 2, 2009 (ZhenYu): Scale Invariant Feature Matching with Wide Angle Images

Title: Scale Invariant Feature Matching with Wide Angle Images (IROS 2007)
Authors: P. Hansen, P. Corke, W. Boles, and K. Daniilidis
Abstract:
Numerous scale-invariant feature matching algorithms using scale-space analysis have been proposed for use with perspective cameras, where scale-space is defined as convolution with a Gaussian. The contribution of this work is a method suitable for use with wide angle cameras. Given an input image, we map it to the unit sphere and obtain scale-space images by convolution with the solution of the spherical diffusion equation on the sphere, which we implement in the spherical Fourier domain. Using such an approach, the scale-space response of a point in space is independent of its position on the image plane for a camera subject to pure rotation. Scale-invariant features are then found as local extrema in scale-space. Given this set of scale-invariant features, we generate feature descriptors by considering a circular support region defined on the sphere whose size is selected relative to the feature scale. We compare our method to a naive implementation of SIFT in which the image is treated as perspective; our results show an improvement in matching performance.
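
The scale-space step has a clean frequency-domain form: on the sphere, diffusion multiplies each degree-l spherical harmonic coefficient by exp(-l(l+1)t). A hedged sketch of just that step, assuming forward and inverse spherical harmonic transforms are available (the sht/isht names below are hypothetical placeholders, not a specific library API):

import numpy as np

def spherical_diffusion(f_lm, t):
    """Spherical heat-kernel smoothing in the spherical Fourier domain:
    the degree-l coefficient of the image on the sphere is attenuated by
    exp(-l(l+1)t), the spherical analogue of Gaussian blurring.
    f_lm: dict mapping (l, m) -> complex spherical harmonic coefficient."""
    return {(l, m): c * np.exp(-l * (l + 1) * t) for (l, m), c in f_lm.items()}

# A scale-space is a sequence of diffusions at geometrically spaced scales;
# features are local extrema, over position and scale, of a band-pass response.
scales = [1e-3 * 2 ** k for k in range(6)]
# levels = [isht(spherical_diffusion(sht(image_on_sphere), t)) for t in scales]
# (sht/isht: hypothetical forward/inverse spherical harmonic transforms)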

[LINK]

Saturday, February 21, 2009

Automatic Class-Specific 3D Reconstruction from a Single Image

Title: Automatic Class-Specific 3D Reconstruction from a Single Image
Authors: Han-Pang Chiu, Leslie Pack Kaelbling, Tomás Lozano-Pérez

Abstract: Our goal is to automatically reconstruct 3D objects from a single image, by using prior 3D shape models of classes. The shape models, defined as a collection of oriented primitive shapes centered at fixed 3D positions, can be learned from a few labeled images for each class. The 3D class model can then be used to estimate the 3D shape of an object instance, including occluded parts, from a single image. We provide a quantitative evaluation of the shape estimation process on real objects and demonstrate its usefulness in three applications: robot manipulation, object detection, and generating 3D 'pop-up' models from photos.
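
As a rough picture of what such a class model looks like, here is a hedged Python sketch: the model is a list of oriented primitives at fixed 3D positions, and once the visible parts are matched, occluded parts can be read off the fitted model. All names, and the translation-only fit, are illustrative assumptions rather than the paper's actual estimation procedure:

import numpy as np
from dataclasses import dataclass

@dataclass
class Primitive:
    center: np.ndarray        # fixed 3D position in the class coordinate frame
    orientation: np.ndarray   # 3x3 rotation of the primitive shape
    extent: np.ndarray        # e.g. half-sizes of a box-like primitive

@dataclass
class ClassModel:
    name: str
    parts: list               # Primitive instances, learned from a few labeled images

def complete_shape(model, observed):
    """Translate the class model to best match the 3D centers estimated for the
    visible parts (observed: dict part-index -> 3D point), then read off every
    part's position, occluded ones included. A translation-only stand-in for
    the paper's fitting procedure."""
    idx = list(observed)
    offset = np.mean([observed[i] - model.parts[i].center for i in idx], axis=0)
    return [p.center + offset for p in model.parts]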

Link: pdf

Wednesday, February 18, 2009

CMU talk: Scaling Up Game Theory: Representation and Reasoning with Action Graph Games

Intelligence Seminar

February 24, 2009

Scaling Up Game Theory: Representation and Reasoning with Action Graph Games
Kevin Leyton-Brown
Computer Science Department, University of British Columbia

Abstract:

Game theory is the mathematical study of interaction among independent, self-interested agents. It has wide applications, including the design of government auctions (e.g., for distressed securities), urban planning, and the analysis of internet traffic patterns. Interestingly, most work in game theory is analytic; it is less common to analyze a model's properties computationally. Key reasons for this are that game representation size tends to grow exponentially in the number of players--making all but the simplest games infeasible to write down--and that even when games can be represented, existing algorithms (e.g., for finding equilibria) tend to have worst-case performance exponential in the game's size.

This talk describes Action-Graph Games (AGG), which make it possible to extend computational analysis to games that were previously far too large to consider. I will give an overview of our five-year effort developing AGGs, emphasizing the twin threads of representational compactness and computational tractability. More specifically, the first part of the talk will describe the core ideas of the AGG representation. AGGs are a fully-expressive, graph-based representation that can compactly express both strict and context-specific independencies in players' utility functions. I will illustrate the representation by describing several practical examples of games that may be compactly represented as AGGs. The second part of the talk will examine algorithmic considerations. First, I'll describe a dynamic programming algorithm for computing a player's expected utility under a given mixed-strategy profile, which is tractable for bounded-in-degree AGGs. This algorithm can be leveraged to provide an exponential speedup in the computation of best response, Nash equilibrium, and correlated equilibrium. Second, I'll describe a message-passing algorithm for computing pure-strategy Nash equilibria in symmetric AGGs, which is tractable for graphs with bounded treewidth; again, this implies an exponential speedup over the previous state of the art. Finally, I'll more briefly describe some current directions in our work on AGGs: the modeling, evaluation, and comparison of different advertising auction designs; the extension of AGGs to both temporal and stochastic settings; and the design of free software tools to make it easier for other researchers to use AGGs.
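
To make the expected-utility claim concrete, here is a hedged sketch of the projection-plus-convolution idea: other players' mixed strategies are folded in one at a time, tracking only a distribution over configurations (action counts) on the neighborhood of the chosen action. This is an illustrative reimplementation under assumed data structures, not the speaker's code:

from collections import defaultdict

def expected_utility(action, neighbors, u, others):
    """Expected utility of playing `action` in an action-graph game via the
    projection + convolution dynamic program.
      neighbors: tuple of actions in the neighborhood of `action`
      u:         dict mapping configuration (tuple of counts) -> utility
      others:    mixed strategies of the other players, each a dict action -> prob"""
    start = tuple(1 if n == action else 0 for n in neighbors)   # own contribution
    dist = {start: 1.0}
    for sigma in others:
        nxt = defaultdict(float)
        for cfg, p in dist.items():
            for a, q in sigma.items():
                if q == 0.0:
                    continue
                if a in neighbors:          # projection: non-neighbor actions don't matter
                    c = list(cfg)
                    c[neighbors.index(a)] += 1
                    nxt[tuple(c)] += p * q
                else:
                    nxt[cfg] += p * q
        dist = dict(nxt)
    return sum(p * u[cfg] for cfg, p in dist.items())

Folding players in one at a time keeps only polynomially many count vectors alive when the in-degree is bounded, which is the lever behind the exponential speedups in best-response and equilibrium computation.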

This talk is based on joint work with Albert Xin Jiang, David R.M. Thompson, and Navin A.R. Bhat.

Bio:
Kevin Leyton-Brown is an assistant professor in computer science at the University of British Columbia. He received a B.Sc. from McMaster University (1998), and an M.Sc. and Ph.D. from Stanford University (2001; 2003). Much of his work is at the intersection of computer science and microeconomics, addressing computational problems in economic contexts and incentive issues in multi-agent systems. He also studies the application of machine learning to the design and analysis of algorithms for solving hard computational problems. He has co-written two books, Multiagent Systems and Essentials of Game Theory, and over forty peer-refereed technical articles. He is an associate editor of the Journal of Artificial Intelligence Research (JAIR) and a member of the editorial board of the Artificial Intelligence Journal (AIJ). He has served as a consultant for Trading Dynamics Inc., Ariba Inc., and Cariocas Inc., and is currently scientific advisor to Worio Inc.

Tuesday, February 17, 2009

Lab Meeting February 23, 2009 (Chung-Han): Human-Assisted Motion Annotation

Title: Human-Assisted Motion Annotation

Authors: Ce Liu, William T. Freeman, Edward H. Adelson, Yair Weiss (CSAIL MIT)

Abstract:
Obtaining ground-truth motion for arbitrary, real-world video sequences is a challenging but important task for both algorithm evaluation and model design. Existing ground-truth databases are either synthetic, such as the Yosemite sequence, or limited to indoor, experimental setups, such as the database developed in [5]. We propose a human-in-the-loop methodology to create a ground-truth motion database for videos taken with ordinary cameras in both indoor and outdoor scenes, using the fact that human beings are experts at segmenting objects and inspecting the match between two frames. We designed an interactive computer vision system to allow a user to efficiently annotate motion. Our methodology is cross-validated by showing that human-annotated motion is repeatable, consistent across annotators, and close to the ground truth obtained by [5]. Using our system, we collected and annotated 10 indoor and outdoor real-world videos to form a ground-truth motion database. The source code, annotation tool, and database are online for public evaluation and benchmarking.

Presented at CVPR 2008.

[Full Text]

Lab Meeting February 23, 2009 (Jimmy): Multi-robot Cooperative Localization through Collaborative Visual Object Tracking

In: RoboCup 2007

Title: Multi-robot Cooperative Localization through Collaborative Visual Object Tracking

Authors: Zhibin Liu, Mingguo Zhao, Zongying Shi, and Wenli Xu

Abstract:
In this paper we present an approach for a team of robots to cooperatively improve their self-localization by collaboratively tracking a moving object. First, we use a Bayes net model to describe the multi-robot self-localization and object tracking problem. Then, by exploiting the independencies between different parts of the joint state space of the complex system, we show how the posterior estimate of the joint state can be factorized, and how the moving object can serve as a bridge for information exchange between the robots, realizing cooperative localization. Based on this, a particle filtering method for the joint state estimation problem is proposed. Finally, to improve computational efficiency and achieve real-time operation, we present a method for decoupling the joint state estimation and distributing it across the robots. The approach has been implemented on our four-legged AIBO robots and tested in different scenarios in the RoboCup domain, showing that localization performance can indeed be improved significantly.
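
A hedged sketch of the information-exchange step: each robot keeps its own particle set over its pose, and the jointly tracked object couples them, since a particle that maps the robot-frame object observation to a world position consistent with a teammate's shared estimate should gain weight. The names and the Gaussian weighting are illustrative assumptions, not the paper's exact filter:

import numpy as np

def reweight_by_shared_object(particles, weights, local_obs, teammate_obj, noise):
    """One decoupled update: each pose particle (x, y, theta) maps the robot-frame
    observation of the moving object into world coordinates, and particles that
    agree with a teammate's shared world-frame estimate of the same object gain
    weight.
      local_obs:    (range, bearing) to the object in the robot frame
      teammate_obj: (x, y) object estimate shared by a teammate"""
    r, b = local_obs
    px, py, pth = particles[:, 0], particles[:, 1], particles[:, 2]
    ox = px + r * np.cos(pth + b)     # object position implied by each particle
    oy = py + r * np.sin(pth + b)
    d2 = (ox - teammate_obj[0]) ** 2 + (oy - teammate_obj[1]) ** 2
    w = weights * np.exp(-d2 / (2.0 * noise ** 2))
    return w / w.sum()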

Saturday, February 14, 2009

MIT talk: Do predictions of visual perception aid design?

Do predictions of visual perception aid design?
Speaker: Ruth Rosenholtz, MIT Brain and Cognitive Sciences
Date: Friday, February 20, 2009
Time: 2:00PM to 3:00PM

Understanding and exploiting the abilities of the human visual system is an important part of the design of usable user interfaces and information visualizations. Designers traditionally learn qualitative rules-of-thumb for how to enable quick, easy and veridical perception of their design. More recently, work in human and computer vision, including in our lab, has produced more quantitative models of human perception. These models often take as input arbitrary, complex images of a design. We ask whether such models aid the design process. Through a series of interactions with designers and design teams, we find that the models can help, but in somewhat unexpected ways. Based on this study, I will suggest general design principles for perceptual tools.

Ruth Rosenholtz is a Principal Research Scientist in MIT's Department of Brain and Cognitive Sciences, and a member of CSAIL. She joined MIT in 2004 after 7 years at the Palo Alto Research Center (formerly Xerox PARC). Ruth's background is in electrical engineering, particularly computer vision. More recently, however, she has studied human vision, and in particular visual search and attention. Her engineering background shows through in her focus on finding predictive mathematical models of phenomena in human vision, and in her interest in applying knowledge of human perception to the design of better user interfaces and information visualizations.

Honda Research Institute : Summer internship 2009

The computer science research section of Honda Research Institute USA, located in Mountain View, California and Cambridge, Massachusetts, has several openings for internships this summer. For details, please go to:

http://www.honda-ri.com/HRI_Us/careers/internships

The application deadline is March 31, 2009.

CMU talk: Bringing people in the loop

VASC Seminar
Monday, February 16

Bringing people in the loop: data annotation and real-time visual supervision with Amazon Mechanical Turk
Alexander Sorokin
University of Illinois at Urbana-Champaign


Abstract:

In this talk I will challenge the common assumption that "we will never have enough labeled data". As of today, large annotated datasets can be built very quickly and cheaply. To achieve that, a number of questions need to be answered. I will discuss which properties of a task affect the quality of the final result, as well as the cost and speed of task completion. I will further highlight how the people-in-the-loop framework expands the range of possible applications.

Bio:
Alexander Sorokin is a 5th-year PhD student at the University of Illinois at Urbana-Champaign. He received a BS from Lomonosov Moscow State University in 2003. He received the best paper award at InfoVis 2007 for the paper "Visualizing the History of Living Spaces".

Friday, February 13, 2009

CMU talk: Large Scale Scene Matching for Graphics and Vision

Speaker: James Hays (CSD@CMU)
Date: Monday, Feb 16, 2009
Time: 12:00 noon

Title:
Large Scale Scene Matching for Graphics and Vision

Abstract:
The complexity of the visual world makes it difficult for computer vision to understand images and for computer graphics to synthesize visual content. The traditional computer vision or computer graphics pipeline mitigates this complexity with a bottom-up, divide-and-conquer strategy (e.g., segmenting then classifying, assembling part-based models, or using scanning-window detectors). In this talk I will discuss research that is fundamentally different, enabled by the observation that while the space of images is infinite, the space of "scenes" might not be astronomically large. With access to imagery on an Internet scale, for most images there exist numerous semantically and structurally similar scenes. My research is focused on exploiting and refining large-scale scene matching to short-circuit the typical computer vision and graphics pipelines for tasks such as scene completion, image geolocation, object detection, and high interest photo selection.
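
The retrieval step underlying this line of work can be sketched in a few lines: represent every image by a fixed-length holistic descriptor (GIST-like in the related scene-completion and im2gps papers) and find a query's nearest neighbors in that space. A minimal sketch under that assumption; computing the descriptor itself is out of scope here:

import numpy as np

def scene_matches(query_desc, database_descs, k=20):
    """Indices of the k nearest scenes under L2 distance in descriptor space.
    Each row of database_descs is assumed to be one image's holistic
    descriptor vector."""
    d = np.linalg.norm(database_descs - query_desc, axis=1)
    return np.argsort(d)[:k]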

Wednesday, February 04, 2009

CMU talk: Intelligent Preference Assessment: The Next Steps?

Intelligence Seminar

February 3, 2009
3:30 pm


Intelligent Preference Assessment: The Next Steps?

Craig Boutilier, Department of Computer Science, University of Toronto

Preference elicitation is generally required when making or recommending decisions on behalf of users whose utility function is not known with certainty. Full elicitation of user utility functions is infeasible in practice, leading to an emphasis on approaches that (a) attempt to make good recommendations with incomplete utility information; and (b) heuristically minimize the amount of user interaction needed to assess relevant aspects of a utility function. Current techniques are, however, limited in a number of ways: (i) they rely on specific forms of information for assessment; (ii) they require very stylized forms of interaction; (iii) they are limited in the types of decision problems that can be handled.

In this talk, I will outline several key research challenges in taking preference assessment to a point where wide user acceptance is possible. I will focus on three current techniques we're developing that will help move in the direction of greater user acceptance. Each tackles one of the weaknesses discussed above.

1. The first two techniques allow users to define "personalized" features over which they can express their preferences. Users provide (positive and negative) instances of a concept (or feature) over which they have preferences. We relate this to models of concept learning, and discuss how the existence of utility functions allows decisions to be made with very incomplete knowledge of the target concept. I'll also discuss possible means of integrating data-intensive collaborative filtering approaches with explicit preference elicitation techniques, especially when tackling "subjective" features.

2. I'll discuss some of our recent work on applying explicit decision-theoretic models to more "conversational" critiquing approaches to recommender systems. We consider several semantics (with respect to user preferences) for unstructured user choices and show how these can be integrated into regret-based models.

3. Time permitting, I'll provide a sketch of some recent work on eliciting reward functions in Markov decision processes using the notion of minimax regret.
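
For a concrete feel of the minimax-regret criterion that runs through these items, here is a hedged sketch: with elicitation incomplete, many utility functions remain feasible; recommend the option whose worst-case loss against the best alternative, over all feasible utilities, is smallest. The finite list of candidate utility functions below is an illustrative stand-in for the exact feasible set:

def minimax_regret_choice(items, feasible_utilities):
    """Recommend the item whose worst-case regret, relative to the best item
    the user might actually prefer, is smallest over all utility functions
    still consistent with the elicited information (here, a finite list of
    dicts item -> utility)."""
    def max_regret(x):
        return max(u[y] - u[x] for u in feasible_utilities for y in items)
    return min(items, key=max_regret)

# two utility functions still consistent with the user's answers so far
us = [{"a": 1.0, "b": 0.4}, {"a": 0.2, "b": 0.9}]
print(minimax_regret_choice(["a", "b"], us))   # -> "b", the safer recommendation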

Bio:

Craig Boutilier received his Ph.D. in Computer Science (1992) from the University of Toronto, Canada. He is Professor and Chair of the Department of Computer Science at the University of Toronto. He was previously an Associate Professor at the University of British Columbia, a consulting professor at Stanford University, and a visiting professor at Brown University. He has served on the Technical Advisory Board of CombineNet, Inc. since 2001.

Dr. Boutilier's research interests span a wide range of topics, with a focus on decision making under uncertainty, including preference elicitation, mechanism design, game theory, Markov decision processes, and reinforcement learning. He is a Fellow of the American Association for Artificial Intelligence (AAAI) and the recipient of the Izaak Walton Killam Research Fellowship, an IBM Faculty Award and the Killam Teaching Award. He has also served in a variety of conference organization and editorial positions, and is Program Chair of the upcoming Twenty-first International Joint Conference on Artificial Intelligence (IJCAI-09).