This Blog is maintained by the Robot Perception and Learning lab at CSIE, NTU, Taiwan. Our scientific interests are driven by the desire to build intelligent robots and computers, which are capable of servicing people more efficiently than equivalent manned systems in a wide variety of dynamic and unstructured environments.
Sunday, October 01, 2006
[ML Lunch] Monday 10/02
We'll be having a KDD 2006 conference review
session this Monday 10/02.
Hanghang Tong and Jimeng Sun will lead the
session with the following
papers:
1. Hanghang Tong on
Center-Piece Subgraphs: Problem Definition and
Fast Solutions.
by Hanghang Tong and Christos Faloutsos
2. Jimeng Sun on
Beyond Streams and Graphs: Dynamic Tensor
Analysis.
by Jimeng Sun, Yufei Tao, Christos Faloutsos
Venue: NSH 1507
Date : Monday, October 02
Time : 12:00 noon
And of course, thanks to MLD, lunch will be
served.
For schedules, links to papers, etc., please see
the web page:
http://www.cs.cmu.edu/~learning
We are on the lookout for speakers for the
semester, so do send any of
us an email if you'd like to give a talk.
Your ML Lunch organizing committee,
Edoardo Airoldi (eairoldi@cs.cmu.edu)
Anna Goldenberg (anya@cs.cmu.edu)
Leonid Kontorovich (lkontor@cs.cmu.edu)
Andreas Krause (krausea@cs.cmu.edu)
Jure Leskovec (jure@cs.cmu.edu)
Pradeep Ravikumar (pradeepr@cs.cmu.edu)
=======================================================================
Abstracts:
1. Center-Piece Subgraphs: Problem Definition and
Fast Solutions
by Hanghang Tong and Christos Faloutsos
Given $Q$ nodes in a social network (say,
authorship network), how can
we find the node/author that is the center-piece,
and has direct or
indirect connections to all, or most of them? For
example, this node
could be the common advisor, or someone who
started the research area
that the $Q$ nodes belong to. Isomorphic
scenarios appear in law
enforcement (find the master-mind criminal,
connected to all current
suspects), gene regulatory networks (find the
protein that participates
in pathways with all or most of the given $Q$
proteins), viral
marketing and many more. Connection subgraphs is
an important first
step, handling the case of $Q=2$ query nodes.
Then, the connection
subgraph algorithm finds the $b$ intermediate
nodes, that provide a good
connection between the two query nodes. Here we
generalize the challenge
in multiple dimensions: First, we allow more than
two query nodes.
Second, we allow a whole family of queries,
ranging from 'OR' to 'AND',
with 'softAND' in-between. Finally, we design and
compare a fast
approximation, and study the quality/speed
trade-off. We also present
experiments on the DBLP dataset. The experiments
confirm that our
proposed method naturally deals with multi-source
queries and that the
resulting subgraphs agree with our intuition.
Wall-clock timing results
on the DBLP dataset show that our proposed
approximation achieves good
accuracy for about $6:1$ speedup.
2. Beyond Streams and Graphs: Dynamic Tensor
Analysis
by Jimeng Sun, Yufei Tao, Christos Faloutsos
How do we find patterns in author-keyword
associations, evolving over
time? Or in DataCubes, with
product-branch-customer sales information?
Matrix decompositions, like principal component
analysis (PCA) and
variants, are invaluable tools for mining,
dimensionality reduction,
feature selection, rule identification in
numerous settings like
streaming data, text, graphs, social networks and
many more. However,
they have only two orders, like author and
keyword, in the above
example. We propose to envision such higher order
data as tensors, and
tap the vast literature on the topic. However,
these methods do not
necessarily scale up, let alone operate on
semi-infinite streams. Thus,
we introduce the dynamic tensor analysis (DTA)
method, and its variants.
DTA provides a compact summary for high-order and
high-dimensional data,
and it also reveals the hidden correlations.
Algorithmically, we designed
DTA very carefully so that it is (a) scalable,
(b) space efficient (it
does not need to store the past) and (c) fully
automatic with no need
for user defined parameters. Moreover, we propose
STA, a streaming
tensor analysis method, which provides a fast,
streaming approximation
to DTA. We implemented all our methods, and
applied them in two real
settings, namely, anomaly detection and multi-way
latent semantic
indexing. We used two real, large datasets, one
on network flow data
(100GB over 1 month) and one from DBLP (200MB
over 25 years). Our
experiments show that our methods are fast,
accurate and that they find
interesting patterns and outliers on the real
datasets.
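The 'OR' to 'AND' query family in the first abstract can be illustrated with a small sketch. The abstract does not say how a node's closeness to each query node is measured, so this example assumes a random walk with restart as the per-query score, multiplies the scores for 'AND' and uses the complement of the product of misses for 'OR'. The toy graph, the function names and the restart probability are all hypothetical, and this is not the authors' fast approximation.

```python
def random_walk_with_restart(adj, source, restart=0.15, iters=100):
    """Approximate how often a walker that keeps restarting at `source`
    visits each node. Assumes every node has at least one neighbor."""
    scores = {node: 0.0 for node in adj}
    scores[source] = 1.0
    for _ in range(iters):
        nxt = {node: 0.0 for node in adj}
        for node, neighbors in adj.items():
            share = (1.0 - restart) * scores[node] / len(neighbors)
            for nb in neighbors:
                nxt[nb] += share
        nxt[source] += restart  # probability mass that jumps back to the query
        scores = nxt
    return scores


def combine(per_query, mode="AND"):
    """Merge per-query scores: 'AND' rewards nodes close to every query,
    'OR' rewards nodes close to at least one query."""
    combined = {}
    for node in per_query[0]:
        if mode == "AND":
            score = 1.0
            for scores in per_query:
                score *= scores[node]
        else:
            miss = 1.0
            for scores in per_query:
                miss *= 1.0 - scores[node]
            score = 1.0 - miss
        combined[node] = score
    return combined


def best_candidate(scores, exclude):
    """Highest-scoring node that is not itself a query node."""
    return max((n for n in scores if n not in exclude), key=scores.get)


# Hypothetical co-authorship graph: "hub" wrote with a, b and c; "x" only with a.
adj = {"hub": ["a", "b", "c"], "a": ["hub", "x"],
       "b": ["hub"], "c": ["hub"], "x": ["a"]}
queries = ["a", "b", "c"]
per_query = [random_walk_with_restart(adj, q) for q in queries]
and_best = best_candidate(combine(per_query, "AND"), queries)
or_best = best_candidate(combine(per_query, "OR"), queries)
```

On this toy graph both query types pick the shared co-author rather than the node attached to a single query author.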
Friday, September 29, 2006
Projector-Guided Painting
Project Description:
This paper presents a novel interactive system for guiding artists to paint using traditional media and tools. The enabling technology is a multi-projector display capable of controlling the appearance of an artist's canvas. Artists are guided by this display-on-canvas to paint according to a process model we designed to solve three common problems with novice painters. The artist paints according to a linear process of painting in layers and, within each layer, a set of colors. Each component of our model of the painting process has an associated interaction mode. Preview mode shows the entire layer as the current painting goal. Blank mode reveals the state of the painting. Color selection mode displays where to paint a target color. Color mixing mode shows how to mix it and orientation mode shows how to paint it. These interaction modes enable the novice to focus on painting sub-tasks in order to simplify the painting process while providing technical guidance ranging from high-level composition to detailed brushwork. We present results of a user study that quantify the benefit that our system can provide to a novice painter.
Publication:
This work will be published and presented at the ACM Symposium on User Interface Software and Technology (UIST) in Montreux, Switzerland, in October 2006.
[Link]
Wednesday, September 27, 2006
[Robotics Institute Thesis Oral 2 Oct 2006] Holistic Modeling and Tracking of Road Scenes
Robotics Institute
Carnegie Mellon University
Place and Time
NSH 3305
11:00 AM
Abstract
This thesis proposal addresses the problem of road scene understanding for driver warning systems in intelligent vehicles, which require a model of cars, pedestrians, the lane structure of the road, and any static obstacles on it in order to accurately predict possible dangerous situations. Previous work on using computer vision in intelligent vehicle applications stops short of holistic modeling of the entire road scene. In particular, no lane tracking system exists that detects and tracks multiple lanes or integrates lane tracking with tracking of cars, pedestrians, and other relevant objects. In this thesis, we focus on the goal of holistic road scene understanding, and we propose contributions in three areas: (1) the low-level detection of road scene elements such as tarmac and painted stripes; (2) modeling and tracking of complex lane structures; and (3) the integration of lane structure tracking with car and pedestrian tracking.
Further Details
A copy of the thesis oral document can be found at
link
Thesis Committee
Takeo Kanade, Chair
Charles Thorpe
Alexei Efros
Simon Baker, Microsoft Research, Seattle
[Robotics Institute Seminar] Is the human hand dexterous because, or in spite, of its anatomical complexity?
Francisco Valero-Cuevas
Cornell University
Time and Place
Mauldin Auditorium (NSH 1305)
Refreshments 3:15 pm
Talk 3:30 pm
Abstract
The human hand is a pinnacle of mechanical versatility unequaled by electromechanical systems. It is clearly a product of brain-body coevolution. However, its anatomical structure shares numerous features with other species. In my work, I explore how the human hand meets the necessary and sufficient mechanical requirement for manipulation. This allows us to begin to distinguish and contrast the complementary contributions of anatomy and the nervous system in order to improve hand rehabilitation and suggest avenues to build better machines.
Speaker Biography
I attended Swarthmore College from 1984-88 where I obtained a BS degree in Engineering. After spending a year in the Indian subcontinent as a Thomas J Watson Fellow, I joined Queen's University in Ontario and worked with Dr. Carolyn Small. The research for my Masters Degree in Mechanical Engineering at Queen's focused on developing non-invasive methods to estimate the kinematic integrity of the wrist joint. In 1991 I joined the doctoral program in the Design Division of the Mechanical Engineering Department at Stanford University. I worked with Dr. Felix Zajac developing a realistic biomechanical model of the human digits. This research, done at the Rehabilitation R & D Center in Palo Alto, focused on predicting optimal coordination patterns of finger musculature during static force production. After completing my doctoral degree in 1997, I joined the core faculty of the Biomechanical Engineering Division at Stanford University as a Research Associate and Lecturer. My research then focused on developing experimental methods to optimize the surgical restoration of hand function following spinal cord injury and peripheral nerve injuries. In 1999 I joined the faculty of the Sibley School of Mechanical and Aerospace Engineering as an Assistant Professor. I also have close ties with the Hospital for Special Surgery in New York City.
Speaker Appointments
For appointments, please contact Jean Harpley (jean@cs.cmu.edu, 8-3802)
[Thesis Proposal] Policies based on Trajectory Libraries
Robotics Institute
Carnegie Mellon University
Place and Time
NSH 3305
10:00 AM
Abstract
I present a control approach that uses a library of trajectories to establish a global control law or policy. This is an alternative to methods for finding global policies based on value functions using dynamic programming and also to using plans based on a single desired trajectory. Our method has the advantage of providing reasonable policies much faster than dynamic programming can provide an initial policy. It also has the advantage of providing more robust and global policies than following a single desired trajectory. Trajectory libraries can be created for robots with many more degrees of freedom than dynamic programming can handle, as well as for robots with dynamic model discontinuities. Results are shown for the “Labyrinth” marble maze and the Little Dog quadruped robot. The marble maze is a difficult task which requires both fast control and planning ahead. In the Little Dog terrain, a quadruped robot has to navigate quickly across small-scale rough terrain. In my past work, I have used global state to represent the knowledge in the trajectory libraries. In order to broaden the use of a library, I propose the use of local state representations, which allow the knowledge represented by a library to be used in novel situations. Three different mechanisms for this transfer are proposed. First, information about the goal of a task can be explicitly represented in the local state. Libraries using this representation can be transferred directly to new tasks. Alternatively, the local state representation might not include a goal feature. When using such a library, a search over actions in the library has to be used to pick actions that reach the goal. Finally, one can cluster the actions in the library in order to create abstract actions. This will simplify the search process.
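As a rough illustration of how a library of trajectories can act as a global policy, the sketch below stores every (state, action) pair from a few trajectories and, for a query state, returns the action of the nearest stored state. The nearest-neighbor lookup, the 2-D states and the action names are assumptions made for this example; the abstract does not specify how the library is queried.

```python
import math


def build_library(trajectories):
    """Flatten trajectories into one list of (state, action) pairs."""
    return [(state, action) for traj in trajectories for state, action in traj]


def policy(library, state):
    """Return the action stored with the library state closest to `state`
    (Euclidean distance; an assumption of this sketch)."""
    _, action = min(library, key=lambda entry: math.dist(entry[0], state))
    return action


# Two hypothetical trajectories through a 2-D maze.
library = build_library([
    [((0.0, 0.0), "right"), ((1.0, 0.0), "up"), ((1.0, 1.0), "stop")],
    [((0.0, 2.0), "right"), ((1.0, 2.0), "down")],
])
action_near_corner = policy(library, (0.9, 0.2))
action_near_top = policy(library, (0.2, 1.8))
```

Because any state maps to some stored action, the library yields an answer everywhere in the state space, although the quality of that answer depends on how densely the trajectories cover it.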
Further Details
A copy of the thesis proposal document can be found at http://gs3020.sp.cs.cmu.edu/~mstoll/proposal.pdf.
Thesis Committee
Christopher Atkeson, Chair
James Kuffner
Drew Bagnell
Rüdiger Dillmann, University of Karlsruhe
Tuesday, September 26, 2006
Lab Meeting 29 Sep., 2006 (Chihao): 3D Sound Source Localization System Based on Learning of Binaural Hearing
Authors: Hiromichi Nakashima, Toshiharu Mukai
This paper appears in: IEEE SMC 2005 (IEEE International Conference on Systems, Man, and Cybernetics)
Abstract:
We have thus far developed two types of sound source localization system, one of which can localize the horizontal direction and the other the vertical direction. These systems can acquire the localization ability by self-organization through repetition of movement and perception. In this paper, we report a newly built sound source localization system that can detect the direction of a sound source arbitrarily located in front of it. This system is composed of a robot that has two microphones with reflectors corresponding to human pinnae. To acquire the horizontal direction, the interaural time difference is used as the auditory cue. To acquire the vertical direction, the features on the audio spectrum induced by the reflectors are used as the auditory cue. The robot can establish the relationship between the cues and the sound direction through learning.
Link
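The interaural time difference mentioned above can be estimated from two microphone signals by finding the lag that maximizes their cross-correlation. The sketch below does that and then converts the delay to a horizontal angle with a simple far-field geometric formula. That formula, the microphone spacing and the sample rate are assumptions for illustration only; the robot in the paper learns the cue-to-direction mapping instead of using a fixed formula.

```python
import math


def estimate_delay(left, right, max_lag):
    """Lag in samples that maximizes the cross-correlation of the two
    signals. A positive lag means the right signal arrives later."""
    best_lag, best_score = 0, float("-inf")
    n = len(left)
    for lag in range(-max_lag, max_lag + 1):
        score = 0.0
        for i in range(n):
            j = i + lag
            if 0 <= j < n:
                score += left[i] * right[j]
        if score > best_score:
            best_lag, best_score = lag, score
    return best_lag


def azimuth_from_itd(itd_seconds, mic_spacing=0.2, speed_of_sound=343.0):
    """Far-field approximation: sin(angle) = c * ITD / d.
    Positive angles point toward the microphone that heard the sound first
    (the left one, under the lag convention above)."""
    ratio = max(-1.0, min(1.0, speed_of_sound * itd_seconds / mic_spacing))
    return math.degrees(math.asin(ratio))


# Synthetic test signal: a short burst that reaches the right microphone
# three samples after the left one.
left = [0.0] * 64
left[20:25] = [1.0, 2.0, 3.0, 2.0, 1.0]
right = [0.0] * 3 + left[:-3]

lag = estimate_delay(left, right, max_lag=10)
angle = azimuth_from_itd(lag / 44100.0)  # 44.1 kHz sample rate is assumed
```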
Lab Meeting 29 Sep.,2006 (Vincent): Active Appearance Models
Author : Timothy F. Cootes, Gareth J. Edwards, and Christopher J. Taylor
Origin :
IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE, VOL. 23, NO. 6, JUNE 2001
Abstract :
We describe a new method of matching statistical models of appearance to images. A set of model parameters control modes of shape and gray-level variation learned from a training set. We construct an efficient iterative matching algorithm by learning the relationship between perturbations in the model parameters and the induced image errors.
You can find the full article here.
CMU ML talk: Learning-based Deformable Neuroimage Registration
http://www.cs.cmu.edu/~leonid/
September 25
Abstract:
Deformable neuroimage registration is an active and challenging research area. It forms a crucial component of many computational and clinical neuroscience applications, including computer aided diagnosis, statistical quantification of human brain, and atlas-based neuroimage segmentation.
Maximizing the number of correctly estimated voxel correspondences enhances the accuracy of a deformable registration algorithm. Most existing feature-based deformable registration algorithms use a pre-defined set of image features to estimate correspondences for all voxels. These methods have two main weaknesses. First, the feature vector is constructed by the authors of the algorithms rather than automatically selected to minimize registration error. Second, the same feature vector is used for all the voxels in the whole brain image, without consideration given to the inhomogeneity of the anatomical structures and their corresponding voxels.
We propose a new learning-based deformable registration algorithm that performs feature selection for every voxel. Our algorithm can be trained to accurately register specific anatomical structures as well as the entire neuroimages of specific patient groups. The main novelty of our approach is that it automatically learns feature vectors for distinguished individual image voxels, thus increasing correspondence estimation accuracy. Our method utilizes a decision theoretic approach to systematically calculate the expected correspondence estimation error for a voxel in many different feature spaces, and then select the space with the smallest error. Our feasibility study on the 2D midsagittal slices shows that learning the feature subspace increases the number of correctly estimated correspondences by 20%.
We will quantitatively evaluate the performance of our deformable registration algorithm and apply it to several medical image analysis problems.
CMU Intelligence Seminar: Semantic Models of Shape
Jovan Popović, CSAIL, MIT
Tuesday 9/26
Abstract: Conventional representations of shape (splines, meshes, etc.) provide general modeling controls without differentiating between real and meaningless outcomes. This burdens human operators and computational techniques with the task of searching through a vast and cluttered design space. Semantic representations clear up the clutter by attaching human understanding to computational representations of shape and motion.
Bio: Jovan Popović is an Associate Professor in the Department of Electrical Engineering and Computer Science, and a member of the Computer Graphics Group in the Computer Science and Artificial Intelligence Laboratory, at the Massachusetts Institute of Technology. Before arriving at MIT in 2001, Jovan Popović received his Ph.D. in Computer Science from Carnegie Mellon University and his B.S. degrees in Mathematics and Computer Science from Oregon State University. His research employs computer science, mathematics and physics to explore the applications of geometric modeling and computer animation to the fields of computer graphics, human-computer interaction, biomechanics, robotics, and computational design.
Monday, September 25, 2006
Call for papers: JFR Special Issue: Safety, Security and Rescue Robots
We invite papers that exhibit state-of-the-art theory and methods applied to fielded studies including:
* novel locomotion mechanisms for rough terrain
* robotic sensors and sensing techniques for unstructured or semi-structured terrain
* novel human/robot interaction devices and paradigms for emergency response
* collaborative systems of land/sea/air vehicles for search/rescue/assessment
* lessons learned from robotic land/sea/air deployments in mines, collapsed structures, and wide area disasters
The complete call for papers for this special issue can be found at:
http://journalfieldrobotics.org/si.html
Please note that the deadline for submissions is November 1, 2006.
We look forward to your submission.
-Richard Voyles and Howie Choset
Sunday, September 24, 2006
News: Robot manufacturing to be Taiwan's next booming industry: MOEA
2006/9/23
TAIPEI, CNA
The research and development of artificial intelligence robots could help the total output value of Taiwan's machinery industry to top NT$1 trillion (US$30.4 billion) by 2009, with an annual growth rate of no less than 30 percent before 2012, the Industrial Development Bureau (IDB) under the Ministry of Economic Affairs said yesterday.
The IDB also predicted that by 2016, A.I. robot manufacture alone could generate an output value of NT$250 billion and an export value of NT$175 billion, as well as contribute at least 1.35 percent to Taiwan's GDP, adding that 22,000 jobs could be created.
Moreover, the IDB estimated that the global production value of the robot industry could exceed that of the automobile sector worldwide by 2020, reaching some US$1.4 trillion.
...... See the full article.
Saturday, September 23, 2006
Lab Meeting 29 Sep., 2006 (Ashin): Using GPS to learn significant locations and predict movement across multiple users
From: Personal and Ubiquitous Computing, Volume 7, Number 5 / October, 2003
Abstract:
Wearable computers have the potential to act as intelligent agents in everyday life and assist the user in a variety of tasks, using context to determine how to act. Location is the most common form of context used by these agents to determine the user's task. However, another potential use of location context is the creation of a predictive model of the user's future movements. We present a system that automatically clusters GPS data taken over an extended period of time into meaningful locations at multiple scales. These locations are then incorporated into a Markov model that can be consulted for use with a variety of applications in both single user and collaborative scenarios.
[Link]
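The Markov model step described above can be sketched as follows, assuming the GPS clustering has already produced a sequence of named locations. The sketch fits a first-order model by counting transitions and predicts the most likely next location. The location names and the first-order choice are assumptions for illustration; the clustering itself is not shown.

```python
from collections import Counter, defaultdict


def fit_transitions(visits):
    """Estimate P(next location | current location) from a visit sequence."""
    counts = defaultdict(Counter)
    for here, nxt in zip(visits, visits[1:]):
        counts[here][nxt] += 1
    return {
        here: {nxt: c / sum(nexts.values()) for nxt, c in nexts.items()}
        for here, nexts in counts.items()
    }


def predict_next(model, here):
    """Most likely next location, or None if `here` was never left."""
    options = model.get(here)
    if not options:
        return None
    return max(options, key=options.get)


# Hypothetical sequence of clustered locations for one user.
visits = ["home", "lab", "cafe", "lab", "home", "lab", "cafe", "home"]
model = fit_transitions(visits)
next_from_lab = predict_next(model, "lab")
```

In this toy sequence the user leaves the lab three times, twice for the cafe, so the model predicts the cafe as the next stop after the lab.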
Thursday, September 21, 2006
IEEE news: How to manage an impending deluge of new data?
See "Data Torrents and Rivers," by Michael Stonebraker: the link
IEEE news: Sounding out IEEE's fellows
See "Bursting Tech Bubbles Before They Balloon," by Marina Gorbis and David Pescovitz: the link.
Wednesday, September 20, 2006
Lab meeting 22 Sep., 2006 (ZhenYu): Communication Robots for Elementary Schools
From: Proc. AISB'05 Symposium Robot Companions: Hard Problems and Open Challenges in Robot-Human Interaction, pp. 54-63, April 2005.
Abstract: This paper reports our approaches and efforts for developing communication robots for elementary schools. In particular, we describe the fundamental mechanism of the interactive humanoid robot, Robovie, for interacting with multiple persons, maintaining relationships, and estimating social relationships among children. The developed robot Robovie was applied in two field experiments at elementary schools. The purpose of the first experiment was to use it as a peer tutor in foreign language education, and the purpose of the second was to establish longitudinal relationships with children. We believe that these results demonstrate a positive perspective for the future possibility of realizing a communication robot that works in elementary schools.
[Link]
Tuesday, September 19, 2006
Lab meeting 22 Sep., 2006 (Eric): Dual Photography
From: ACM SIGGRAPH 2005 conference proceedings
Abstract: We present a novel photographic technique called dual photography,
which exploits Helmholtz reciprocity to interchange the lights
and cameras in a scene. With a video projector providing structured
illumination, reciprocity permits us to generate pictures from
the viewpoint of the projector, even though no camera was present
at that location. The technique is completely image-based, requiring
no knowledge of scene geometry or surface properties, and
by its nature automatically includes all transport paths, including
shadows, inter-reflections and caustics. In its simplest form, the
technique can be used to take photographs without a camera; we
demonstrate this by capturing a photograph using a projector and
a photo-resistor. If the photo-resistor is replaced by a camera, we
can produce a 4D dataset that allows for relighting with 2D incident
illumination. Using an array of cameras we can produce a 6D
slice of the 8D reflectance field that allows for relighting with arbitrary
light fields. Since an array of cameras can operate in parallel
without interference, whereas an array of light sources cannot, dual
photography is fundamentally a more efficient way to capture such
a 6D dataset than a system based on multiple projectors and one
camera. As an example, we show how dual photography can be
used to capture and relight scenes.
[Link]
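One way to picture the interchange of lights and cameras is with a light transport matrix, which is an assumption of this sketch because the abstract does not spell out a formulation. If T maps a projector pattern to a camera image, then reciprocity suggests that the transpose of T maps light placed at the camera pixels to an image seen from the projector. The tiny scene matrix, the `capture` function and the one-pixel-at-a-time measurement below are hypothetical stand-ins for real hardware.

```python
def matvec(matrix, vector):
    """Multiply a nested-list matrix by a vector."""
    return [sum(m * v for m, v in zip(row, vector)) for row in matrix]


def transpose(matrix):
    return [list(column) for column in zip(*matrix)]


# Hypothetical scene: 2 camera pixels (rows) by 3 projector pixels (columns).
SCENE = [[0.9, 0.1, 0.0],
         [0.2, 0.7, 0.1]]


def capture(pattern):
    """Stand-in for photographing the scene under a projector pattern."""
    return matvec(SCENE, pattern)


def measure_transport(capture_fn, n_projector_pixels):
    """Light one projector pixel at a time; each photo is one column of T."""
    columns = []
    for j in range(n_projector_pixels):
        pattern = [1.0 if k == j else 0.0 for k in range(n_projector_pixels)]
        columns.append(capture_fn(pattern))
    return transpose(columns)


T = measure_transport(capture, 3)
# Dual image: the view from the projector when both camera pixels emit light.
dual_image = matvec(transpose(T), [1.0, 1.0])
```

Lighting one pixel at a time is the brute-force way to fill in T and is used here only to keep the example short.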
Lab meeting 22 Sep., 2006 (Bright): A Novel System for Tracking Pedestrians Using Multiple Single-Row Laser-Range Scanners
From: IEEE TRANSACTIONS ON SYSTEMS, MAN, AND CYBERNETICS—PART A: SYSTEMS AND HUMANS, VOL. 35, NO. 2, MARCH 2005
Abstract: In this research, we propose a novel system for
tracking pedestrians in a wide and open area, such as a shopping
mall and exhibition hall, using a number of single-row laser-range
scanners (LD-A), which have a profiling rate of 10 Hz and a
scanning angle of 270°. LD-As are set directly on the floor doing
horizontal scanning at an elevation of about 20 cm above the
ground, so that horizontal cross sections of the surroundings,
containing moving feet of pedestrians as well as still objects, are
obtained in a rectangular coordinate system of real dimension.
The data of moving feet are extracted through background subtraction
by the client computers that control each LD-A, and sent
to a server computer, where they are spatially and temporally integrated
into a global coordinate system. A simplified pedestrian’s
walking model based on the typical appearance of moving feet is
defined and a tracking method utilizing a Kalman filter is developed
to track pedestrians' trajectories. The system is evaluated through
both real experiment and computer simulation. A real experiment
is conducted in an exhibition hall, where three LD-As are used
covering an area of about 60 × 60 m². Changes in visitors’ flow
during the whole exhibition day are analyzed, where in the peak
hour, about 100 trajectories are extracted simultaneously. On the
other hand, a computer simulation is conducted to quantitatively
examine system performance with respect to different crowd
density.
Link
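To make the Kalman filter tracking step concrete, here is a minimal one-dimensional constant-velocity filter that runs at the 10 Hz profiling rate quoted in the abstract. It is a generic textbook filter, not the paper's walking model: the noise variances, the single spatial dimension and the noise-free test data are all assumptions for illustration.

```python
def kalman_step(state, cov, z, dt=0.1, q=0.01, r=0.0025):
    """One predict/update cycle of a 1-D constant-velocity Kalman filter.

    state = (position, velocity); cov = 2x2 covariance as nested tuples;
    z = measured position. q and r are illustrative noise variances.
    """
    x, v = state
    (p00, p01), (p10, p11) = cov

    # Predict: x <- x + dt*v and P <- F P F^T + q*I, with F = [[1, dt], [0, 1]].
    x_pred = x + dt * v
    n00 = p00 + dt * (p01 + p10) + dt * dt * p11 + q
    n01 = p01 + dt * p11
    n10 = p10 + dt * p11
    n11 = p11 + q

    # Update with a position-only measurement (H = [1, 0]).
    innovation = z - x_pred
    s = n00 + r
    k0, k1 = n00 / s, n10 / s
    new_state = (x_pred + k0 * innovation, v + k1 * innovation)
    new_cov = (((1 - k0) * n00, (1 - k0) * n01),
               (n10 - k1 * n00, n11 - k1 * n01))
    return new_state, new_cov


# A target moving at a constant 1.2 m/s, scanned at 10 Hz (noise-free here).
state, cov = (0.0, 0.0), ((1.0, 0.0), (0.0, 1.0))
for k in range(1, 101):
    state, cov = kalman_step(state, cov, z=1.2 * 0.1 * k)
```

Starting from zero velocity, the filter's velocity estimate moves toward the true speed as position measurements accumulate.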
CMU thesis proposal: Scale Selection and Invariance in Low Level Vision
18 Sep 2006
The representation of objects through locally computed features is a concept common to many approaches in 2-D and 3-D computer vision. The use of local information to infer global properties aims to serve several purposes such as robustness to outlying structures, variation in viewing conditions, noise and occlusion. Reliable computation of relevant local attributes is thus an important part of any practical vision system intended to perform higher level reasoning.
The task of making such local observations necessitates making choices of the neighborhood size within which the computation is performed, also referred to as the /scale/ of the observation. This in turn poses several unanswered questions relevant to both data representation (e.g. reconstruction and compression) as well as data identification (e.g. object detection and classification). At what scale is it meaningful to compute a local feature? What is the optimal neighborhood size for estimating local geometric properties from data? While many advances have been made in a theory of scale for 2-D luminance images, little attention has been paid to the domains of unorganized point clouds (as would be acquired with a laser range scanner) or to alternate representations of images (such as color or other pixel-wise functions such as optical flow).
This thesis explores the problems of scale selection and invariance in previously unaddressed problem domains, and proposes solutions for several useful vision tasks:
* We propose to extend current application of scale theory for interest region extraction in 2-D images to alternate, potentially more useful representations. As an example, we demonstrate how both scale as well as illuminant invariant keypoint detection may be achieved in the case of color (RGB) images without having to estimate the properties of the illuminant.
* We present methods to robustly compute local differential properties from non-uniform unstructured point clouds. In particular, we show how data-driven adaptation of the neighborhood size in local PCA when computing tangents (normals) from spatial curves (surfaces) can make even this naive estimator more robust than leading fixed-scale alternatives.
* We propose the development of new scale-space representations of 3-D point cloud data that are robust to changes in sampling. By this, we advocate changing the current practice of using a single globally fixed value of scale when computing shape descriptors from 3-D data to that of using a value that is locally data-driven.
* We propose to investigate the application of local intrinsic scale detection for manifold learning. The aims of this analysis are improved statistical properties and robustness of embedding functions and regularizers to sampling variations in the dataset.
Overall, the expected contributions of this thesis are new techniques and tools for the scale selection problem that is fundamental to local data analysis and learning from real-world measurements.
Further Details: A copy of the thesis proposal document can be found at the link.
CMU vasc talk: Visual Recognition and Tracking for Perceptive Interfaces
MIT CSAIL
http://people.csail.mit.edu/trevor/
Devices should be perceptive, and respond directly to their human user and/or environment. In this talk I'll present new computer vision algorithms for fast recognition, indexing, and tracking that make this possible, enabling multimodal interfaces which respond to users' conversational gesture and body language, robots which recognize common object categories, and mobile devices which can search using visual cues of specific objects of interest. As time permits, I'll describe recent advances in real-time human pose tracking for multimodal interfaces, including new methods which exploit fast computation of approximate likelihood with a pose-sensitive image embedding. I'll also present our linear-time approximate correspondence kernel, the Pyramid Match, and its use for image indexing and object recognition, and discovery of object categories. Throughout the talk, I'll show interface examples including grounded multimodal conversation as well as mobile image-based information retrieval applications based on these techniques.
BIO: Trevor Darrell is an Associate Professor of Electrical Engineering and Computer Science at M.I.T. He leads the Vision Interface Group at the Computer Science and Artificial Intelligence Laboratory. His interests include computer vision, interactive graphics, and machine learning. Prior to joining the faculty of MIT he worked as a Member of the Research Staff at Interval Research in Palo Alto, CA, researching vision-based interface algorithms for consumer applications. He received his PhD and SM from MIT in 1996 and 1991, respectively, while working at the Media Laboratory, and the BSE from the University of Pennsylvania in 1988, where he worked in the GRASP Robotics Laboratory.
CMU ML talks: UAI 2006 conference review
by Or Zuk, Shiri Margel and Eytan Domany
Bayesian Networks (BNs) are useful tools giving a natural and compact representation of joint probability distributions. In many applications one needs to learn a Bayesian Network (BN) from data. In this context, it is important to understand the number of samples needed in order to guarantee a successful learning. Previous works have studied BNs sample complexity, yet they mainly focused on the requirement that the learned distribution will be close to the original distribution which generated the data. In this work, we study a different aspect of the learning task, namely the number of samples needed in order to learn the correct structure of the network. We give both asymptotic results (lower and upper-bounds) on the probability of learning a wrong structure, valid in the large sample limit, and experimental results, demonstrating the learning behavior for feasible sample sizes.
2. Non-Minimal Triangulations for Mixed Stochastic/Deterministic Graphical Models
by Chris D. Bartels and Jeff A. Bilmes
We observe that certain large-clique graph triangulations can be useful for reducing computational requirements when making queries on mixed stochastic/deterministic graphical models. We demonstrate that many of these large clique triangulations are non-minimal and are thus unattainable via the elimination algorithm. We introduce ancestral pairs as the basis for novel triangulation heuristics and prove that no more than the addition of edges between ancestral pairs need be considered when searching for state space optimal triangulations in such graphs. Empirical results on random and real world graphs are given. We also present an algorithm and correctness proof for determining if a triangulation can be obtained via elimination, and we show that the decision problem associated with finding optimal state space triangulations in this mixed setting is NP-complete.
Monday, September 18, 2006
Lab meeting this Fall
When: 10:30 AM - 12:30 PM
Where: CSIE R424/426
Best,
-Bob
Sunday, September 17, 2006
CMU AI talk: An Axiomatic Approach to Ranking Systems
October 3, 2006
ABSTRACT: This talk will survey our recent work in applying the axiomatic approach to ranking systems. Ranking systems are systems in which agents rank each other to produce a social ranking. In the axiomatic approach we study ranking systems under the light of basic properties, or axioms. In this talk I will present our axiomatization theorem for the PageRank ranking system, prove an impossibility and possibility result for general ranking systems, and discuss the issue of incentives in ranking systems. Finally, I will show initial results regarding personalized ranking systems, where a specialized ranking is generated for each agent.
CMU AI talk: Cost-sensitive Classifier Evaluation Using Cost Curves
When: Monday, September 25, 2006 at 3:30 pm
Where: Newell Simon Hall 1305
The evaluation of classifier performance in a cost-sensitive setting is straightforward if the operating conditions (misclassification costs and class distributions) are fixed and known. When this is not the case, evaluation requires a method of visualizing classifier performance across the full range of possible operating conditions. This talk argues that the classic technique for classifier performance visualization -- the ROC curve -- is inadequate for the needs of researchers and practitioners in several important respects. It then describes a different way of visualizing classifier performance -- the cost curve -- that overcomes these deficiencies. No familiarity with ROC curves or cost curves is necessary; they will be fully explained. Joint work with Chris Drummond (National Research Council, Ottawa)
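For readers who want a feel for cost curves before the talk, the sketch below uses the construction published by Drummond and Holte as I understand it: class priors and misclassification costs are folded into a single "probability cost" value on the x-axis, and each classifier becomes a straight line of normalized expected cost. The abstract itself gives no formulas, so treat the definitions, the function names and the two example classifiers as assumptions.

```python
def probability_cost(p_pos, cost_fn, cost_fp):
    """x-axis value: share of the total expected cost weight that falls on
    positives, given the positive-class prior and the two error costs."""
    pos = p_pos * cost_fn
    neg = (1.0 - p_pos) * cost_fp
    return pos / (pos + neg)


def normalized_expected_cost(fpr, tpr, pc_pos):
    """y-axis value: a line running from (0, FPR) to (1, FNR)."""
    fnr = 1.0 - tpr
    return fnr * pc_pos + fpr * (1.0 - pc_pos)


# Two hypothetical classifiers, each a single (FPR, TPR) point in ROC space.
conservative = (0.1, 0.6)
aggressive = (0.4, 0.95)


def better_classifier(pc_pos):
    """Name of the classifier with the lower cost at this operating point."""
    cost_c = normalized_expected_cost(*conservative, pc_pos)
    cost_a = normalized_expected_cost(*aggressive, pc_pos)
    return "conservative" if cost_c < cost_a else "aggressive"


rare_positives = better_classifier(probability_cost(0.2, 1.0, 1.0))
costly_misses = better_classifier(probability_cost(0.5, 4.0, 1.0))
```

Reading off which line is lower at a given x value answers the practical question of which classifier to deploy under those operating conditions.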
CMU FRC talk: Celestial Navigation for Localization of Planetary Rovers
Robotics Institute, Carnegie Mellon University
With new interest in placing rovers on the Moon as a precursor to human re-landing, there is a need to develop modern technology to support landing and operating a semi-autonomous vehicle on the surface, with minimal support infrastructure. The challenge of localization and navigation on an atmosphere-less body without GPS architecture or relay satellite presents a unique opportunity to explore the benefits of celestial navigation.
Here we propose a new method to localize a vehicle on a planetary body using a standard spacecraft star tracker. This talk will first provide a look into the history of celestial navigation and spacecraft attitude control systems to introduce modern tools available for rover localization. Two different rover celestial localization schemes, StarGrav and a new wide field-of-view star tracker method, will be described and compared. A conceptual hardware design for a flight Lunar celestial localization system based on the wide FOV star tracker will then be presented.
Speaker Bio: Deborah Sigel is a PhD candidate in the FRC, working with David Wettergreen. Her research interests include development of robotic technology and methods to improve space and planetary exploration. She obtained an MS in Aerospace Engineering at University of Maryland, and BS in Aerospace and Mechanical Engineering at Rensselaer Polytechnic Institute. Her technical experience has included designing and building spaceflight astronaut hand tools for NASA's Return to Flight program, while at Swales Aerospace, to assist astronauts in space shuttle on-orbit repairs, used on flights STS-114 and STS-121. She has also worked at NASA JPL to design mechanical hardware for the Mars Exploration Rovers (Spirit and Opportunity) and Mars Science Laboratory rover.
CMU proposal: Volumetric Descriptors for Efficient Video Analysis
When: Wednesday, September 13, 09:30 a.m.
Where: 3305 Newell-Simon Hall
Abstract: The amount of digital video has grown exponentially in recent years. However, the technology for making intelligent searches on video has failed to keep pace. The question of how to efficiently represent video, optimized for retrieval, is still an open question. We make the key observation that objects in video span both space and time, and therefore 3D spatio-temporal volumetric features are natural representations for them. The goal of this thesis is to propose efficient volumetric representations for video and evaluate how well these representations perform in a wide range of applications. Example applications include video retrieval and action recognition. Our approach is divided into three main parts: spatio-temporal region extraction, volumetric region representations, and matching/recognition methods in video. We first use unsupervised clustering to extract an over-segmentation of the video volume. The regions loosely correspond to object boundaries in space-time. Next, we construct a volumetric representation for the regions and define a distance metric to match them. Finally, we learn models based on multiple templates of user-specified actions, such as tennis serves, running, dance moves, etc. We plan to evaluate the proposed method and compare against existing methods on a large video database.
Thesis Summary: the link.
Thursday, September 14, 2006
Lab Meeting 15 Sep., 2006 (Nelson) : nScan-Matching : Simultaneous Matching of Multiple Scans and Application to SLAM
Peter Biber Wolfgang Straßer
University of Tubingen
WSI/GRIS
Abstract—Scan matching is a popular way of recovering
a mobile robot’s motion and constitutes the basis of many
localization and mapping approaches. Consequently, a variety
of scan matching algorithms have been proposed in the past.
All these algorithms share one common attribute: They match
pairs of scans to obtain spatial relations between two robot poses.
In this paper we present a method for matching multiple scans
simultaneously. We discuss the need for such a method and
describe how the result of such a multi-scan matching can be
incorporated into relation-based SLAM in the Lu and Milios
style.
Wednesday, September 13, 2006
News: Creator of AIBO to launch dancing humanoid robot
The mostly white, 33-centimeter high, 1.5-kilogram MI RAI-RT robot will be priced at 294,000 yen. The company will start accepting orders via the Internet on Sept. 30.
See the full article.
News: Hitachi Develops Crowd-Navigating Robot
The robot moves at 0.8 m per second and uses a laser sensor to detect the distance of obstacles. During trials, the robot smoothly passed four people walking at speeds of up to 1.2 m per second.
See the full article.
Monday, September 11, 2006
Lab meeting 15 Sep., 2006 (Stanley): Lessons Learned in Integration for Sensor-Based Robot Navigation Systems
From: International Journal of Advanced Robotic Systems, Volume 3, no. 1, Pages 85-91, 2006
Abstract: This paper presents our work of integration during the last years within the context of sensor‐based robot navigation systems. In our motion system, as in many others, there are functionalities involved such as modeling, planning or motion control, which have to be integrated within an architecture. This paper addresses this problematic. Furthermore, we also discuss the lessons learned while: (i) designing, testing and validating techniques that implement the functionalities of the navigation system, (ii) building the architecture of integration, and (iii) using the system on several robots equipped with different sensors in different laboratories.
Link
Sunday, September 10, 2006
IBM: Open Source Robotics Toolkits
Saturday, September 09, 2006
CMU report: On the Beaten Path: Exploitation of Entities Interactions For Predicting Potential Link
tech. report CMU-RI-TR-06-36, Robotics Institute, Carnegie Mellon University, August, 2006.
Abstract: We propose a new non-parametric link analysis algorithm that predicts a potential link between entities given a set of different relational patterns. The proposed method first represents different types of relations among entities by constructing the corresponding number of factorized matrices from the original entity-by-relation matrices. The prediction of a possible link between entities is done by linearly summing the weighted distances in the latent spaces. A logistic regression is used to estimate regression coefficients of distances in the latent spaces. From the experimental comparisons with various algorithms, our algorithm performs best in precision and second-best in recall measure. (pdf)
CMU report: Combining multiple hypotheses for identifying human activities
tech. report CMU-RI-TR-06-31, Robotics Institute, Carnegie Mellon University, May, 2006.
Abstract
Dempster-Shafer theory is one of the predominant methods for combining evidence from different sensors. However, it has been observed that Dempster's rule of combination may yield inaccurate results in some situations. In this paper, we examine the properties and the performance of five different combination rules on a set of real world data. The data was obtained through biometric sensors from a number of human subjects. The problem we study is the prediction of the activity state of a human, given time series readings from the biometric sensors. (pdf)
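For readers unfamiliar with Dempster's rule of combination, here is a minimal sketch of the standard rule only; it is not the report's code, and the other four combination rules the report examines are not shown. Mass functions are represented as dictionaries keyed by frozensets of hypotheses. The function name, the activity labels, and the sensor masses are made up for illustration.

```python
from itertools import product


def dempster_combine(m1, m2):
    """Combine two mass functions with Dempster's rule.

    Each mass function maps a frozenset of hypotheses to its mass.
    Mass falling on empty intersections is treated as conflict and
    normalized away.
    """
    combined = {}
    conflict = 0.0
    for (a, wa), (b, wb) in product(m1.items(), m2.items()):
        inter = a & b
        if inter:
            combined[inter] = combined.get(inter, 0.0) + wa * wb
        else:
            conflict += wa * wb
    if conflict >= 1.0:
        raise ValueError("total conflict: Dempster's rule is undefined")
    return {h: w / (1.0 - conflict) for h, w in combined.items()}


if __name__ == "__main__":
    # Hypothetical readings: two sensors that almost completely disagree.
    sensor_a = {frozenset({"resting"}): 0.99, frozenset({"walking"}): 0.01}
    sensor_b = {frozenset({"running"}): 0.99, frozenset({"walking"}): 0.01}
    for hypothesis, mass in dempster_combine(sensor_a, sensor_b).items():
        print(sorted(hypothesis), round(mass, 6))
```

In this made-up case nearly all of the mass is conflicting, so after normalization the combined belief lands almost entirely on "walking", the state both sensors considered very unlikely. This is the well-known kind of counterintuitive outcome (often attributed to Zadeh) that motivates comparing alternative combination rules.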
CMU ML talks: the ICML 2006 Conference Review Session
1. Support Vector Decomposition Machine
by F Pereira and G Gordon
In machine learning problems with tens of thousands of features and only dozens or hundreds of independent training examples, dimensionality reduction is essential for good learning performance. In previous work, many researchers have treated the learning problem in two separate phases: first use an algorithm such as singular value decomposition to reduce the dimensionality of the data set, and then use a classification algorithm such as naive Bayes or support vector machines to learn a classifier. We demonstrate that it is possible to combine the two goals of dimensionality reduction and classification into a single learning objective, and present a novel and efficient algorithm which optimizes this objective directly. We present experimental results in fMRI analysis which show that we can achieve better learning performance and lower-dimensional representations than two-phase approaches can.
2. Inference with the Universum
by Jason Weston, Ronan Collobert, Fabian Sinz, Leon Bottou and Vladimir Vapnik
In this paper we study a new framework introduced by (Vapnik 1998) that is an alternative capacity concept to the large margin approach. In the particular case of binary classification, we are given a set of labeled examples, and a collection of "non-examples" that do not belong to either class of interest. This collection, called the Universum, allows one to encode prior knowledge by representing meaningful concepts in the same domain as the problem at hand. We describe an algorithm to leverage the Universum by maximizing the number of observed contradictions, and show experimentally that this approach delivers accuracy improvements over using labeled data alone.
3. Bayesian Multi-Population Haplotype Inference via a Hierarchical Dirichlet Process Mixture
by E.P. Xing, K. Sohn, M.I. Jordan and Y.W. Teh
Uncovering the haplotypes of single nucleotide polymorphisms and their population demography is essential for many biological and medical applications. Methods for haplotype inference developed thus far, including methods based on coalescence, finite and infinite mixtures, and maximal parsimony, ignore the underlying population structure in the genotype data. As noted by Pritchard (2001), different populations can share a certain portion of their genetic ancestors, as well as have their own genetic components through migration and diversification. In this paper, we address the problem of multi-population haplotype inference. We capture cross-population structure using a nonparametric Bayesian prior known as the hierarchical Dirichlet process (HDP) (Teh et al., 2006), conjoining this prior with a recently developed Bayesian methodology for haplotype phasing known as DP-Haplotyper (Xing et al., 2004). We also develop an efficient sampling algorithm for the HDP based on a two-level nested Pólya urn scheme. We show that our model outperforms extant algorithms on both simulated and real biological data.
News: Bringing Robot Transportation to Europe
Transportation planners have long dreamed of an age of driverless taxis that could help alleviate traffic in congested areas, and that vision of driverless urban areas could soon become reality. Under the auspices of the European Union's "Citymobil" project, which was launched on August 28, companies and research institutes representing 10 countries have come together to develop small automatic transportation systems. Currently, three model projects are planned with funding of about €40 million.
See the full article.
News: Robot breakthrough brings fingertip feeling
September 09, 2006
A TOUCH sensor developed to match the sensitivity of the human finger is set to herald the age of the robotic doctor. Until now robots have been severely handicapped by their inability to feel with anything like the accuracy of their human creators.
The very best are unable to beat the dexterity of a six-year-old at knotting shoelaces or building a house of cards.
But all that could change with the development by nanotechnologists of a device that can "feel" the shape of a coin, down to the detail of the letters stamped on it.
The ability to feel with at least the same degree of sensitivity as a finger is crucial to the development of robots that can take on complicated tasks such as open heart surgery.
See the full article: the link
Tuesday, September 05, 2006
Atwood's Talk at the Lab meeting (Sep. 7): Integrated Person Tracking Using Stereo, Color and Pattern Detection
Abstract: We present an approach to real-time person tracking in crowded and/or unknown environments using
integration of multiple visual modalities. We combine stereo, color, and face detection modules into a single robust
system, and show an initial application in an interactive, face-responsive display. Dense, real-time stereo processing
is used to isolate users from other objects and people in the background. Skin-hue classification identifies and tracks
likely body parts within the silhouette of a user. Face pattern detection discriminates and localizes the face within
the identified body parts. Faces and bodies of users are tracked over several temporal scales: short-term (user stays
within the field of view), medium-term (user exits/reenters within minutes), and long term (user returns after hours
or days). Short-term tracking is performed using simple region position and size correspondences, while medium
and long-term tracking are based on statistics of user appearance. We discuss the failure modes of each individual
module, describe our integration method, and report results with the complete system in trials with thousands of users.
Source: International Journal of Computer Vision 37(2), 175–185, 2000 or
International Conference on Computer Vision, 1998
(meoscar)My Talk, Sep 7 2006: Visual-Hull Reconstruction from Uncalibrated and Unsynchronized Video Streams
Monday, September 04, 2006
Lab meeting 7 Sep., 2006 (Vincent): Robust face detection with multi-class boosting
Author : Yen-Yu Lin @ sinica.tw
Tyng-Luh Liu @ sinica.tw
This paper is from CVPR2005
Abstract :
With the aim to design a general learning framework for detecting faces of various poses or under different lighting conditions, we are motivated to formulate the task as a classification problem over data of multiple classes. Specifically, our approach focuses on a new multi-class boosting algorithm, called MBHboost, and its integration with a cascade structure for effectively performing face detection. There are three main advantages of using MBHboost: 1) each MBH weak learner is derived by sharing a good projection direction such that each class of data has its own decision boundary; 2) the proposed boosting algorithm is established based on an optimal criterion for multi-class classification; and 3) since MBHboost is flexible with respect to the number of classes, it turns out that it is possible to use only one single boosted cascade for the multi-class detection. All these properties give rise to a robust system to detect faces efficiently and accurately.
Here is the link of the paper.
Sunday, September 03, 2006
News: Robot Can Taste Wine and Cheeses
A new robot that can taste wine and identify cheeses.
Ever got bored of sniffing cheeses and tasting wine? Why not have a robot do it for you!
Researchers at NEC and Mie University in Japan have designed a robot that can taste and identify dozens of various wines, cheeses, and hors d'oeuvres.
A green and white prototype robot with eyes, a head and mouth was unveiled last month. The robot's left arm is equipped with an infrared spectrometer that fires off a beam of infrared light when objects are placed up against the sensor. The reflected light is analyzed in real time to determine the object's chemical composition and even flag possible health issues (e.g. salty or fatty foods).
See the full article.
CNN news: Rolling robot takes the tunes with you
TOKYO, Japan (AP) -- The new Japanese robot Miuro turns an iPod music player into a dancing boombox-on-wheels.
The 14-inch-long machine from ZMP Inc. blares music as it rolls and twists from room to room. The robot, which looks like a ball popping out of an egg, has a speaker system from Kenwood Corp.
See the full article.
MIT talk: Simulating human behaviors: Instructions, models, and parameterized actions
Date: Thursday, September 7 2006
Host: Jovan Popovic, MIT - CSAIL - Computer Graphics Group
Relevant URL: http://www.cis.upenn.edu/~badler/
Abstract:
Recently there has been considerable maturation in understanding how to use computer graphics technology to portray 3D virtual human agents. Unlike the off-line, animator-intensive methods used in the special effects industry, such real-time agents are expected to exist and interact with us "live." They can represent other people or function in a virtual environment as autonomous helpers, teammates, or adversaries, enabling novel interactive educational and training applications. Real people and virtual humans should be able to interact and communicate non-verbally, intentionally or not, through facial expressions, eye gaze, and gesture. We study such issues, including consistent parameterizations for gesture and facial actions using movement observation principles and visual attention and perception models. We developed a Parameterized Action Representation (PAR) that embodies certain semantics of human action and allows an agent to act, plan, and reason about its actions or actions of others. PAR is also designed for instructing future behaviors for autonomous agents and aggregates, and for controlling animation parameters that can individualize embodied agents. Group behaviors are additionally conditioned on agent roles and interpersonal communications. We also design instruction presentation and execution systems to facilitate virtual task training. We just started new projects to author instructions by direct performance.
Thursday, August 31, 2006
Robotics: Science and Systems Conference - Responsive Robot Gaze to Interaction Partner
Abstract: Gaze is regarded as playing an important role in face-to-face communication, for example exhibiting one's attention and regulating turn-taking during conversation, and therefore has been one of the central topics in several fields including psychology, human-computer and human-robot interaction studies. Although a lot of findings in psychology have encouraged the previous work in both human-computer and human-robot interaction studies, how to move the agent's gaze, including when to move it, has not been explored yet, and therefore is addressed in this study. The impression a person forms from an interaction is strongly influenced by the degree to which their partner's gaze direction correlates with their own. In this paper, we propose methods of responsive robot gaze control and confirm their effect on the feeling of being looked at, which is considered to be the basis of impression conveyance with gaze, through face-to-face interaction.
LINK
Robotics: Science and Systems Conference - A Probabilistic Exemplar Approach to Combine Laser and Vision for Person Tracking
Abstract:
This article presents an approach to person tracking that combines camera images and laser range data. The method uses probabilistic exemplar models, which represent typical appearances of persons in the sensor data by metric mixture distributions. Our approach learns such models for laser and for camera data and applies a Rao-Blackwellized particle filter in order to track a person's appearance in the data. The filter samples joint exemplar states and tracks the person's position conditioned on the exemplar states using a Kalman filter. We describe an implementation of the approach based on contours in images and laser point set features. Additionally, we show how the models can be learned from training data using clustering and EM. Finally, we give first experimental results of the method which show that it is superior to purely laser-based approaches for determining the position of persons in images.
Link
Sunday, August 27, 2006
Lab Meeting 31 August, 2006: practice talk for the robot competition
CMU featured project: Educational Robotics - Vehicles for Teaching and Learning
Related news:
University Publishes ARM Powered Robot Designs
IQ Online - Paris,France
Carnegie Mellon University's Mobile Robot Programming Lab in the US has published a Linux based robot design for the ARM powered robot. (See the full article)
Thursday, August 24, 2006
MIT PhD Thesis: Anthills Built to Order: Automating Construction with Artificial Swarms
Advisors: Gerald Sussman
Issue Date: 14-Aug-2006
Abstract: Social insects build large, complex structures, which emerge through the collective actions of many simple agents acting with no centralized control or preplanning. These natural systems motivate investigating the use of artificial swarms to automate construction or fabrication. The goal is to be able to take an unspecified number of simple robots and a supply of building material, give the system a high-level specification for any arbitrary structure desired, and have a guarantee that it will produce that structure without further intervention. In this thesis I describe such a distributed system for automating construction, in which autonomous mobile robots collectively build user-specified structures from square building blocks. The approach preserves many desirable features of the natural systems, such as considerable parallelism and robustness to factors like robot loss and variable order or timing of actions. Further, unlike insect colonies, it can build particular desired structures according to a high-level design provided by the user. Robots in this system act without explicit communication or cooperation, instead using the partially completed structure to coordinate their actions. This mechanism is analogous to that of stigmergy used by social insects, in which insects take actions that affect the environment, and the environmental state influences further actions. I introduce a framework of "extended stigmergy" in which building blocks are allowed to store, process or communicate information. Increasing the capabilities of the building material (rather than of the robots) in this way increases the availability of nonlocal structure information. Benefits include significant improvements in construction speed and in ability to take advantage of the parallelism of the swarm. This dissertation describes system design and control rules for decentralized teams of robots that provably build arbitrary solid structures in two dimensions. I present a hardware prototype, and discuss extensions to more general structures, including those built with multiple block types and in three dimensions.
http://hdl.handle.net/1721.1/33791
News: Faster Mapping Speeds Up the Search for Oil
By Katherine Bourzac
With demand and prices so high for crude oil, petroleum companies are searching for new reservoirs deep below the ocean floor, in areas of more geological complexity. But drilling under the ocean is very expensive, so oil companies need to have as complete an understanding of the geology where they're drilling as possible.
Even armed with reams of seismic data about the Earth's subterranean features, though, making accurate maps of the geology underlying the ocean is a challenge. Now Shell is working with computer scientists at MIT to design algorithms that will allow them to more quickly and more accurately create maps of these underground areas.
Generating maps of the deep and complex areas now under exploration by oil companies can take several people many months, says Richard Sears, a visiting scientist from Shell at MIT. Regions under study may be hundreds of kilometers in area and several kilometers deep. Those working to create 3-D maps of these areas must process huge amounts of data.
See the full article.
CMU thesis oral: Spectral Rounding & Image Segmentation
29 Aug 2006
Abstract
The task of assigning labels to pixels is central to computer vision. In automatic segmentation an algorithm assigns a label to each pixel where labels connote a shared property across pixels (e.g. color, bounding contour, texture). Recent approaches to image segmentation have formulated this labeling task as partitioning a graph derived from the image. We use spectral segmentation to denote the family of algorithms that seek a partitioning by processing the eigenstructure associated with image graphs. In this thesis we analyze current spectral segmentation algorithms and explain their performance, both practically and theoretically, on the Normalized Cuts (NCut) criterion. Further, we introduce a novel family of spectral graph partitioning methods, spectral rounding, and apply them to image segmentation tasks. Edge separators of a graph are produced by iteratively reweighting the edges until the graph disconnects into the prescribed number of components. At each iteration a small number of eigenvectors with small eigenvalue are computed and used to determine the reweighting. In this way spectral rounding directly produces discrete solutions, whereas current spectral algorithms must map the continuous eigenvectors to discrete solutions by employing a heuristic geometric separator (e.g. k-means). We show that spectral rounding compares favorably to current spectral approximations on the NCut criterion in natural image segmentation. Quantitative evaluations are performed on multiple image databases including the Berkeley Segmentation Database. These experiments demonstrate that segmentations with improved NCut value (obtained using the SR-Algorithm) are more highly correlated with human hand-segmentations.
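To make the NCut criterion mentioned in the abstract concrete, the sketch below evaluates it for a given labeling of a small weighted graph, assuming the usual Shi and Malik definition: the sum over components of cut(A, rest) / vol(A). This is not the spectral rounding algorithm and not code from the thesis; it only scores a partition. The function name and the toy graph are hypothetical.

```python
def ncut_value(weights, labels):
    """Normalized Cuts value of a labeling of an undirected weighted graph.

    weights: symmetric adjacency matrix as a list of lists.
    labels:  one component label per node.
    Returns the sum over components of cut(A, rest) / vol(A).
    """
    n = len(weights)
    degree = [sum(row) for row in weights]
    total = 0.0
    for part in set(labels):
        members = [i for i in range(n) if labels[i] == part]
        volume = sum(degree[i] for i in members)
        if volume == 0:
            raise ValueError("component with zero volume")
        cut = sum(weights[i][j]
                  for i in members
                  for j in range(n)
                  if labels[j] != part)
        total += cut / volume
    return total


if __name__ == "__main__":
    # Toy graph: two tightly linked pairs joined by one weak edge.
    w = [
        [0.0, 1.0, 0.0, 0.0],
        [1.0, 0.0, 0.1, 0.0],
        [0.0, 0.1, 0.0, 1.0],
        [0.0, 0.0, 1.0, 0.0],
    ]
    print(ncut_value(w, [0, 0, 1, 1]))  # cuts only the weak edge
    print(ncut_value(w, [0, 1, 1, 1]))  # cuts a strong edge instead
```

A lower value means a better partition under this criterion, so the labeling that severs only the weak edge scores far lower than the one that isolates a single node.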
A copy of the thesis oral document can be found at http://www.cs.cmu.edu/~tolliver/ThesisDraft.pdf.
Wednesday, August 23, 2006
CNN news: Wireless robots may float above Earth
PALMDALE, California (AP) -- Bob Jones has a lofty idea for improving communications around the world: Strategically float robotic airships above Earth as an alternative to unsightly telecom towers on the ground and expensive satellites in space.
Jones, a former NASA manager, envisions a fleet of unmanned "Stratellites" hovering in the atmosphere and blanketing large swaths of territory with wireless access for high-speed data and voice communications.
The idea of using airships as communications platforms isn't new -- it was widely floated during the dot-com boom. It didn't really fly then, and Jones is the first to admit the latest venture is a gamble.
See the full article.
CMU thesis proposal: Unsupervised Predictive Object Discovery
August 28, 2006
Abstract
This thesis proposal presents a new data-driven computational framework for unsupervised learning of object models from video. This framework integrates object representation learning, image parsing, and inference into a coherent whole based on the principles of persistence, coherent covariation, and predictability of visual patterns associated with objects or object parts in dynamic visual scenes. Visual patterns in video are extracted and linked across frames by exploiting the tendency of objects to persist and change gradually in visual scenes. First, a multitude of visual pattern proposals are generated by a clustering process based on Gestalt rules. A particle filtering-based inference mechanism then uses the proposals to construct and refine hypotheses about what objects are present in the video. Hypotheses are judged based on their ability to predict future video events, and the best hypotheses are finally used to create new or refined object models. For improved robustness in feature and object identification and inference, the mechanism learns and employs representations that explicitly encode the temporal dynamics of visual patterns. The key insight of the approach is the use of prediction of “future” visual events to facilitate inference and to validate learned representations. This framework is inspired by principles and insights from cognitive neuroscience, and thus the mechanisms investigated are relevant to understanding the representational development of object models in the brain.
A copy of the thesis proposal document can be found at http://gs2040.sp.cs.cmu.edu/UPOD/.
CMU Thesis proposal: Occlusion Boundaries: From Low-Level Detection to High-Level Reasoning
28 August 2006
Abstract
While much focus in computer vision is placed on the processing of individual, static images, many applications actually offer video, or sequences of images, as input. The extra temporal dimension of the data allows the motion of the camera or the scene to be used in processing. In particular, this motion provides the opportunity to observe objects or surfaces occluding one another. While often considered a nuisance to be "handled," the boundaries of objects at which occlusion occurs can also be valuable sources of information about 3D scene structure and shape. Since most, if not all, computer vision techniques aggregate information spatially within a scene via smoothing, patches, or graphical models with neighborhood structures, information from different physical surfaces in the scene is invariably and erroneously considered together. The low-level ability to locally detect occlusion through motion, then, should benefit many different vision techniques.
To this end, we propose to use our existing low-level occlusion detector, based on local reasoning about moving edges and the patches of data on either side of them, to find those edges in a scene which show evidence of being occlusion boundaries. We will also propose tackling this problem with a learned discriminative classifier, using the same motion features. Taking uncertainty into account, we will then propagate this local, low-level information more globally using random field methods or a confidence-based hysteresis thresholding approach. With extended occlusion boundaries available, we can then develop methods for incorporating that information into existing feature-based object recognition techniques, including our own Background and Scale Invariant Feature Transform (BSIFT). Leveraging existing techniques as a foundation, we also propose the use of these boundaries in generic object detection and segmentation, which may be advantageous for unsupervised detection and learning of novel objects in general environments.
This thesis therefore seeks to contribute to both the low- and high-level aspects of reasoning about occlusion:
- We will develop and compare a novel model-based detector and a learned discriminative classifier for extracting local occlusion boundaries in short video clips, both based on local motion features.
- We will show how to use occlusion boundary information to benefit the high-level tasks of feature-based object recognition and object detection/segmentation, possibly for unsupervised learning of object models.
We have existing work completed at either end of the spectrum (model-based detection and boundary-respecting recognition). Future work includes improvements to each, the connection of the two, and further research on the segmentation and learning tasks.
A copy of the thesis proposal document can be found at http://www.andrewstein.net/proposall.pdf.
(Leo)My Talk, Aug 24 2006: 3-D Localization and Mapping Using a Single Camera Based on Structure-from-Motion with Automatic Baseline Selection
Proceedings of the 2005 IEEE
International Conference on Robotics and Automation
Barcelona, Spain, April 2005
Author: Tomono, M.
Abstract: This paper presents a system of 3-D simultaneous localization and mapping (SLAM) using monocular vision based on the structure-from-motion scheme. A crucial issue in applying structure-from-motion to SLAM is that accuracy depends heavily on the baseline distance. We address this problem by selecting an appropriate baseline based on criteria for the tradeoff between the baseline distance and the number of feature points visible in the images. Experimental results show that full 3-D sparse maps with camera trajectory were built from images captured with a handy camera.
PDF file: [link] (from IEEEXplore)
CMU thesis proposal: Incorporating unsupervised image segmentation into object class detection and localization
28 Aug 2006
Abstract
As the performance of object recognition and localization systems improves, there is increasing demand for their application to problems which require an exact pixel-level object mask. Photograph post-processing and robot-object interaction are just two examples of applications which require knowledge of exactly which pixels in an image are part of a specific object, and which ones are not. Traditional object recognition systems which generate bounding boxes around the found objects are inappropriate for these applications. The point- and patch-based features that these systems use are also ill-suited to delineating an object mask for a highly deformable object. Thus we propose to explore a framework for using segmentation regions for object learning and recognition. Image segmentation regions have a data-driven shape, so they can adapt to object boundaries well. In fact, if the right set of regions is grouped together, the entire object can be defined. In this proposal we will examine the issues which accompany using segmentation regions for recognition, namely:
- describing segmentation regions in a reliable and discriminative manner,
- grouping over-segmented regions together for more robust recognition and complete object segmentation, and
- within the context of the above framework, generating multiple segmentations per image to overcome the inherent ambiguity in unsupervised segmentation.
Since obtaining training data with hand-segmented objects is extremely expensive, we propose to use semi-supervised training data for which only image-level object labels are known but the pixels themselves are not labeled. Upon completion of the items in this proposal, we will have a better understanding of the issues related to performing object recognition and localization for such demanding applications.
A copy of the thesis proposal document can be found at http://gs2051.sp.cs.cmu.edu/proposal.pdf.
News: The Robots Are Coming!
The robots are on the move--leaping, scrambling, rolling, flying, climbing. They are figuring out how to get here on their own. They come to help us, protect us, amuse us--and some even do floors.
Since Czech playwright Karel Capek popularized the term ("robota" means "forced labor" in Czech) in 1921, we have imagined what robots could do. But reality fell short of our plans: Honda Motor trotted out its Asimo in 2000, but for now it's been relegated to temping as a receptionist at Honda and doing eight shows a week at Disneyland. The majority of the world's robots are bolted to a spot on a factory floor, sentenced to a repetitive choreography of welding, stamping and cutting.
...
Learning has been key, both for robots and for their designers. Carnegie Mellon's Robotics Institute has been an incubator for much of the current work on robots. Rodney Brooks of the Massachusetts Institute of Technology nudged the whole field forward in the early 1990s when he showed how robots could make faster decisions by responding to sensory data from their immediate environment rather than relying on complex sets of rules.
...
Tandy Trower, general manager of Microsoft's robotics group, says robotics today reminds him of the early days of the PC--chock-full of ideas, opportunities and too many different operating systems.
Unlike PCs, however, robots are calling on the ingenuity of people from wildly diverse backgrounds: biologists are teaching robots to move, entertainers are teaching them how to amuse us, statisticians are teaching them when to ignore data, computer scientists are teaching them how to think, and materials scientists are inventing new composites that make them light on their feet.
Robots are about to be unshackled from forced labor. Expect them everywhere.
Sunday, August 20, 2006
(Casey) My talk, 24 August 2006: Learning generative visual models from few training examples: an incremental Bayesian approach tested on 101 object...
Author: L. Fei-Fei, R. Fergus, and P. Perona.
Title: Learning generative visual models from few training examples: an incremental Bayesian approach tested on 101 object categories. (CVPR 2004, Workshop on Generative-Model Based Vision)
Abstract: Current computational approaches to learning visual
object categories require thousands of training images,
are slow, cannot learn in an incremental manner and cannot
incorporate prior information into the learning process. In
addition, no algorithm presented in the literature has been tested
on more than a handful of object categories. We present a
method for learning object categories from just a few training
images. It is quick and it uses prior information in a principled
way. We test it on a dataset composed of images of objects
belonging to 101 widely varied categories. Our proposed method
is based on making use of prior information, assembled from
(unrelated) object categories which were previously learnt. A
generative probabilistic model is used, which represents the shape
and appearance of a constellation of features belonging to the
object. The parameters of the model are learnt incrementally
in a Bayesian manner. Our incremental algorithm is compared
experimentally to an earlier batch Bayesian algorithm, as well
as to one based on maximum-likelihood. The incremental and
batch versions have comparable classification performance on
small training sets, but incremental learning is significantly
faster, making real-time learning feasible. Both Bayesian methods
outperform maximum likelihood on small training sets.
PDF file: [Link]
You can download Li Fei-Fei's other papers at this link: [Link]
Saturday, August 19, 2006
CNN: A shopping cart that doesn't run into things!
Friday, August 18, 2006; Posted: 4:33 p.m. EDT (20:33 GMT)
GAINESVILLE, Florida (AP) -- It looks almost like any other shopping cart, except sensors let it follow the shopper around the supermarket and slow down when needed so items can be placed in it. And it never crashes into anyone's heels.
"The immediate thing that jumped to my mind was all those times as a kid when my sister would accidentally hit me with a cart," said its inventor, Gregory Garcia. "It seems like the public would really want this, since everybody shops."
See the full article.
Friday, August 18, 2006
News: Robot team-mates tap into each others' talents
NewScientist.com news service
Tom Simonite
Teams of robots that can remotely tap into each other's sensors and computers in order to perform tricky tasks have been developed by researchers in Sweden. The robots can, for example, negotiate their way past awkward obstacles by relaying different viewpoints to one another.
Robert Lundh, who developed the bots at Örebro University, says cooperative behaviour is normally rigidly pre-programmed into robots. "We wanted to have the robots plan for themselves how to draw on their capabilities and those of others," he told New Scientist.
Lundh's robots decide whether another nearby robot may be able to help with a specific task. In one experiment two round robots, each 45 centimetres in diameter and 25 cm tall, teamed up to negotiate their way through a doorway. They were forced to cooperate because each robot's vision system had been limited so that it could not see enough of the doorway to be certain of getting through without hitting the sides.
See the full article.
Wednesday, August 16, 2006
What's New @ IEEE in Computing, August 2006
Before humans design robots with a moral capacity, we should decide exactly what that capacity should be, and to whom it should apply, according to Christopher Grau, assistant professor of philosophy at Florida International University. Writing in his article "There Is No 'I' in 'Robot': Robots and Utilitarianism," published in "IEEE Intelligent Systems" magazine, Grau uses the 2004 film "I, Robot" as a philosophical springboard to discuss the implications of utilitarianism, an ethical theory that requires moral agents to pursue actions that will maximize overall happiness. When faced with various possible actions, a utilitarian does what will produce the greatest net happiness, considering the happiness and suffering of all those affected by the action. Grau believes it is possible that sentient robots will be able to make utilitarian calculations, but that those calculations can sometimes be reduced to an "ends justifies the means" philosophy that is morally repugnant to humans. He says, however, that utilitarian moral theory might provide the best ethical theory for artificial agents that lack the boundaries of self that normally make utilitarian calculation inappropriate. Read more (PDF): the link
2. TOPIC MODELING SPEEDS UP THE SEARCH
A new technology developed by researchers at the University of California, Irvine (USA), called Topic Modeling allows people to locate topic-specific information from computerized newspaper text. The process involves looking for patterns of words that tend to occur together in documents, then automatically categorizing those words into topics. Before this, people searching for information had to enter the topic itself (or something closely related). For example, the researchers entered the words "Lance Armstrong," "Bike," "Race," and "Rider," and the program categorized it all under "Tour de France." Previously, looking for information this way was referred to as supervised learning, and involved many man hours. The researchers presented their findings recently at the IEEE Intelligence and Security Informatics Conference, and speculate that it will make retrieving information easier and quicker. Read more:
http://www.primidi.com/2006/07
Tuesday, August 15, 2006
Lab Meeting 17 August, 2006 (Yu-Chun): Function meets style: insights from emotion theory applied to HRI
Massachusetts Inst. of Technol., Cambridge, MA, USA;
2004 IEEE SMC Transactions, Part C.
Abstract:
As robot designers, we tend to emphasize the cognitive aspect of intelligence when designing robot architectures while viewing the affective aspect with skepticism. However, scientific studies continue to reveal the deeply intertwined and complementary roles that cognition and emotion play in intelligent decision-making, planning, learning, attention, communication, social interaction, memory, and more. Such findings provide valuable insights and lessons for the design of autonomous robots that must operate in complex and uncertain environments and perform in cooperation with people. This paper presents a concrete implementation of how these insights have guided our work, focusing on the design of sociable autonomous robots that interact with people as capable partners.
[Link]
Sunday, August 13, 2006
CNN News: Sink-or-swim robot race
Friday, August 11, 2006; Posted: 3:13 p.m. EDT (19:13 GMT)
SAN DIEGO, California (AP) -- Facing an exodus of institutional brain power as baby-boomer scientists retire, the Navy is turning to a younger pool of talent for its underwater robotics program.
As part of the effort, college students were recently invited to build robots that could perform a series of tasks without human control in a 38-foot deep research pool. The culmination, last weekend's International Autonomous Underwater Vehicle Competition, was a sink-or-swim contest.
The robots were required to swim through a gate, find and dock with a flashing light box, locate and tag a cracked pipeline, then home in on an acoustic beacon and resurface in a designated recovery zone. Top prize was $7,000 and serious bragging rights.
See the full article.
Friday, August 11, 2006
CMU RI Thesis Oral: Exploiting Spatial-temporal Constraints for Interactive Animation Control
14 Aug 2006
Interactive control of human characters would allow the intuitive control of characters in computer/video games, the control of avatars for virtual reality, electronically mediated communication or teleconferencing, and the rapid prototyping of character animations for movies. To be useful, such a system must be capable of controlling a lifelike character interactively, precisely, and intuitively. Building an animation system for home use is particularly challenging because the system should also be low-cost and not require a considerable amount of time, skill, or artistry.
This thesis explores an approach that exploits a wide range of spatial-temporal constraints for interactive animation control. The control inputs from such a system are often low-dimensional, contain far less information than actual human motion, and thus cannot be directly used for precise control of high-dimensional characters. However, natural human motion is highly constrained; the movements of the degrees of freedom of the limbs or facial expressions are not independent.
Our hypothesis is that the knowledge about natural human motion embedded in a domain-specific motion capture database can be used to transform the underconstrained user input into realistic human motions. The spatial-temporal coherence embedded in the motion data allows us to control high-dimensional human animations with low-dimensional user input.
We demonstrate the power and flexibility of this approach through three different applications: controlling detailed three-dimensional (3D) facial expressions using a single video camera, controlling complex 3D full-body movements using two synchronized video cameras and a very small number of retro-reflective markers, and controlling realistic facial expressions or full-body motions using a sparse set of intuitive constraints defined throughout the motion. For all three systems, we assess the quality of the results by comparisons with those created by a commercial optical motion capture system. We demonstrate the quality of the animation created by all three systems is comparable to commercial motion capture systems but requires less expense, time, and space to suit up the user.
Further Details: A copy of the thesis oral document can be found at http://www.cs.cmu.edu/~jchai/thesis/chai-defense.pdf.
MIT Thesis Defense: Learning with Online Constraints: Shifting Concepts and Active Learning
Speaker: Claire Monteleoni , MIT CSAIL
Date: Friday, August 11 2006
Time: 2:00PM to 3:00PM
Host: Tommi Jaakkola, MIT CSAIL
Contact: Claire Monteleoni, cmontel@csail.mit.edu
Relevant URL: http://people.csail.mit.edu/cmontel
Many practical problems such as forecasting, real-time decision making, streaming data applications, and resource-constrained learning, can be modeled as learning with online constraints. This thesis is concerned with analyzing and designing algorithms for learning under the following online constraints: 1) The algorithm has only sequential, or one-at-a-time, access to data. 2) The time and space complexity of the algorithm must not scale with the number of observations. We analyze learning with online constraints in a variety of settings, including active learning. The active learning model is applicable in any domain in which unlabeled data is easy to come by and there exists a (potentially difficult or expensive) mechanism by which to attain labels.
We present the following algorithms, performance guarantees, and applications for learning with online constraints. In a supervised learning framework in which observations are assumed to be iid, we lower bound the mistake-complexity of Perceptron, a standard online learning algorithm, and then provide a modified update that avoids this lower bound, attaining the optimal mistake-complexity for the problem in question. In an analogous active learning framework, our lower bound applies to the label-complexity of Perceptron paired with any active learning rule. We provide a new online active learning algorithm that avoids this lower bound, and we upper bound its label-complexity. The upper bound is optimal and also bounds the algorithm's total errors (labeled and unlabeled). We analyze the algorithm further, yielding a label-complexity bound under relaxed assumptions, and we perform an empirical evaluation on problems in optical character recognition. Finally, in a supervised learning framework involving no statistical assumptions on the observation sequence, we provide a lower bound on regret for a class of shifting algorithms. We apply an algorithm we provided in previous work, that avoids this lower bound, to an energy-management problem in wireless networks, and demonstrate this application in a network simulation.
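For readers unfamiliar with Perceptron, here is a minimal sketch of the standard mistake-driven update, written to respect the two online constraints above: examples are seen one at a time, and the only state kept is the weight vector and a mistake counter. This is the textbook algorithm, not the modified update proposed in the thesis; the function names and the toy stream are made up for illustration.

```python
def perceptron_update(w, x, y):
    """One online step. w: weight list, x: feature list, y: label in {-1, +1}.

    Returns the (possibly updated) weights and whether a mistake was made.
    """
    score = sum(wi * xi for wi, xi in zip(w, x))
    prediction = 1 if score >= 0 else -1
    if prediction != y:
        # Standard Perceptron rule: on a mistake, add y * x to the weights.
        return [wi + y * xi for wi, xi in zip(w, x)], True
    return w, False


def run_online(stream, dim):
    """Consume (x, y) pairs sequentially; memory does not grow with the stream."""
    w = [0.0] * dim
    mistakes = 0
    for x, y in stream:
        w, made_mistake = perceptron_update(w, x, y)
        mistakes += made_mistake
    return w, mistakes


if __name__ == "__main__":
    # Hypothetical toy stream: first coordinate means +1, second means -1.
    toy_stream = [([1.0, 0.0], 1), ([0.0, 1.0], -1),
                  ([1.0, 0.0], 1), ([0.0, 1.0], -1)]
    print(run_online(toy_stream, dim=2))
```

The mistake counter is the quantity that the mistake-complexity bounds in the abstract refer to; in the active learning setting the analogous count is the number of labels requested.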
Thesis Committee:
Tommi Jaakkola, MIT CSAIL (Thesis Supervisor)
Piotr Indyk, MIT CSAIL
Sanjoy Dasgupta, UC San Diego
News: Educational Robotics
Tuesday, August 08, 2006
Lab Meeting 10 August, 2006 (Ashin): PdaDriver: A Handheld System for Remote Driving
Abstract:
PdaDriver is a Personal Digital Assistant (PDA) system for vehicle teleoperation. It is designed to be easy-to-deploy, to minimize the need for training, and to enable effective remote driving through multiple control modes. This paper presents the motivation for PdaDriver, its current design, and recent outdoor tests with a mobile robot.
[Link]
CNN news: Video cameras on the lookout for terrorists
NISKAYUNA, New York (AP) -- It sounds like something out of science fiction.
Researchers at General Electric Co.'s sprawling research center are creating new "smart video surveillance" systems that can detect explosives by recognizing the electromagnetic waves given off by objects, even under clothing.
Scientist Peter Tu and his team are also developing programs that can recognize faces, pinpoint distress in a crowd by homing in on erratic body movements and synthesize the views of several cameras into one bird's eye view, as part of a growing effort to thwart terrorism.
See the full article.
Saturday, August 05, 2006
MIT defense: Algorithms for Data Mining
Date: Monday, August 7 2006
Time: 1:00PM
Data of massive size are now available in a wide variety of fields and come with great promise. In theory, these massive data sets allow data mining and exploration on a scale previously unimaginable. However, in practice, it can be difficult to apply classic data mining techniques to such massive data sets due to their sheer size.
In this thesis, we study three algorithmic problems in data mining with consideration to the analysis of massive data sets. Our work is both theoretical and experimental -- we design algorithms and prove guarantees for their performance and also give experimental results on real data sets. The three problems we study are: 1) finding a matrix of low rank that approximates a given matrix, 2) clustering high-dimensional points into subsets whose points lie in the same subspace, and 3) clustering objects by pairwise similarities/distances.
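To make the first problem concrete, here is a small sketch of a rank-1 approximation computed by power iteration in plain Python. It only illustrates what "a matrix of low rank that approximates a given matrix" means; it is not the algorithm from the thesis, and the function names are hypothetical. It assumes the starting vector is not orthogonal to the top right singular vector.

```python
import math


def rank_one_approximation(a, iterations=100):
    """Return (sigma, u, v) so that sigma * u * v^T approximates matrix a.

    Uses power iteration on A^T A to estimate the top singular triplet.
    """
    rows, cols = len(a), len(a[0])
    v = [1.0 / math.sqrt(cols)] * cols
    for _ in range(iterations):
        # u = A v
        u = [sum(a[i][j] * v[j] for j in range(cols)) for i in range(rows)]
        # v = A^T u, then normalize to unit length
        v = [sum(a[i][j] * u[i] for i in range(rows)) for j in range(cols)]
        norm = math.sqrt(sum(x * x for x in v))
        if norm == 0.0:
            break
        v = [x / norm for x in v]
    u = [sum(a[i][j] * v[j] for j in range(cols)) for i in range(rows)]
    sigma = math.sqrt(sum(x * x for x in u))
    if sigma > 0.0:
        u = [x / sigma for x in u]
    return sigma, u, v


def reconstruct(sigma, u, v):
    """Build the rank-1 matrix sigma * u * v^T as a list of rows."""
    return [[sigma * ui * vj for vj in v] for ui in u]


if __name__ == "__main__":
    matrix = [[2.0, 4.0], [1.0, 2.0]]  # exactly rank 1
    print(reconstruct(*rank_one_approximation(matrix)))
```

Each iteration touches every entry of the matrix once, which hints at why exact methods become costly on massive data sets and why approximation algorithms with guarantees are of interest.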
New Scientist magazine 4 August 2006
A lens has been developed that alters its focal length when squeezed by an artificial muscle in response to environmental changes.
Virtual bots teach each other using wordplay
The same technique could enable real-life robots to cooperate more effectively when faced with a new challenge
Software meshes photos to create 3D landscape
Overlapping image areas are identified and used to determine how images should be displayed in 3D environment
Wednesday, August 02, 2006
AAAI 2007 Spring Symposia
The American Association for Artificial Intelligence, in cooperation with Stanford University's Computer Science Department, is pleased to present its 2007 Spring Symposium Series, to be held Monday through Wednesday, March 26-28, 2007 at Stanford University in Stanford, California. The topics of the nine symposia in this symposium series are:
- Control Mechanisms for Spatial Knowledge Processing in Cognitive / Intelligent Systems
  Holger Schultheis (schulth@informatik.uni-bremen.de)
- Game Theoretic and Decision Theoretic Agents
  Piotr Gmytrasiewicz (piotr@cs.uic.edu)
- Intentions in Intelligent Systems
  George Ferguson (ferguson@cs.rochester.edu)
- Interaction Challenges for Artificial Assistants
  Neil Yorke-Smith (nysmith@ai.sri.com)
- Logical Formalizations of Commonsense Reasoning
  Vladimir Lifschitz (vl@cs.utexas.edu)
- Machine Reading
  Oren Etzioni (etzioni@cs.washington.edu)
- Multidisciplinary Collaboration for Socially Assistive Robotics
  Marek Michalowski (michalowski@cmu.edu) and Adriana Tapus (tapus@robotics.usc.edu)
- Quantum Interaction
  William Lawless (lawlessw@mail.paine.edu)
- Robots and Robot Venues: Resources for AI Education
  Zachary Dodds (dodds@cs.hmc.edu)
Submission Date
Submissions for the symposia are due on October 6, 2006. Notification of acceptance will be given by November 3, 2006. Material to be included in the working notes or technical report of the symposium must be received by January 26, 2007.
Please see the appropriate section in each symposium description for specific submission requirements.