Monday, December 31, 2007
Robotics Institute, Carnegie Mellon University,
Robot localization in outdoor environments is a challenging problem because of unstructured terrains. Ladars that are not horizontally attached have benefits for detecting obstacles but are not suitable for some localization algorithms used for indoor robots, which have horizontally fixed ladars. The data obtained from tilted ladars are 3D while those from non-tilted ladars are 2D. We present a 2D localization approach for these non-horizontally attached ladars. This algorithm combines 2D particle filter localization with a 3D perception system. We localize the vehicle by comparing a local map with a previously known map. These maps are created by converting 3D data into 2D data. Experimental results show that our approach is able to utilize the benefits of 3D data and 2D maps to efficiently overcome the problems of outdoor environments.
See the complete thesis.
Sunday, December 30, 2007
Acoustical Awareness for Intelligent Robotic Action,
PhD Thesis, College of Computing,
Georgia Institute of Technology,
With the growth of successes in pattern recognition and signal processing, mobile robot applications today are increasingly equipping their hardware with microphones to improve the set of available sensory information. However, if the robot, and therefore the microphone, ends up in a poor location acoustically, then the data will remain noisy and potentially useless for accomplishing the required task. This is compounded by the fact that there are many bad acoustic locations through which a robot is likely to pass, and so the results from auditory sensors often remain poor for much of the task.
The movement of the robot, though, can also be an important tool for overcoming these problems, a tool that has not been exploited in the traditional signal processing community. Robots are not limited to a single location, as traditionally placed microphones are, nor are they powerless over where they are moved, as wearable computers are. If there is a better location available for performing its task, a robot can navigate to that location under its own power. Furthermore, when deciding where to move, robots can develop complex models of the environment. Using an array of sensors, a mobile robot can build models of sound flow through an area, picking from those models the paths most likely to improve performance of an acoustic application.
In this dissertation, we address the question of how to exploit robotic movement. Using common sensors, we present a collection of tools for gathering information about the auditory scene and incorporating that information into a general framework for acoustical awareness. Thus equipped, robots can make intelligent decisions regarding control strategies to enhance their performance on the underlying acoustic application.
The full thesis.
Friday, December 28, 2007
Sorry for the inconvenience.
Wednesday, December 19, 2007
These innovations are being incorporated in two new robotic vehicles equipped for autonomous driving in urban environments, with extensive testing on a DARPA site visit course. Experimental results demonstrate all basic navigation and some basic traffic behaviors, including unoccupied autonomous driving, lane following using pure-pursuit control and our local frame perception strategy, obstacle avoidance using kinodynamic RRT path planning, U-turns, and precedence evaluation amongst other cars at intersections using our situational interpreter. We are working to extend these approaches to advanced navigation and traffic scenarios.
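The post names pure-pursuit control for lane following but gives no details. As a rough illustration only, the textbook pure-pursuit law steers along the circular arc that reaches a goal point on the path. The sketch below assumes a bicycle model; the function name, the wheelbase value, and the goal point are hypothetical and are not taken from the vehicles described above.

```python
import math

def pure_pursuit_steering(goal_x, goal_y, wheelbase):
    """Steering angle (rad) toward a goal point given in the vehicle frame.

    goal_x is forward and goal_y is to the left, both in metres.
    All names and values here are illustrative assumptions.
    """
    lookahead_sq = goal_x ** 2 + goal_y ** 2
    if lookahead_sq == 0.0:
        raise ValueError("goal point coincides with the vehicle")
    # Curvature of the circular arc through the vehicle origin and the goal.
    curvature = 2.0 * goal_y / lookahead_sq
    # Bicycle-model conversion from curvature to front-wheel steering angle.
    return math.atan(wheelbase * curvature)

# A goal 10 m ahead and 1 m to the left gives a small left (positive) turn.
angle = pure_pursuit_steering(10.0, 1.0, wheelbase=2.5)
```

A goal point straight ahead gives zero steering, and the sign of the angle follows the side on which the goal lies.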
Tuesday, December 18, 2007
Monday, December 17, 2007
Lab Meeting December 18th, 2007 (ZhenYu): Spherical Catadioptric Arrays: Construction, Multi-View Geometry, and Calibration
Authors: Douglas Lanman, Daniel Crispell, Megan Wachs, Gabriel Taubin
Proceedings of the Third International Symposium on 3D Data Processing, Visualization, and Transmission (3DPVT'06)
This paper introduces a novel imaging system composed of an array of spherical mirrors and a single high-resolution digital camera. We describe the mechanical design and construction of a prototype, analyze the geometry of image formation, present a tailored calibration algorithm, and discuss the effect that design decisions had on the calibration routine. This system is presented as a unique platform for the development of efficient multi-view imaging algorithms which exploit the combined properties of camera arrays and non-central projection catadioptric systems. Initial target applications include data acquisition for image-based rendering and 3D scene reconstruction. The main advantages of the proposed system include: a relatively simple calibration procedure, a wide field of view, and a single imaging sensor which eliminates the need for color calibration and guarantees time synchronization.
In this chapter, we will describe a statistical model that conforms to the
maximum entropy principle (we will call it the maximum entropy model, or
ME model in short) [68, 69]. Through mathematical derivations, we will show
that the maximum entropy model is a kind of exponential model, and is a close
sibling of the Gibbs distribution described in Chap. 6. An essential difference
between the two models is that the former is a discriminative model, while
the latter is a generative model. Through a model complexity analysis, we will
show why discriminative models are generally superior to generative models in
terms of data modeling power. We will also describe the Conditional Random
Field (CRF), one of the latest discriminative models in the literature, and
prove that CRF is equivalent to the maximum entropy model.
I will also try to point out some problems and limitations of EKF-SLAM.
Tuesday, December 11, 2007
Lab Meeting 11 December (Der-Yeuan): Registration of Colored 3D Point Clouds with a Kernel-based Extension to the Normal Distributions Transform
We present a new algorithm for scan registration
of colored 3D point data which is an extension to the Normal
Distributions Transform (NDT). The probabilistic approach of
NDT is extended to a color-aware registration algorithm by
modeling the point distributions as Gaussian mixture-models
in color space. We discuss different point cloud registration
techniques, as well as alternative variants of the proposed algorithm.
Results showing improved robustness of the proposed
method using real-world data acquired with a mobile robot and
a time-of-flight camera are presented.
Authors: Benjamin Huhle, Martin Magnusson, Achim Lilienthal, Wolfgang Straßer
Reference on NDT: http://citeseer.ist.psu.edu/biber03normal.html
Thursday, December 06, 2007
Pizza will be served
Saturday, December 01, 2007
Things to note:
1. Press the red button at the top of the door to enable or disable the door safety lock.
2. Normally keep the door closed and the safety lock on. (While it is on, the original physical door lock is not needed; remember to leave the physical lock disabled.)
3. The last person to leave the lab needs to enable the physical door lock as before.
(4. The first person to enter the lab needs to remember to use both their key and card. If this proves too annoying, we could try disabling the safety lock whenever condition 3 occurs.)
A 6-foot-tall, one-armed robot named Stair 1.0 balances on a modified Segway platform in the doorway of a Stanford University conference room. It has an arm, cameras and laser scanners for eyes, and a tangle of electrical intestines stuffed into its base.
To do real work in our offices and homes, to fetch our staplers or clean up our rooms, robots are going to have to master their hands. They'll need the kind of "hand-eye" coordination that enables them to identify targets, guide their mechanical mitts toward them, and then manipulate the objects deftly.
But the next generation, Stair 2.0, will actually analyze its own actions. The next Stair will look for the object in its hand and measure the force its fingers are applying to determine whether it's holding anything. It will plan an action, execute it, and observe the result, completing a feedback loop. And it will keep going through the loop until it succeeds at its task.
For detail: Link
Friday, November 30, 2007
Robots’ interaction with humans raises new issues for geometrical reasoning where the humans must be taken explicitly into account. We claim that a human-aware motion system must not only elaborate safe robot motions, but also synthesize good, socially acceptable and legible movement.
This paper focuses on a manipulation planner and a placement mechanism that take explicitly into account its human partners by reasoning about their accessibility, their vision field and their preferences. This planner is part of a human-aware motion and manipulation planning and control system that we aim to develop in order to achieve motion and manipulation tasks in presence or in synergy with humans.
Tuesday, November 27, 2007
Title: Activity Recognition from Wearable Sensors
Date: Nov 29
Speaker: Dieter Fox is Associate Professor and Director of the Robotics and State Estimation Lab in the Computer Science & Engineering Department at the University of Washington, Seattle. He obtained his Ph.D. from the University of Bonn, Germany. Before joining UW, he spent two years as a postdoctoral researcher at the CMU Robot Learning Lab.
Dieter's research focuses on probabilistic state estimation with applications in robotics and activity recognition.
Recent advances in wearable sensing and computing devices and in fast, probabilistic inference techniques make possible the fine-grained estimation of a person's activities over extended periods of time. In this talk I will show how dynamic Bayesian networks and conditional random fields can be used to estimate the location and activity of a person based on information such as GPS readings or WiFi signal strength. Our models use multiple levels of abstraction to bridge the gap between raw sensor measurements and high level information such as a user's mode of transportation, her current goal, and her significant places (e.g. home or work place). I will also present work on using RFID tags or a wearable multi-sensor system to estimate a person's fine-grained activities.
This is joint work with Brian Ferris, Lin Liao, Don Patterson, Amarnag Subramanya, Jeff Bilmes, Gaetano Borriello, and Henry Kautz.
Monday, November 26, 2007
Vail, Douglas Carnegie Mellon Univ.
Lafferty, John Carnegie Mellon Univ.
Veloso, Manuela Carnegie Mellon Univ.
Temporal classification, such as activity recognition,
is a key component for creating intelligent robot systems.
In the case of robots, classification algorithms must robustly
incorporate complex, non-independent features extracted from
streams of sensor data. Conditional random fields are discriminatively
trained temporal models that can easily incorporate
such features. However, robots have few computational
resources to spare for computing a large number of features
from high bandwidth sensor data, which creates opportunities
for feature selection. Creating models that contain only the most
relevant features reduces the computational burden of temporal
classification. In this paper, we show that l1 regularization is an
effective technique for feature selection in conditional random
fields. We present results from a multi-robot tag domain with
data from both real and simulated robots that compare the
classification accuracy of models trained with l1 regularization,
which simultaneously smoothes the model and selects features;
l2 regularization, which smoothes to avoid over-fitting, but
performs no feature selection; and models trained with no
Sunday, November 25, 2007
Monday, Nov 26, 3:30pm, NSH 1507
Current object recognition systems can only recognize a limited number of object categories; scaling up to many categories is the next challenge in object recognition. We seek to build a system to recognize and localize many different object categories in complex scenes. We achieve this through a deceptively simple approach: by matching the input image, in an appropriate representation, to images in a large training set of labeled images. This gives us a set of retrieval images, which provide hypotheses for object identities and locations. We combine this knowledge from the retrieval images with an object detector to detect objects in the image. The simplicity of the approach allows learning for a large number of object classes embedded in many different scenes. We demonstrate improved classification and localization performance over a standard object detector using a held-out test set from the LabelMe database. Furthermore, our system restricts the object search space and therefore greatly increases computational efficiency.
After leaving sunny Phoenix, AZ, Bryan received his A.B. from Dartmouth College. He recently defended his dissertation "Labeling, Discovering, and Detecting Objects in Images" at MIT under the supervision of William Freeman and Antonio Torralba. His next journey will be as a post-doctoral fellow at Ecole Normale Supérieure under Jean Ponce and Andrew Zisserman. There, he will continue to pursue research in visual object recognition and scene understanding.
Saturday, November 24, 2007
Bertrand Douillard, Dieter Fox, Fabio Ramos
This paper presents a general framework for multi-sensor object recognition through a discriminative probabilistic approach modelling spatial and temporal correlations. The algorithm is developed in the context of Conditional Random Fields (CRFs) trained with virtual evidence boosting. The resulting system is able to integrate arbitrary sensor information and incorporate features extracted from the data. The captured spatial relationships are further integrated into a smoothing algorithm to improve recognition over time. We demonstrate the benefits of modelling spatial and temporal relationships for the problem of detecting cars using laser and vision data in outdoor environments.
Friday, November 23, 2007
We propose a novel system for tracking multiple
pedestrians in a crowded scene by exploiting single-row laser
range scanners that measure distances of surrounding objects.
A walking model is built to describe the periodicity of the
movement of the feet in the spatial-temporal domain, and a
mean-shift clustering technique in combination with spatial-temporal
correlation analysis is applied to detect pedestrians.
Based on the walking model, a particle filter is employed to track
multiple pedestrians. Compared with camera-based methods,
our system provides a novel technique to track multiple pedestrians
in a relatively large area. The experiments, in which over
300 pedestrians were tracked in 5 minutes, show the validity
of the proposed system.
DIST – University of Genova, Italy
The paper focuses on the localization subsystem
of ANSER, a mobile robot for autonomous surveillance in
civilian airports and similar wide outdoor areas. ANSER
localization subsystem is composed of a non-differential GPS
unit and a laser rangefinder for landmark-based localization
(inertial sensors are absent). An augmented state vector
approach and an Extended Kalman filter are successfully
employed to estimate the colored components in GPS noise,
thus getting closer to the conditions for the EKF to be
Thursday, November 22, 2007
CMU RI Thesis Proposal Nov 27, 2007 Peer-Advising: An Approach for Policy Improvement when Learning by Demonstration
The presence of robots within the world is becoming ever more prevalent. Whether exploration rovers in space or recreational robots for the home, successful autonomous robot operation requires a motion control algorithm, or policy, which maps observations of the world to actions available on the robot. Policy development is generally a complex process restricted to experts within the field. However, as robots become more commonplace, the need for policy development which is straightforward and feasible for non-experts will increase. Furthermore, as robots co-exist with people, humans and robots will necessarily share experiences. With this thesis, we explore an approach to policy development which exploits information from shared human-robot experience. We introduce the concept of policy development through peer-advice: to improve its policy, the robot learner takes advice from a human peer. We characterize a peer as able to execute the robot motion task herself, and to evaluate robot performance according to the measures used to evaluate her own executions.
We develop peer-advising within a Learning by Demonstration (LbD) framework. In typical LbD systems, a teacher provides demonstration data, and the learner estimates the underlying function mapping observations to actions within this dataset. With our approach, we extend this framework to then enter an explicit policy improvement phase. We identify two basic conduits for policy improvement within this setup: to modify the demonstration dataset, and to change the approximating function directly. The former approach we refer to as data-advising, and the latter as function-advising. We have developed a preliminary algorithm which extends the LbD framework along both of these conduits.
This algorithm has been validated empirically both within simulation and using a Segway RMP robot. Peer-advice has proven effective for modifying control policies and improving policy performance. Within classical LbD, learner performance is limited by the demonstrator’s abilities; however, through advice, learner performance has been shown to extend and even exceed the capabilities of the demonstration set. In our proposed work, we will further develop and explore peer-advice as an effective tool for LbD policy improvement. Our primary focus will be the development of novel techniques for both function-advising and data-advising. This proposed work will be validated on a Segway RMP robot.
University of Freiburg
ARC Centre of Excellence for Autonomous Systems
Australian Centre for Field Robotics
The University of Sydney, Australia
Abstract—This paper presents a method to perform localization in urban environments using segment-based maps together with particle filters. In the proposed approach, the likelihood function is generated as a grid, derived from segment-based maps. The scheme can efficiently assign weights to the particles in real time, with minimum memory requirements and without any additional pre-filtering procedure. Multi-hypotheses cases are handled transparently by the filter. A local history-based observation model is formulated as an extension to deal with ‘out-of-map’ navigation cases. This feature is highly desirable since the map can be incomplete, or the vehicle can be actually located outside the boundaries of the provided map. The system behaves like a ‘virtual GPS’, providing global localization in urban environments, without using an actual GPS. Experimental results show the performance of the proposed architecture in large scale urban environments using route network description (RNDF) segment-based maps.
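The abstract above says the likelihood function is generated as a grid so that particles can be weighted cheaply. The following is a simplified stand-in for that idea, not the authors' method: each particle is weighted by the value of the grid cell it falls in, and particles outside the grid receive a small floor likelihood. The function name, the grid layout, and the `floor` parameter are assumptions made for illustration.

```python
def weight_particles(particles, grid, cell_size, floor=1e-6):
    """Return normalised weights for (x, y) particles from a likelihood grid.

    grid[row][col] holds a precomputed likelihood; row indexes y and col
    indexes x. Particles outside the grid get the small `floor` likelihood
    (a hypothetical choice, not taken from the paper).
    """
    weights = []
    for x, y in particles:
        col = int(x // cell_size)
        row = int(y // cell_size)
        inside = 0 <= row < len(grid) and 0 <= col < len(grid[0])
        weights.append(grid[row][col] if inside else floor)
    total = sum(weights)
    if total == 0.0:
        raise ValueError("all particle likelihoods are zero")
    return [w / total for w in weights]

# Two particles in a 2x2 grid with 1 m cells, one particle off the map.
likelihood_grid = [[0.1, 0.9],
                   [0.5, 0.5]]
weights = weight_particles([(0.5, 0.5), (1.5, 0.5), (-4.0, 0.5)],
                           likelihood_grid, cell_size=1.0)
```

Because each weight is a single table lookup, the cost per particle is constant, which is consistent with the real-time claim in the abstract.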
Title: Conditional Random Fields for Labeling Tasks in Robotics
Speaker: Dieter Fox,
Over the last decade, the mobile robotics community has developed highly efficient and robust solutions to estimation problems such as robot localization and map building. With the availability of various techniques for spatially consistent sensor integration, an important next goal is the extraction of high-level information from sensor data. Such information is often discrete, requiring techniques different from those typically applied to mapping and localization.
In this talk I will describe how Conditional Random Fields (CRF) can be applied to tasks such as semantic place labeling, object recognition, and scan matching. CRFs are discriminative, undirected graphical models that were developed for labeling sequence data. Due to their ability to handle arbitrary dependencies between observation features, CRFs are extremely well suited for classification problems involving high-dimensional feature vectors. This is joint work with Bertrand Douillard, Stephen Friedman, Benson Limketkai, Lin Liao, and Fabio Ramos.
Decentralized SLAM for Pedestrians without direct Communication
Alexander Kleiner and Dali Sun
We consider the problem of Decentralized Simultaneous Localization And Mapping (DSLAM) for pedestrians in the context of Urban Search And Rescue (USAR). In this context, DSLAM is a challenging task. First, data exchange fails due to cut-off communication links. Second, loop-closure is cumbersome because firemen will intentionally try to avoid performing loops when facing the reality of emergency response, e.g. while they are searching for victims.
In this paper, we introduce a solution to this problem based on the non-selfish sharing of information between pedestrians for loop-closure. We introduce a novel DSLAM method which is based on data exchange and association via RFID technology, not requiring any radio communication. The approach has been evaluated within both outdoor and semi-indoor environments. The presented results show that sharing information between single pedestrians allows their individual paths to be globally optimized, even if they are not able to communicate directly.
Wednesday, November 21, 2007
Abstract—In the past many solutions for simultaneous localization and mapping (SLAM) have been presented. Recently these solutions have been extended to map large environments
with six degrees of freedom (DoF) poses. To demonstrate the capabilities of these SLAM algorithms it is common practice to present the generated maps and successful loop closing.
Unfortunately, there is often no objective performance metric that allows different approaches to be compared. This fact is attributed to the lack of ground truth data. For this reason we present a novel method that is able to generate this ground truth data based on reference maps. Furthermore, the resulting reference path is used to measure the absolute performance of different 6D SLAM algorithms building a large urban outdoor map.
(not available online yet -> lab server /2007.IROS/data/papers/0154.pdf)
Tuesday, November 20, 2007
Lab Meeting November 20, 2007 (YuChun): Design of a Social Mobile Robot Using Emotion-Based Decision Mechanisms
Geoffrey A. Hollinger, Yavor Georgiev, Anthony Manfredi, Bruce A. Maxwell, Zachary A. Pezzementi, and Benjamin Mitchell
IEEE/RSJ Int'l Conf. on Intelligent Robots and Systems
In this paper, we describe a robot that interacts with humans in a crowded conference environment. The robot detects faces, determines the shirt color of onlooking conference attendants, and reacts with a combination of speech, musical, and movement responses. It continuously updates an internal emotional state, modeled realistically after human psychology research. Using empirically-determined mapping functions, the robot’s state in the emotion space is translated to a particular set of sound and movement responses. We successfully demonstrate this system at the AAAI ’05 Open Interaction Event, showing the potential for emotional modeling to improve human-robot interaction.
Monday, November 19, 2007
Sunday, November 18, 2007
Author: Brendan J. Frey and Delbert Dueck
Clustering data by identifying a subset of representative examples is important for processing sensory signals and detecting patterns in data. Such “exemplars” can be found by randomly choosing an initial subset of data points and then iteratively refining it, but this works well only if that initial choice is close to a good solution. We devised a method called “affinity propagation,” which takes as input measures of similarity between pairs of data points. Real-valued messages are exchanged between data points until a high-quality set of exemplars and corresponding clusters gradually emerges. We used affinity propagation to cluster images of faces, detect genes in microarray data, identify representative sentences in this manuscript, and identify cities that are efficiently accessed by airline travel. Affinity propagation found clusters with much lower error than other methods, and it did so in less than one-hundredth the amount of time.
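The message exchange described in the abstract can be sketched with the standard responsibility and availability updates of affinity propagation. This is a minimal pure-Python illustration for small inputs, not the authors' implementation; the damping factor, the iteration count, and the toy data are assumptions chosen for the example.

```python
def affinity_propagation(S, damping=0.5, iterations=200):
    """Return, for each point, the index of its chosen exemplar.

    S[i][k] is the similarity of point i to candidate exemplar k; the
    diagonal S[k][k] holds the "preference" for k to be an exemplar.
    Requires at least two points.
    """
    n = len(S)
    R = [[0.0] * n for _ in range(n)]  # responsibilities r(i, k)
    A = [[0.0] * n for _ in range(n)]  # availabilities a(i, k)
    for _ in range(iterations):
        # r(i,k) = s(i,k) - max over k' != k of [a(i,k') + s(i,k')]
        for i in range(n):
            for k in range(n):
                best_other = max(A[i][j] + S[i][j] for j in range(n) if j != k)
                R[i][k] = damping * R[i][k] + (1 - damping) * (S[i][k] - best_other)
        # a(i,k) = min(0, r(k,k) + sum of positive r(i',k), i' not in {i,k})
        # a(k,k) = sum of positive r(i',k), i' != k
        for k in range(n):
            pos = [max(0.0, R[i][k]) for i in range(n)]
            total = sum(pos) - pos[k]
            for i in range(n):
                if i == k:
                    new = total
                else:
                    new = min(0.0, R[k][k] + total - pos[i])
                A[i][k] = damping * A[i][k] + (1 - damping) * new
    return [max(range(n), key=lambda k: A[i][k] + R[i][k]) for i in range(n)]

# Toy data: two well-separated groups on a line (illustrative only).
points = [0.0, 0.1, 0.2, 5.0, 5.1, 5.2]
similarity = [[-(a - b) ** 2 for b in points] for a in points]
for k in range(len(points)):
    similarity[k][k] = -1.0  # assumed preference shared by all points
exemplars = affinity_propagation(similarity)
```

The preference on the diagonal controls how many exemplars emerge: lower values favour fewer clusters. The nested loops make this sketch cubic in the number of points, so it is only suitable for tiny examples.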
Saturday, November 17, 2007
* 17 November 2007
* Zeeya Merali
* Magazine issue 2630
GARRETT LISI is an unlikely individual to be staking a claim for a theory of everything. He has no university affiliation and spends most of the year surfing in Hawaii. In winter, he heads to the mountains near Lake Tahoe, California, to teach snowboarding. Until recently, physics was not much more than a hobby.
That hasn't stopped some leading physicists sitting up and taking notice after Lisi made his theory public on the physics pre-print archive this week (www.arxiv.org/abs/0711.0770). By analysing the most elegant and intricate pattern known to mathematics, Lisi has uncovered a relationship underlying all the universe's particles and forces, including gravity -
See the full article.
Thursday, November 15, 2007
Title: The Maximum Entropy Principle
Date: Monday November 19
The maximum entropy principle (maxent) has been applied to solve density estimation problems in physics (since 1871), statistics and information theory (since 1957), as well as machine learning (since 1993). According to this principle, we should represent available information as constraints and among all the distributions satisfying the constraints choose the one of maximum entropy. In this overview I will contrast various motivations of maxent with the main focus on applications in statistical inference. I will discuss the equivalence between robust Bayes, maximum entropy, and regularized maximum likelihood estimation, and the implications for principled statistical inference. Finally, I will describe how maxent has been applied to model natural languages and geographic distributions of species.
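As a small worked example of the principle stated above, consider a die whose only known property is its mean. Among all distributions on the faces with that mean, the maximum entropy one has the exponential form p_i proportional to exp(lambda * v_i), and lambda can be found by bisection because the mean increases monotonically with lambda. The function names, the search bounds, and the die example are illustrative assumptions, not material from the talk.

```python
import math

def exponential_family(values, lam):
    """Probabilities p_i proportional to exp(lam * v_i), computed stably."""
    shift = max(lam * v for v in values)
    weights = [math.exp(lam * v - shift) for v in values]
    z = sum(weights)
    return [w / z for w in weights]

def maxent_distribution(values, target_mean, lo=-50.0, hi=50.0, tol=1e-12):
    """Maximum entropy distribution on `values` with the given mean.

    Assumes min(values) < target_mean < max(values). The bounds lo and hi
    on the multiplier are arbitrary choices for this sketch.
    """
    while hi - lo > tol:
        mid = (lo + hi) / 2.0
        probs = exponential_family(values, mid)
        mean = sum(p * v for p, v in zip(probs, values))
        if mean < target_mean:
            lo = mid  # mean too small: increase the multiplier
        else:
            hi = mid
    return exponential_family(values, (lo + hi) / 2.0)

faces = [1, 2, 3, 4, 5, 6]
fair = maxent_distribution(faces, 3.5)    # no bias needed: uniform
loaded = maxent_distribution(faces, 4.5)  # tilted toward high faces
```

With a mean of 3.5 the constraint is already satisfied by the uniform distribution, so the result is uniform; a mean of 4.5 tilts the probabilities smoothly toward the higher faces without adding any other structure.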
Danh Trinh, 35, of Towson, Md., won iRobot's Create Challenge contest and its $5,000 prize, with his Personal Home Robot, the company announced Tuesday.
iRobot Create is a preassembled programmable robot designed so developers can create new robots without having to build everything from scratch.
See the full article.
Jeff, make our PAL4 robot smarter and more powerful!
Monday, November 12, 2007
Lab Meeting November 13, 2007 (Leo): Tracking Multiple Targets With Correlated Measurements and Maneuvers
Author: Rogers, S.R.
The problem of tracking N targets with correlation in both
measurement and maneuver statistics is solved by transforming to a
coordinate frame in which the N targets are decoupled. For the case
of N identical targets, the decoupling is shown to coincide with a
transformation to a set of nested center-of-mass coordinates.
Absolute and differential tracking accuracies are compared with
suboptimal results to show the improvement that is achieved by
properly exploiting the correlation between targets.
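The decoupling idea can be illustrated for the simplest case of two identical targets in one dimension. The sketch below assumes a joint covariance with equal variances and a common cross-covariance (values chosen arbitrarily) and shows that transforming to the centre of mass and the difference of the two positions removes the off-diagonal term. It is a toy illustration, not the paper's derivation; the helper name is hypothetical.

```python
def transform_covariance(T, P):
    """Return T P T^T for small matrices stored as nested lists."""
    n, m = len(T), len(P)
    TP = [[sum(T[i][k] * P[k][j] for k in range(m)) for j in range(m)]
          for i in range(n)]
    return [[sum(TP[i][k] * T[j][k] for k in range(m)) for j in range(n)]
            for i in range(n)]

# Assumed joint covariance of two identical, correlated targets.
P = [[4.0, 3.0],
     [3.0, 4.0]]
# Rows: centre of mass (x1 + x2) / 2 and difference x1 - x2.
T = [[0.5, 0.5],
     [1.0, -1.0]]
decoupled = transform_covariance(T, P)
```

In the new coordinates the cross term vanishes, so the centre of mass and the difference can be filtered independently; the abstract states that for N identical targets the analogous transformation uses nested centre-of-mass coordinates.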
Lab Meeting 13 November (Any): An Efficient FastSLAM Algorithm for Generating Maps of Large-Scale Cyclic Environments from Raw Laser Range Measurement
Intl. Conference on Intelligent Robots and Systems
Author: Joerg Liebelt@cmu, Jing Xiao@Epson, Jie Yang@cmu
Active Appearance Models (AAMs) have been popularly used to represent the appearance and shape variations of human faces. Fitting an AAM to images recovers the face pose as well as its deformable shape and varying appearance. Successful fitting requires that the AAM is sufficiently generic such that it covers all possible facial appearances and shapes in the images. Such a generic AAM is often difficult to obtain in practice, especially when the image quality is low or when occlusion occurs. To achieve robust AAM fitting under such circumstances, this paper proposes to incorporate the disparity data obtained from a stereo camera with the image fitting process. We develop an iterative multi-level algorithm that combines efficient AAM fitting to 2D images and robust 3D shape alignment to disparity data. Experiments on tracking faces in low-resolution images captured from meeting scenarios show that the proposed method achieves better performance than the original 2D AAM fitting algorithm. We also demonstrate an application of the proposed method to a facial expression recognition task.
Sunday, November 11, 2007
* NewScientist.com news service
* Mason Inman
Computers might not be clever enough to trick adults into thinking they are intelligent yet, but a new study shows that a giggling robot is sophisticated enough to get toddlers to treat it as a peer.
An experiment led by Javier Movellan at the University of California San Diego, US, is the first long-term study of interaction between toddlers and robots.
QRIO stayed in the middle of a classroom of a dozen toddlers aged between 18 months and two years, using its sensors to avoid bumping the kids or the walls. It was initially programmed to giggle when the kids touched its head, to occasionally sit down, and to lie down when its batteries died. A human operator could also make the robot turn its gaze towards a child or wave as they went away. "We expected that after a few hours, the magic was going to fade," Movellan says. "That's what has been found with earlier robots." But, in fact, the kids warmed to the robot over several weeks, eventually interacting with QRIO in much the same way they did with other toddlers. These interactions increased in quality over several months.
Eventually, the children seemed to care about the robot's well being. They helped it up when it fell, and played "care-taking" games with it. When the researchers programmed QRIO to spend all its time dancing, the kids quickly lost interest. When the robot went back to its old self, the kids again treated it like a peer.
Movellan says that a robot like this might eventually be useful as a classroom assistant. "You can think of it as an appliance," he says. "We need to find the things that the robots are better at, and leave to humans the things humans are better at," Movellan says.
Lab Meeting November 13th, 2007: A high integrity IMU/GPS navigation loop for autonomous land vehicle applications
This paper describes the development and implementation of a high integrity navigation system, based on the combined use of the Global Positioning System (GPS) and an inertial measurement unit (IMU), for autonomous land vehicle applications. The paper focuses on the issue of achieving the integrity required of the navigation loop for use in autonomous systems. The paper highlights the detection of possible faults both before and during the fusion process in order to enhance the integrity of the navigation loop. The implementation of this fault detection methodology considers both low frequency faults in the IMU caused by bias in the sensor readings and the misalignment of the unit, and high frequency faults from the GPS receiver caused by multipath errors. The implementation, based on a low-cost, strapdown IMU, aided by either standard or carrier phase GPS technologies, is described. Results of the fusion process are presented.
Friday, November 09, 2007
Title: Spatiotemporal Stochastic Processes and Their Prediction
Venue: NSH 1507
Date: Monday November 12
Time: 12:00 noon
This talk will continue the overview of stochastic processes, moving from those which just evolve in time to ones which evolve in time and space, where "space" can be a regular lattice, Euclidean space, a graph, etc. Adding space creates lots of interesting possibilities, which I'll illustrate with "cellular automata" models of physical and biological self-organization. After the challenges this setting raises for statistical learning have had a chance to sink in, I'll describe an approach to discovering efficient "local predictors", and using them to automatically identify interesting coherent structures in spatio-temporal data.
In devising methods for optimization problems associated with learning tasks, and in studying the runtime of these methods, we usually think of the runtime as increasing with the data set size. However, from a learning performance perspective, having more data available should not mean we need to spend more time optimizing. At the extreme, we can always ignore some of the data if it makes optimization difficult. But perhaps having more data available can actually allow us to spend less time optimizing?
Two types of behaviors:
(1) a phase transition behavior, where a computationally intractable problem becomes tractable, at the cost of excess information. I will demonstrate this through a detailed study of informational and computational limits in clustering.
(2) the scaling of the computational cost of training, e.g. support vector machines (SVMs). I will argue that the computational cost should scale down with data set size, and up with the "hardness" of the decision problem. In particular, I will describe a simple training procedure, achieving state-of-the-art performance on large data sets, whose runtime does not increase with data set size.
Speaker: Jiang Ni
Face view synthesis involves using one view of a face to artificially render another view. It is an interesting problem in computer vision and computer graphics, and can be applied in the entertainment industry for animated movies and video games. The fact that the input is only a single image makes the problem very difficult. Previous approaches learn a linear model on pairs of poses from 2D training data and then predict the unknown pose in the test example. Such 2D approaches are much more practical than approaches requiring 3D data and more computationally efficient. However, they perform inadequately when dealing with large angles between poses. In this thesis, we seek to improve performance through better choices in probabilistic modeling. As a first step, we have implemented a statistical model combining distance in feature space (DIFS) and distance from feature space (DFFS) for such pairs of poses. Such a representation leads to better performance. As a second step, we model the relationship between the poses using a Bayesian network. This representation takes advantage of the sparse statistical structure of faces. In particular, we have observed that a given pixel is often statistically correlated with only a small number of other pixel variables. The Bayesian network provides a concise representation for this behavior, reducing the susceptibility to over-fitting. Compared with the linear method, the Bayesian network more accurately predicts small and localized features.
Here is the link.
Tuesday, November 06, 2007
The localization problem for an autonomous robot moving in a known environment is a well-studied problem which has seen many elegant solutions. Robot localization in a dynamic environment populated by several moving obstacles, however, is still a challenge for research. In this paper, we use an omnidirectional camera mounted on a mobile robot to perform a sort of scan matching. The omnidirectional vision system finds the distances of the closest color transitions in the environment, mimicking the way laser rangefinders detect the closest obstacles. The similarity of our sensor with classical rangefinders allows the use of practically unmodified Monte Carlo algorithms, with the additional advantage of being able to easily detect occlusions caused by moving obstacles. The proposed system was initially implemented in the RoboCup Middle-Size domain, but the experiments we present in this paper prove it to be valid in a general indoor environment with natural color transitions. We present localization experiments both in the RoboCup environment and in an unmodified office environment. In addition, we assessed the robustness of the system to sensor occlusions caused by other moving robots. The localization system runs in real-time on low-cost hardware.
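The measurement step of a Monte Carlo localization filter fed by range-like readings can be sketched as follows. This is a minimal illustration, not the authors' implementation: the Gaussian sensor model, the `expected_scan` callback, and the convention of marking occluded rays with `None` are assumptions introduced here.

```python
import math


def update_weights(particles, weights, scan, expected_scan, sigma=0.2):
    """Reweight particles by how well each one explains a range scan.

    particles     -- list of pose hypotheses
    weights       -- current particle weights, same length as particles
    scan          -- measured distances; None marks a ray flagged as occluded
    expected_scan -- hypothetical callback: pose -> distances the map predicts
    sigma         -- assumed standard deviation of the range noise
    """
    new_weights = []
    for pose, weight in zip(particles, weights):
        predicted = expected_scan(pose)
        log_likelihood = 0.0
        for measured, expected in zip(scan, predicted):
            if measured is None:
                # An occluded ray carries no information about the pose.
                continue
            log_likelihood += -0.5 * ((measured - expected) / sigma) ** 2
        new_weights.append(weight * math.exp(log_likelihood))
    total = sum(new_weights)
    if total == 0.0:
        # Every hypothesis was rejected: fall back to uniform weights.
        return [1.0 / len(particles)] * len(particles)
    return [w / total for w in new_weights]
```

Skipping occluded rays instead of penalizing them is one simple way to keep moving obstacles from dragging down otherwise good pose hypotheses.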
Monday, November 05, 2007
Authors: Lin Liao, Donald J. Patterson, Dieter Fox, Henry Kautz
This paper introduces a hierarchical Markov model that can learn and infer a user’s daily movements through an urban community. The model uses multiple levels of abstraction in order to bridge the gap between raw GPS sensor measurements and high level information such as a user’s destination and mode of transportation. To achieve efficient inference, we apply Rao-Blackwellized particle filters at multiple levels of the model hierarchy. Locations such as bus stops and parking lots, where the user frequently changes mode of transportation, are learned from GPS data logs without manual labeling of training data. We experimentally demonstrate how to accurately detect novel behavior or user errors (e.g. taking a wrong bus) by explicitly modeling activities in the context of the user’s historical data. Finally, we discuss an application called “Opportunity Knocks” that employs our techniques to help cognitively-impaired people use public transportation safely.
Thursday, November 01, 2007
Title: Stochastic Processes and Their Prediction
Venue: NSH 1507
Date: Monday November 5
Time: 12:00 noon
Stochastic processes are collections of interdependent random variables; this talk will be an overview of some of the main concepts, and ways in which they might interest people in machine learning. After a brief mathematical introduction, I focus on stochastic processes whose variables are indexed by time, which are closely related to dynamical systems. The key problem here is understanding the dependence of the variables across time, and the different sorts of long-run behavior to which it can give rise. I will talk about various kinds of dependence structure, especially Markov dependence; how to give Markovian representations of non-Markovian processes; and how to use these Markovian representations for prediction. Finally, I'll close with some recent work on discovering predictive Markovian representations from time series.
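As a small illustration of Markov dependence and prediction, the sketch below fits a first-order Markov chain to a symbol sequence by counting transitions, then propagates a distribution forward in time. The function names are hypothetical and the example is not taken from the talk.

```python
from collections import Counter, defaultdict


def fit_markov_chain(sequence):
    """Estimate first-order transition probabilities by counting transitions."""
    counts = defaultdict(Counter)
    for current, following in zip(sequence, sequence[1:]):
        counts[current][following] += 1
    return {
        state: {nxt: c / sum(row.values()) for nxt, c in row.items()}
        for state, row in counts.items()
    }


def predict(transitions, state, steps=1):
    """Return the distribution over states `steps` transitions after `state`.

    States with no observed outgoing transition simply lose their mass,
    so the result may sum to less than 1 for short training sequences.
    """
    distribution = {state: 1.0}
    for _ in range(steps):
        following = defaultdict(float)
        for s, p in distribution.items():
            for t, q in transitions.get(s, {}).items():
                following[t] += p * q
        distribution = dict(following)
    return distribution
```

For the sequence `"AABAB"`, the fitted chain moves from A to B with probability 2/3 and from B to A with probability 1, so two steps after A the predicted distribution is 7/9 on A and 2/9 on B.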
Wednesday, October 31, 2007
Department of Psychology
Carnegie Mellon University
Mauldin Auditorium (NSH 1305)
Talk 3:30 pm
Monday, October 29, 2007
Authors: Baker, S.; Nayar, S.K.
From: Computer Vision, 1998. Sixth International Conference
Conventional video cameras have limited fields of view which make them restrictive for certain applications in computational vision. A catadioptric sensor uses a combination of lenses and mirrors placed in a carefully arranged configuration to capture a much wider field of view. When designing a catadioptric sensor, the shape of the mirror(s) should ideally be selected to ensure that the complete catadioptric system has a single effective viewpoint. In this paper, we derive the complete class of single-lens single-mirror catadioptric sensors which have a single viewpoint and an expression for the spatial resolution of a catadioptric sensor in terms of the resolution of the camera used to construct it. We also include a preliminary analysis of the defocus blur caused by the use of a curved mirror.
Saturday, October 27, 2007
Unsupervised categorization of images or image parts is
often needed for image and video summarization or as a
preprocessing step in supervised methods for classification,
tracking and segmentation. While many metric-based techniques
have been applied to this problem in the vision community,
often, the most natural measures of similarity (e.g.,
number of matching SIFT features) between pairs of images
or image parts are non-metric. Unsupervised categorization
by identifying a subset of representative exemplars can
be efficiently performed with the recently-proposed ‘affinity
propagation’ algorithm. In contrast to k-centers clustering,
which iteratively refines an initial randomly-chosen
set of exemplars, affinity propagation simultaneously considers
all data points as potential exemplars and iteratively
exchanges messages between data points until a good solution
emerges. When applied to the Olivetti face data set
using a translation-invariant non-metric similarity, affinity
propagation achieves a much lower reconstruction error
and nearly halves the classification error rate, compared
to state-of-the-art techniques. For the more challenging
problem of unsupervised categorization of images from the
Caltech101 data set, we derived non-metric similarities between
pairs of images by matching SIFT features. Affinity
propagation successfully identifies meaningful categories,
which provide a natural summarization of the training images
and can be used to classify new input images.
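The message exchange described above can be sketched in a few lines. This is a plain-Python illustration of the standard responsibility and availability updates with damping, written for small similarity matrices; the function name, the default damping, and the iteration count are assumptions made here, and it is not the authors' code.

```python
def affinity_propagation(S, damping=0.7, iterations=300):
    """Return, for each point, the index of the exemplar it selects.

    S is an n x n list of similarities (n >= 2); S[k][k] holds the
    "preference" of point k for being chosen as an exemplar.
    """
    n = len(S)
    R = [[0.0] * n for _ in range(n)]  # responsibilities r(i, k)
    A = [[0.0] * n for _ in range(n)]  # availabilities a(i, k)
    for _ in range(iterations):
        # r(i,k) <- s(i,k) - max over k' != k of [a(i,k') + s(i,k')]
        for i in range(n):
            scores = [A[i][k] + S[i][k] for k in range(n)]
            for k in range(n):
                best_other = max(scores[j] for j in range(n) if j != k)
                R[i][k] = damping * R[i][k] + (1 - damping) * (S[i][k] - best_other)
        # a(i,k) <- min(0, r(k,k) + sum of positive r(i',k), i' not in {i,k})
        # a(k,k) <- sum of positive r(i',k), i' != k
        for k in range(n):
            positive = [max(0.0, R[i][k]) for i in range(n)]
            support = sum(positive) - positive[k]
            for i in range(n):
                if i == k:
                    target = support
                else:
                    target = min(0.0, R[k][k] + support - positive[i])
                A[i][k] = damping * A[i][k] + (1 - damping) * target
    return [max(range(n), key=lambda k: A[i][k] + R[i][k]) for i in range(n)]
```

Because the updates only ever read entries of `S`, the similarity does not need to be symmetric or satisfy the triangle inequality, which is what makes the method usable with non-metric measures such as counts of matching features.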
Thursday, October 25, 2007
Title: Proximity on Graphs: Definitions, Fast Solutions and Applications
Venue: NSH 1507
Date: Monday October 29
Time: 12:00 noon
Graphs appear in a wide range of settings, like computer networks, the
world wide web, biological networks, social networks
(MSN/Facebook/LinkedIn) and many more. How to find the master-mind criminal
given some suspects X, Y and Z? How to find a user-specific pattern (e.g.,
a money-laundering ring)? How to track the most influential authors
over time? How to automatically associate digital images with proper
keywords? How to answer all these questions quickly on large (disk
resident) graphs? It turns out that the main tool behind these
applications (and many more) is the proximity measurement: given two
nodes A and B in a network, how closely is the target node B related
to the source node A?
In this talk, I will cover three aspects of the proximity on graphs: (1)
Proximity definitions. I will start with random walk with restart, the
main idea behind Google's PageRank algorithm, and talk about its
variants and generalizations. (2) Computational issues. Many proximity
measurements involve a specific linear system. I will give algorithms
for efficiently solving such linear system(s) in several different
settings. (3) Applications. I will show some applications of
proximity, including link prediction, neighborhood formulation, image
captioning, center-piece subgraphs, pattern matching, etc.
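Random walk with restart can be sketched with a simple power iteration. The code below is an illustrative version for small in-memory graphs, not the fast solvers discussed in the talk; the function name, the restart probability of 0.15, and the handling of nodes without outgoing edges are assumptions.

```python
def random_walk_with_restart(adjacency, source, restart=0.15, iterations=100):
    """Score every node by its proximity to `source`.

    adjacency -- dict mapping each node to a list of its neighbors; every
                 neighbor must also appear as a key
    restart   -- probability of jumping back to the source at each step
    """
    nodes = sorted(adjacency)
    score = {v: 1.0 if v == source else 0.0 for v in nodes}
    for _ in range(iterations):
        following = {v: 0.0 for v in nodes}
        for u in nodes:
            neighbors = adjacency[u]
            if not neighbors:
                # Assumption: a walker stuck on a dead end returns to the source.
                following[source] += (1 - restart) * score[u]
                continue
            share = (1 - restart) * score[u] / len(neighbors)
            for v in neighbors:
                following[v] += share
        following[source] += restart
        score = following
    return score
```

The fixed point of this iteration is the solution of a linear system in the normalized adjacency matrix, which is why fast linear solvers matter for large, disk-resident graphs.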
Lab meeting 22 October (韋麒): Design and Control of Five-Fingered Haptic Interface Opposite to Human Hand
This paper presents the design and control of a newly developed five-fingered haptic interface robot named HIRO II.
Full Text: PDF (962 KB)
Tuesday, October 23, 2007
Wednesday, October 24 Noon
Speaker (1) Gil Jones Ph.D. Candidate Robotics Institute
Title Learning-enhanced Market-based Task Allocation for Oversubscribed Domains
Abstract: This paper presents a learning-enhanced market-based task allocation approach for oversubscribed domains. In oversubscribed domains all tasks cannot be completed within the required deadlines due to a lack of resources. We focus specifically on domains where tasks can be generated throughout the mission, tasks can have different levels of importance and urgency, and penalties are assessed for failed commitments. Therefore, agents must reason about potential future events before making task commitments. Within these constraints, existing market-based approaches to task allocation can handle task importance and urgency, but do a poor job of anticipating future tasks, and are hence assessed a high number of penalties.
Speaker (2) Balajee Kannan Research Engineer Robotics Institute
Title Metrics for quantifying system performance in intelligent, fault-tolerant multi-robot teams
Abstract: The quality of the incorporated fault-tolerance has a direct impact on the overall performance of the system. Hence, being able to measure the extent and usefulness of fault-tolerance exhibited by the system would provide the designer with a useful analysis tool for better understanding the system as a whole. Unfortunately, it is difficult to quantify system fault-tolerance on its own for intelligent systems. A more useful metric for evaluation is the "effectiveness" measure of fault-tolerance, i.e., the influence of fault-tolerance towards improving overall performance determines the overall effectiveness or quality of the system. In this paper, we outline application-independent metrics to measure fault-tolerance within the context of system performance. In addition, we also outline potential methods to better interpret the obtained measures towards understanding the capabilities of the implemented system. Furthermore, a main focus of our approach is to capture the effect of intelligence, reasoning, or learning on the effective fault-tolerance of the system, rather than relying purely on traditional redundancy based measures.
HRI08 Workshop: Coding Behavioral Video Data and Reasoning Data in Human-Robot Interaction: Call for submission
* Participants will understand different approaches toward constructing coding systems.
* Participants will be positioned better to analyze their own HRI data.
* We will have begun to establish a community of researchers and designers who can share related ideas with one another in the years to come.
* We will move forward with publishing proceedings from the workshop.
Deadline for Submission (for Presenters): December 14, 2007
Peter H. Kahn, Jr.
University of Washington, USA
Advanced Telecommunications Research (ATR), Japan
Nathan G. Freier
Rensselaer Polytechnic Institute, USA
Rachel L. Severson
University of Washington, USA
Advanced Telecommunications Research (ATR) and Osaka University, Japan
As the field of human-robot interaction begins to mature, researchers and designers are recognizing the need for systematic, comprehensive, and theoretically-grounded methodologies for investigating people's social interactions with robots. One attractive approach entails the collection of behavioral video data in naturalistic or experimental settings. Another attractive approach entails interviewing participants about their conceptions of human-robot interaction (e.g., during or immediately following an interaction with a specific robot). With behavioral video data and/or reasoning data in hand, the question then emerges: How does one code and analyze such data?
The workshop is divided into two main parts.
Morning: Our collaborative laboratories (from the University of Washington and ATR) will share in some depth the coding system we have developed for coding 90 children's social and moral behavior with and reasoning about a humanoid robot (ATR's Robovie). This coding manual builds from other systems we have developed and disseminated elsewhere as technical reports (Friedman et al., 2005; Kahn et al., 2003, 2005, 2005). Key issues presented in the morning include:
* What is a Coding Manual?
* Getting Started - Iterating between Data and Theory
* Building on Previous Systems, when Applicable
* Hierarchical Organization of Categories
* Time Segmentation of Behavior
* Behavior in Response to Robot-Initiated and Experimenter-Initiated Stimulus
* Coding Social and Moral Reasoning
* How to Deal with Multiple Ways of Coding a Single Behavioral Event or Reason
* Reliability Coding
We'll have plenty of time for discussion of issues as they emerge.
Afternoon: Following a group lunch, we'll then have up to 5 participants present for 20 minutes each (followed by 20 minutes of discussion after each presentation). Presenters will provide a brief overview of one of their HRI research projects (hopefully with some video data or interview data in hand), and then explicate three problems they encountered in coding the data, and then (if at all) how they sought to solve the problems. The 20-minute discussion periods will provide time for participants to discuss the nature of the problems and other possible solution strategies.
Two Types of Participation
There will be two types of participation:
5 Presenters (in addition to the 5 organizers): Presenters will be actively involved in HRI research that involves behavioral and/or reasoning data. As noted above, each presenter will have 20 minutes to present an overview of one of their HRI research projects, and to present three problems encountered and possible solutions.
Other Workshop Participants: Participants will join in the workshop and participate in discussions. The prerequisite is simply an interest in the topic.
As noted above, there will be two types of participation: (1) workshop presenters, and (2) workshop participants. Submission guidelines differ depending on your interests in participating:
(1) Workshop Presenter: Send a one-page single-spaced summary of your HRI research project, and three possible coding problems encountered and possible solutions. Indicate whether you anticipate having some actual data to share (video clips or interview transcripts) that illustrate your issues at hand. Include an additional paragraph that summarizes your background in HRI. These submissions will be peer-reviewed. The deadline for submission is December 14, 2007.
(2) Workshop Participant: Send a one-paragraph summary of your background in HRI and interest in the workshop. Participants will be accepted on a first-come-first-admitted basis.
The workshop will take place March 12, 2008, at the HRI '08 conference site, the beautiful Felix Meritis cultural center in central Amsterdam.
We plan to publish proceedings of the workshop in the form of a technical report. At this juncture, the technical report will include the full coding system for the UW-ATR study on Children's Social and Moral Relationships with a Humanoid Robot. We would also like to include full coding systems from the other 5 presenters in the workshop. Together, then, we would have created a vibrant initial repository of coding systems for other researchers to draw upon. However, if not all of the presenters have full systems, then we will include a written version of their summary of their project and their 3 problems and solutions presented during the workshop.
Sunday, October 21, 2007
IEEE Conference on Computer Vision and Pattern Recognition 2007 (CVPR'07)
In this paper, we present a system that integrates fully automatic scene geometry estimation, 2D object detection, 3D localization, trajectory estimation, and tracking for dynamic scene interpretation from a moving vehicle. Our sole inputs are two video streams from a calibrated stereo rig on top of a car. From these streams, we estimate Structure-from-Motion (SfM) and scene geometry in real-time. In parallel, we perform multi-view/multi-category object recognition to detect cars and pedestrians in both camera images. Using the SfM self-localization, 2D object detections are converted to 3D observations, which are accumulated in a world coordinate frame. A subsequent tracking module analyzes the resulting 3D observations to find physically plausible spacetime trajectories. Finally, a global optimization criterion takes object-object interactions into account to arrive at accurate 3D localization and trajectory estimates for both cars and pedestrians. We demonstrate the performance of our integrated system on challenging real-world data showing car passages through crowded city areas.
Full article: link
Seam Carving for Content-Based Image Retargeting
SIGGRAPH, San Diego, 2007
Video: Image Resizing by Seam Carving
News: New Tricks for Online Photo Editing
Online photo editor Rsizr implements the feature
Friday, October 19, 2007
Authors: Emrah Akin Sisbot, Luis F. Marin-Urias, Rachid Alami, and Thierry Siméon, Member, IEEE
Robot navigation in the presence of humans raises new issues for motion planning and control when the humans must be taken explicitly into account. We claim that a human aware motion planner (HAMP) must not only provide safe robot paths, but also synthesize good, socially acceptable and legible paths. This paper focuses on a motion planner that takes explicitly into account its human partners by reasoning about their accessibility, their vision field and their preferences in terms of relative human–robot placement and motions in realistic environments. This planner is part of a human-aware motion and manipulation planning and control system that we aim to develop in order to achieve motion and manipulation tasks in the presence or in synergy with humans.
Thursday, October 18, 2007
The developed haptic interface can present force and tactile feeling to the five fingertips of the human hand. Its mechanism consists of a 6 degree of freedom (DOF) arm and a 15 DOF hand.
The design performance index consists of the product space between the operator’s finger and the haptic finger, and the opposability of the thumb and fingers. Moreover, in order to reduce the feeling of uneasiness in the operator, a mixed control method is proposed that consists of finger-force control and arm position control, and is intended to maximize the control performance index, which consists of the hand manipulability measure and the norm of the arm-joint angle vector.
Optimal Multi-Agent Scheduling with Constraint Programming
We consider the problem of computing optimal schedules in multi-agent systems. In these problems, actions of one agent can influence the actions of other agents, while the objective is to maximize the total `quality' of the schedule.
We show how we can model and efficiently solve these problems with constraint programming technology. Elements of our proposed method include constraint-based reasoning, search strategies, problem decomposition, scheduling algorithms, and a linear programming relaxation.
Tuesday, October 16, 2007
Eee PC Spec
Monday, October 15, 2007
Lab Meeting 15 October (Der-Yeuan): Introduction to Robotics Programming with Microsoft Robotics Studio
Microsoft Robotics Studio (MSRS) is a Windows-based IDE for robotics programming. Its primary components are the Concurrency and Coordination Runtime (CCR) and the Decentralized System Services (DSS). The CCR focuses on scheduling tasks to manage concurrency and load balancing for different applications. The DSS is a service-oriented approach to robot component integration where every software or hardware component of a design is a service. Such a web-based architecture allows services within a network to interact. Drawing on experience using MSRS with LEGO NXT bricks, this presentation will provide a brief introduction to CCR and DSS, and give some insight into the maturity of MSRS.
Sunday, October 14, 2007
I will show the result of acoustic source localization using the SFS and discuss traits of uncertainty.
Proficient teams can accomplish goals that would not otherwise be achievable by groups of uncoordinated individuals. This thesis addresses the problem of analyzing team activities from external observations and prior knowledge of the team's behavior patterns. There are three general classes of recognition cues that are potentially valuable for team activity/plan recognition: (1) spatial relationships between team members and/or physical landmarks that stay fixed over a period of time; (2) temporal dependencies between behaviors in a plan or between actions in a behavior; (3) coordination constraints between agents and the actions that they are performing. This thesis examines how to leverage available spatial, temporal, and coordination cues to perform offline multi-agent activity/plan recognition for teams with dynamic membership.
In physical domains (military, athletic, or robotic), team behaviors often have an observable spatio-temporal structure, defined by the relative physical positions of team members and their relation to static landmarks; we suggest that this structure, along with temporal dependencies and coordination constraints defined by a team plan library, can be exploited to perform behavior recognition on traces of agent activity over time, even in the presence of uninvolved agents. Unlike prior work in team plan recognition where it is assumed that team membership stays constant over time, this thesis addresses the novel problem of recovering agent-to-team assignment for team tasks where team composition, the mapping of agents into teams, changes over time; this allows the analysis of more complicated tasks in which agents must periodically divide into subteams.
This thesis makes four main contributions: (1) an efficient and robust technique for formation identification based on spatial relationships; (2) a new algorithm for simultaneously determining team membership and performing behavior recognition on spatio-temporal traces with dynamic team membership; (3) a general pruning technique based on coordination cues that improves the efficiency of plan recognition for dynamic teams; (4) methods for identifying player policies in team games that lack strong spatial, temporal, and coordination dependencies.
Speaker: Stephen Stancliff
Current mobile robots generally fall into one of two categories as far as reliability is concerned - highly unreliable, or very expensive. Most fall into the first category, requiring teams of graduate students or staff engineers to coddle them in the days and hours before a brief demonstration. The few robots that exhibit very high reliability, such as those used by NASA for planetary exploration, are very expensive.
In order for mobile robots to become more widely used in real-world environments, they will need to have reliability in between these two extremes. In order to design mobile robots with respect to reliability, we need quantitative models for predicting robot reliability and for relating reliability to other design parameters. To date, however, there has been very little formal discussion of reliability in the mobile robotics literature, and no general method has been presented for quantitatively predicting the reliability of mobile robots.
This thesis proposal focuses on this problem of predicting reliability for mobile robots and for using reliability as a quantitative input into mobile robot mission design.
Proposal Link: http://www.cs.cmu.edu/~stancliff/proposal/proposal.pdf
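To make the idea of a quantitative reliability model concrete, here is a textbook-style sketch that is not drawn from the proposal itself. It assumes, purely for illustration, that each component fails independently with a constant failure rate and that the robot fails as soon as any component fails (a series system). All names and numbers are hypothetical.

```python
import math


def component_reliability(failure_rate_per_hour, mission_hours):
    """Probability that one component survives the mission.

    Assumes an exponential failure model with a constant failure rate.
    """
    return math.exp(-failure_rate_per_hour * mission_hours)


def series_reliability(failure_rates_per_hour, mission_hours):
    """Probability that every component survives the mission.

    Assumes independent failures and that any single failure ends the mission.
    """
    return math.prod(
        component_reliability(rate, mission_hours)
        for rate in failure_rates_per_hour
    )
```

Under these assumptions the failure rates simply add, so two components with rates 0.01 and 0.02 per hour give a 10-hour mission reliability of exp(-0.3), roughly 0.74. Relating such a number to cost or mission duration is the kind of trade-off a reliability-aware mission design would need to weigh.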
Authors: Chieh-Chih Wang, Kao-Wei Wan and Tzu-Chien Lo
Interactions between targets have been exploited
to solve the occlusion problem in multitarget tracking but
not to provide higher level scene understanding. In our previous
work, a variable structure multiple model estimation
framework with a scene interaction model and a neighboring
object interaction model was proposed to accomplish these
two tasks. The proposed approach was demonstrated in urban
areas using a laser scanner. As indoor environments
are less constrained than urban areas, interactions in
indoor environments are weaker and have more variants. Weak
interactions make scene interaction modeling and neighboring
object interaction modeling challenging. In this paper, a
place-driven scene interaction model is proposed to represent
long-term interactions in indoor environments. To deal with
complicated short-term interactions, the neighboring object
interaction model consists of three short-term interaction
models: following, approaching and avoidance. The moving
model, the stationary process model and these two interaction
models are integrated to accomplish weakly interacting object
tracking. In addition, higher level scene understanding such as
unusual activity recognition and important place identification
is accomplished straightforwardly. The experimental results
using data from a laser scanner demonstrate the feasibility
and robustness of the proposed approaches.
Friday, October 12, 2007
Speaker: Prof. Fei-Fei Li
Date: October 10, 2007
For both humans and machines, the ability to learn and recognize the semantically meaningful contents of the visual world is an essential and important functionality. In this talk, we will examine the topic of natural scene categorization and recognition in human psychophysical and physiological experiments as well as in computer vision modeling.
I will first present a series of recent human psychophysics studies on natural scene recognition. All these experiments converge to one prominent phenomenon of the human visual system: humans are extremely efficient and rapid in capturing the semantic contents of real-world images. Inspired by these behavioral results, we report a recent fMRI experiment that classifies different types of natural scenes (e.g. beach vs. building vs. forest, etc.) based on the distributed fMRI activity. This is achieved by utilizing a number of pattern recognition algorithms in order to capture the multivariate nature of the complex fMRI data.
In the second half of the talk, we present a generative Bayesian hierarchical model that learns to categorize natural images in a weakly supervised fashion. We represent an image by a collection of local regions, denoted as codewords obtained by unsupervised clustering. Each region is then represented as part of a `theme'. In previous work, such themes were learnt from hand-annotations of experts, while our method learns the theme distribution as well as the codewords distribution over the themes without such supervision. We report excellent categorization performances on a large set of 13 categories of complex scenes.
Bio: Prof. Fei-Fei Li's main research interest is in vision, particularly high-level visual recognition.
In computer vision, Fei-Fei's interests span from object and natural scene categorization to human activity categorizations in both videos and still images. In human vision, she has studied the interaction of attention and natural scene and object recognition. In a recent project, she also studies the human brain fMRI activities in natural scene categorization by using pattern recognition algorithms. Fei-Fei graduated from Princeton University in 1999 with a physics degree, and a minor in engineering physics. She received her PhD in electrical engineering from the California Institute of Technology in 2005. Fei-Fei was on faculty in the Electrical and Computer Engineering Dept. at the University of Illinois Urbana-Champaign (UIUC) from Sept 2005 to Dec 2006. Starting Jan 2007, Fei-Fei is an Assistant Professor in the Computer Science Department at Princeton University. She also holds courtesy appointments in the Psychology Department and the Neuroscience Program at Princeton. She is a recipient of the 2006 Microsoft Research New Faculty Fellowship. (Fei-Fei publishes under the name L. Fei-Fei.)
Wednesday, October 10, 2007
Speaker: Marcin Marszalek, INRIA
We will present our recent work from CVPR'07 and describe our winning
method for Pascal VOC Challenge 2007. The talk will therefore consist of
three parts. First, we will introduce shape masks and describe how they
are used for accurate object localization (CVPR'07 oral). Second, we will
show how we learn representations for image classification using a genetic
algorithm (Pascal VOC'07 winning method). Finally, we will discuss the use
of semantic hierarchies for visual object recognition, which is the
current main focus of our research.
Accurate Object Localization with Shape Masks
Semantic Hierarchies for Visual Object Recognition
Saturday, October 06, 2007
Authors: S. V. N. Vishwanathan, Nicol N. Schraudolph, Mark W. Schmidt, Kevin P. Murphy
We apply Stochastic Meta-Descent (SMD), a stochastic gradient optimization method with gain vector adaptation, to the training of Conditional Random Fields (CRFs). On several large data sets, the resulting optimizer converges to the same quality of solution over an order of magnitude faster than limited-memory BFGS, the leading method reported to date. We report results for both exact and inexact inference techniques.
Title: Unsupervised Learning of Categories Appearing in Images
Speaker: Sinisa Todorovic, UIUC
This talk is about solving the following problem: given a set of images containing frequent occurrences of multiple object categories, learn a compact, multi-category representation that encodes the models of these categories and their inter-category relationships, for the purposes of object recognition and segmentation. The categories are not defined by the user, and whether and where any instances of the categories appear in a specific image is not known. This problem is challenging as it involves the following unanswered questions. What is an object category? To what extent is human supervision necessary to communicate the nature of object categories to a computer vision system? What is an efficient, compact representation of multiple categories, and which inter-category relationships should it capture? I will present an approach that addresses the above stated problem, wherein a category is defined as a set of 2D objects (i.e., subimages) sharing similar appearance and topological properties of their constituent regions. The approach derives from and closely follows this definition by representing each image as a segmentation tree, whose structure captures recursive embedding of image regions in a multiscale segmentation, and whose nodes contain the associated geometric and photometric region properties. Since the presence of any categories in the image set is reflected in the occurrence of similar subtrees (i.e., 2D objects) within the image trees, the approach: (1) matches the image trees to find these similar subtrees; (2) discovers categories by clustering similar subtrees, and uses the properties of each cluster to learn the model of the associated category; and (3) captures sharing of simpler categories among complex ones, i.e., category-subcategory relationships. The approach can also be used for addressing a less-general, subsumed problem, that of unsupervised extraction of texture elements (i.e., texels) from a given image of 2.1D texture, because 2.1D texture can be viewed as composed of repetitive instances of a category (e.g., waterlilies on the water surface).
Thursday, October 04, 2007
International Journal of Robotics Research 2004 (IJRR'04)
Full Article - Link.
Video - Link.
Title: High-throughput reconstruction of brain circuits: how machine vision will revolutionize neuroscience
Speaker: Dmitri B. Chklovskii
How does electrical activity in neuronal circuits give rise to intelligent behavior? We believe that this question is impossible to answer without a comprehensive description of neurons and the synaptic connections between them. The absence of such a description, often called a wiring diagram, has been holding back the development of neuroscience. We believe that recent technological advances in high-resolution imaging and machine vision will make possible the reconstruction of whole wiring diagrams of simpler organisms or of significant parts of more complex systems, such as the mammalian neocortex. Such reconstructions promise to revolutionize neuroscience just as human genome sequencing revolutionized molecular biology.