Sunday, September 30, 2007
Author: Brendan J. Frey and Delbert Dueck
Clustering data by identifying a subset of representative examples is important for processing sensory signals and detecting patterns in data. Such “exemplars” can be found by randomly choosing an initial subset of data points and then iteratively refining it, but this works well only if that initial choice is close to a good solution. We devised a method called “affinity propagation,” which takes as input measures of similarity between pairs of data points. Real-valued messages are exchanged between data points until a high-quality set of exemplars and corresponding clusters gradually emerges. We used affinity propagation to cluster images of faces, detect genes in microarray data, identify representative sentences in this manuscript, and identify cities that are efficiently accessed by airline travel. Affinity propagation found clusters with much lower error than other methods, and it did so in less than one-hundredth the amount of time.
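The message-passing updates behind affinity propagation can be sketched in a few lines of NumPy. This is an illustrative implementation, not the authors' code; the damping factor and iteration count are assumptions (the diagonal of the similarity matrix holds each point's "preference" to serve as an exemplar):

```python
import numpy as np

def affinity_propagation(S, damping=0.9, iters=200):
    # Exchange responsibility (R) and availability (A) messages until
    # exemplars emerge. S is an n-by-n similarity matrix whose diagonal
    # holds the "preference" of each point to be an exemplar.
    n = S.shape[0]
    R = np.zeros((n, n))
    A = np.zeros((n, n))
    for _ in range(iters):
        # r(i,k) <- s(i,k) - max_{k' != k} [a(i,k') + s(i,k')]
        AS = A + S
        idx = AS.argmax(axis=1)
        first = AS[np.arange(n), idx]
        AS[np.arange(n), idx] = -np.inf
        second = AS.max(axis=1)
        Rnew = S - first[:, None]
        Rnew[np.arange(n), idx] = S[np.arange(n), idx] - second
        R = damping * R + (1 - damping) * Rnew
        # a(i,k) <- min(0, r(k,k) + sum over i' not in {i,k} of max(0, r(i',k)))
        Rp = np.maximum(R, 0)
        np.fill_diagonal(Rp, R.diagonal())
        Anew = Rp.sum(axis=0)[None, :] - Rp
        diag = Anew.diagonal().copy()
        Anew = np.minimum(Anew, 0)
        np.fill_diagonal(Anew, diag)
        A = damping * A + (1 - damping) * Anew
    # Point k is an exemplar when its self-responsibility plus
    # self-availability is positive.
    return np.flatnonzero((R + A).diagonal() > 0)
```

Setting every preference to the median pairwise similarity, as the paper suggests, yields a moderate number of clusters; larger preferences yield more exemplars.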
Author: Dalhyung Kim and Woojin Chung
One of the key technologies of future automobiles is parking assist, or automatic parking control. Control of a car-like vehicle is not easy because of its nonholonomic velocity constraints. In this paper, a practical solution for planning a car-parking path is proposed based on the concept of the motion space (M-space). The M-space is an extension of the conventional C-space. A collision-free, nonholonomically feasible path can be computed directly by the M-space conversion and a back-propagation of reachable regions from the goal. The dimension of the search space can be remarkably reduced by the proposed assumptions, which are derived from the unique features of a parking assist system. Simulation results show that the proposed method is useful for motion planning in the parking space.
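The back-propagation of reachable regions can be illustrated, in a greatly simplified holonomic form, as a breadth-first sweep from the goal cell over an occupancy grid. This is a stand-in sketch only; the paper's M-space version also encodes the car's orientation and nonholonomic motion primitives:

```python
from collections import deque

def backprop_reachable(grid, goal):
    # Sweep outward from the goal, labeling each free cell with its
    # step distance to the goal. Cells never reached (blocked off or
    # occupied) are simply absent from the result.
    rows, cols = len(grid), len(grid[0])
    dist = {goal: 0}
    q = deque([goal])
    while q:
        r, c = q.popleft()
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nr, nc = r + dr, c + dc
            if 0 <= nr < rows and 0 <= nc < cols \
               and grid[nr][nc] == 0 and (nr, nc) not in dist:
                dist[(nr, nc)] = dist[(r, c)] + 1
                q.append((nr, nc))
    return dist
```

A path to the goal then follows the gradient of decreasing distance from any start cell.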
Lab Meeting 1 October (Yu-Hsiang): Extracting Places and Activities from GPS Traces Using Hierarchical Conditional Random Fields
Author: Lin Liao
Learning patterns of human behavior from sensor data is extremely important for high-level activity inference. We show how to extract a person's activities and significant places from traces of GPS data. Our system uses hierarchically structured conditional random fields to generate a consistent model of a person's activities and places. In contrast to existing techniques, our approach takes high-level context into account in order to detect the significant places of a person. Our experiments show significant improvements over existing techniques. Furthermore, they indicate that our system is able to robustly estimate a person's activities using a model that is trained from data collected by other persons.
Friday, September 28, 2007
The following is a description of the integration of a commercial Inertial Measurement Unit with the navigation algorithms of the jBot autonomous off-road robot. This allows the robot to navigate unstructured outdoor environments, without a GPS or other external reference, with an accuracy on the order of about 0.5% of the distance traveled. That is, a journey of 1000 feet has a location error on the order of 5 feet upon arrival at the destination.
Tuesday, September 25, 2007
Associate Professor and Director, Zanvyl Krieger Mind-Brain Institute, Johns Hopkins University
4:00, Monday, October 1
Mellon Institute, Third Floor Social Room
Bellefield Street entrance
Monday, September 24, 2007
Published: September 23, 2007
VANU BOSE is the son of a fabled engineer, but he garnered no mercy when he presented his big idea at a technical conference in 1996. Mr. Bose’s graduate work at M.I.T. involved using software to handle the radio function in a cellular phone. He remembers that after he successfully demonstrated his technology, an audience member stood up and dismissed it with: “Congratulations! You’ve just invented the world’s most expensive cellphone.”
Mr. Bose, a personable man, shrugged off the criticism. He expected that over time, the increasing processing speed of chips would make such phones much cheaper.
But he didn’t want to make the phones. He wanted to remake the wireless base station, the guts of the world’s cellular networks, by changing them from complex systems that incorporate hardware, software and the electronics needed for wireless communications into systems run primarily with software.
See the full article.
Friday, September 21, 2007
Title: Topics in Spam Filtering
Speaker: D. Sculley, Tufts University
This talk will examine three recent inquiries in machine-learning-based filtering of spam emails. First, we will examine a long-standing debate in the spam filtering community, and show that online support vector machines (SVMs) do, indeed, give state-of-the-art performance on spam filtering tasks. Second, we show how to reduce the cost of online SVMs with several relaxations, which yield nearly equivalent results at greatly reduced computational cost. Third, we investigate the use of online active learning methods for spam filtering, which both reduce the number of labels needed for strong filtering performance and enable a variety of useful user-interface options. Finally, we investigate the problem of one-sided feedback, caused when a potentially lazy user only labels messages that appear in the inbox, and never gives feedback on messages that are predicted to be spam.
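For context, the core of an online linear SVM of the kind debated here can be sketched as stochastic subgradient steps on the hinge loss (a Pegasos-style update). This is an illustrative stand-in, not the speaker's Relaxed Online SVM; the regularization constant and bag-of-words features are assumptions:

```python
import numpy as np

def online_svm_train(docs, labels, vocab, lam=1e-4, epochs=1):
    # One pass = one SGD step per message, in arrival order.
    # labels are +1 (spam) / -1 (ham); vocab maps token -> feature index.
    w = np.zeros(len(vocab))
    t = 0
    for _ in range(epochs):
        for doc, y in zip(docs, labels):
            t += 1
            eta = 1.0 / (lam * t)          # decaying learning rate
            x = np.zeros(len(vocab))       # bag-of-words features
            for tok in doc.split():
                if tok in vocab:
                    x[vocab[tok]] += 1.0
            w *= (1 - eta * lam)           # shrink (regularization term)
            if y * w.dot(x) < 1:           # margin violated: hinge subgradient
                w += eta * y * x
    return w
```

The relaxations discussed in the talk (e.g. updating on fewer examples, or limiting optimization per step) trade a little accuracy for a large reduction in this per-message cost.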
Online Active Learning for Spam Filtering
Relaxed Online SVMs for Spam Filtering
* NewScientist.com news service
* Justin Mullins
Set a swarm of robots to explore and map a large area and you will soon find that controlling them all becomes an overwhelming task. It's simply not possible to control more than a handful of robots effectively using a central-command-like structure, says James McLurkin, a computer scientist at the Massachusetts Institute of Technology in Cambridge, US.
Instead, he says, you are better off allowing the robots to talk to each other and, after setting a primary goal such as mapping an area or following a leader, delegating control to them. Funded by the US Navy's Space and Naval Warfare Systems Command, McLurkin has come up with just such a system.
His robots share data from their onboard optical, electromagnetic, and acoustic sensors with their swarm-mates, and frequently evaluate their ability to complete the task. McLurkin says the beauty of this design is that the number of robots involved can be dramatically increased without placing an overwhelming burden on any central-command structure.
Two videos show that the technique seems to work well in ideal environments for simple tasks such as follow the leader (mpeg format) and "clumping" into groups (mpeg format). The big question is, of course, how well these robots would cope with the real world.
Read the full swarming robots system patent application.
Wednesday, September 19, 2007
WD-2, a "face robot" developed by researchers from Waseda University in Tokyo, not only makes facial expressions, but changes its facial expressions with nearly the minute detail of a human face. In the future, personal robots may serve a multitude of purposes in work and entertainment with humans, and therefore must be able to communicate in a human-like manner. The researchers say that the mask can be modified to "copy" a human face, even displaying a person's hair style and skin color when a photo of their face is projected onto the 3D mask.
the full article and video
Author: Emma Sviestins, Noriaki Mitsunaga, Takayuki Kanda, Hiroshi Ishiguro and Norihiro Hagita
We have taken steps towards developing a method that enables an interactive humanoid robot to adapt its speed to the walking human it is moving together with. This is difficult because the human is simultaneously adapting to the robot. From a case study in human-human walking interaction we established a hypothesis about how to read a human's speed preference, based on a relationship between humans' walking speed and their relative position in the direction of walking. We conducted two experiments to verify this hypothesis: one with two humans walking together, and one with a human subject walking with a humanoid robot, Robovie-IV. For 11 out of 15 subjects who walked with the robot, the results were consistent with the speed-position relationship of the hypothesis. We also conducted a preferred-speed estimation experiment for six of the subjects. All of them were satisfied with one or more of the speeds that our algorithm estimated, and four of them judged one of the estimated speeds to be the best when the algorithm was allowed to give three options. In the paper, we also discuss the difficulties and possibilities that we learned from this preliminary trial.
Tuesday, September 18, 2007
Senior Robotics Engineer
Robotics Engineering Center
A forward predictive model is used to simulate a vehicle's motion given a sequence of commands that could potentially be executed. Generally, forward predictive models are used by planning systems on Unmanned Ground Vehicles (UGVs) for selecting commands such that progress is made and obstacles are avoided. In this talk, I will present a data-driven approach for learning a forward predictive model based on previous vehicle experience. Results of this approach will be presented and compared to the conventional model that is currently used on the Crusher vehicle. In addition, performance will be analyzed from the recent Ft. Carson (August 2007) field test, where a learned forward predictive model was used to traverse 100 km of off-road terrain autonomously.
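To make the idea concrete, here is a minimal hand-written (not learned) forward predictive model using unicycle kinematics. The learned model described in the talk would replace these closed-form equations with a regression fit to logged vehicle data; the time step and (v, w) command format are assumptions:

```python
import math

def rollout(x, y, theta, commands, dt=0.1):
    # Integrate a unicycle kinematic model over a command sequence,
    # where each command is (v, w): linear and angular velocity.
    # A planner scores many such rollouts for progress and clearance.
    path = [(x, y, theta)]
    for v, w in commands:
        x += v * math.cos(theta) * dt
        y += v * math.sin(theta) * dt
        theta += w * dt
        path.append((x, y, theta))
    return path
```

A data-driven variant learns the per-step state update (or a correction to it) from the discrepancy between such predictions and recorded trajectories.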
Michael is currently a member of the autonomy team for the UPI project at the NREC. His research interests lie in perception, planning and large autonomous systems. He has been with the NREC since 2000 and has also worked on LAGR, Perceptor, Underground Mining Operator Assist and the Servus Retail Robot projects. Prior to coming to the NREC, Michael worked with the Robot Learning Lab as an undergraduate. He received his B.S. in Computer Science from Carnegie Mellon University in 2000 with a minor in Robotics and will be completing his M.S. in Robotics this fall.
Friday, September 14, 2007
Lab Meeting 17 September (Eric): A Sensor for Simultaneously Capturing Texture and Shape by Projecting Structured Infrared Light
Author: Kiyotaka Akasaka, Ryusuke Sagawa, Yasushi Yagi
Abstract: Simultaneous capture of the texture and shape of a moving object in real time is expected to be applicable to various fields, including virtual reality and object recognition. Two difficulties must be overcome to develop a sensor able to achieve this feature: fast capture of shape, and the simultaneous capture of texture and shape. One-shot capturing methods based on projecting colored structured light have already been proposed to obtain shape at a high frame rate. However, since these methods use visible light, it is impossible to capture texture and shape simultaneously. In this paper, we propose a method that uses projected infrared structured light. Since the proposed method uses visible light for texture and infrared light for shape, simultaneous capture can be achieved. In addition, a system was developed that maps texture onto the captured shape without occlusion by placing the cameras for visible and infrared light coaxially.
Wednesday, September 12, 2007
Music is one of the most widespread of human cultural activities, existing in some form in all cultures throughout the world. The definition of music as organised sound is widely accepted today but a naïve interpretation of this definition may suggest the notion that music exists widely in the animal kingdom, from the rasping of crickets' legs to the songs of the nightingale. However, only in the case of humans does music appear to be surplus to any obvious biological purpose, while at the same time being a strongly learned phenomenon and involving significant higher order cognitive processing rather than eliciting simple hardwired responses.
A two day workshop will take place at NIPS 07 workshops (Whistler, Canada) and will span topics from signal processing and musical structure to the cognition of music and sound. In the first day the workshop will provide a forum for cutting edge research addressing the fundamental challenges of modeling the structure of music and analysing its effect on the brain. It will also provide a venue for interaction between the machine learning and the neuroscience/brain imaging communities to discuss the broader questions related to modeling the dynamics of brain activity. During the second day the workshop will focus on the modeling of sound, music perception and cognition. These can provide, with the crucial role of machine learning, a break-through in various areas of music technology, in particular: Music Information Retrieval (MIR), expressive music synthesis, interactive music making, and sound design. Understanding of music cognition in its implied top-down processes can help to decide which of the many descriptors in MIR are crucial for the musical experience and which are irrelevant.
The organisers of the workshop are investigators on three main European projects: Learning the Structure of Music (Le StruM), Closing the Loop of Evaluation and Design (CLOSED), and Emergent Cognition through Active Perception (EmCAP).
Saturday, September 08, 2007
Speaker: Jianbo Shi
Our goal is to achieve large-scale object recognition, with learning, but with very few training examples. My main belief is that visual intelligence occurs at multiple interconnected levels of perception, which should be tightly coupled. I will present our recent work on integrating recognition with segmentation.
Bottom-up semantic image parsing. In many recognition tasks, one needs not only to detect an object, but also parse it into semantically meaningful parts. Borrowing concepts from NLP, we propose a bottom-up parsing of increasingly more complete partial object shapes guided by a composition tree. We demonstrate quantitative results from this challenging task on a dataset of baseball players with wide pose variation. There are two key innovations of our algorithm. First, at each level of parsing, we evaluate shape as a whole, rather than the sum of its parts, unlike previous approaches. This allows us to model nonlinear contextual effects on parts combination. Second, the parsing hypothesis is generated by bottom-up segmentation and grouping, while verification is achieved by top-down shape matching. By forcing the hypothesis and verification steps to be mutually independent, we reduce the enormous false alarms (hallucinations) often occurring in background clutter.
Image matching. Image matching is a key building block for image search, visual navigation and long-range motion correspondence. Our matching algorithm combines the discriminative power of feature correspondences with the descriptive power of matching segments. We introduce the notion of co-saliency for image matching. The co-saliency matching score favors correspondences that are consistent with "soft" image segmentation as well as with local point feature matching. We express the matching algorithm via a joint image graph whose edge weights represent intra- as well as inter-image relations. We have demonstrated its application in the context of visual place recognition.
I will also briefly present our results on mid-level vision, shape from shading, and contour grouping using graph formulation.
This is joint work with Praveen Srinivasan, Alexander Toshev, Qihui Zhu, and Kostas Daniilidis.
Friday, September 07, 2007
Speaker: Andrew Stein
Building on recent advances in the detection of appearance edges from multiple local cues, we present an approach for detecting occlusion boundaries which also incorporates local motion information. We argue that these boundaries have physical significance which makes them important for many high-level vision tasks, and that motion offers a unique, often critical source of additional information for detecting them. We provide a new dataset of natural image sequences with labeled occlusion boundaries, on which we learn a classifier that leverages appearance cues along with motion estimates from either side of an edge. We demonstrate improved performance for pixelwise differentiation of occlusion boundaries from non-occluding edges by combining these weak local cues, as compared to using them separately. The results are suitable as improved input to subsequent mid- or high-level reasoning methods.
Title: Background and Scale Invariant Feature Transform (BSIFT)
Current feature-based object recognition methods use information derived from local image patches. For robustness, features are engineered for invariance to various transformations, such as rotation, scaling, or affine warping. When patches overlap object boundaries, however, errors in both detection and matching will almost certainly occur due to inclusion of unwanted background pixels. This is common in real images, which often contain significant background clutter, objects which are not heavily textured, or objects which occupy a relatively small portion of the image. We suggest improvements to the popular Scale Invariant Feature Transform (SIFT) which incorporate local object boundary information. The resulting feature detection and descriptor creation processes are invariant to changes in background. We call this method the Background and Scale Invariant Feature Transform (BSIFT). We demonstrate BSIFT's superior performance in feature detection and matching on synthetic and natural images.
Speaker: Tomasz Malisiewicz
Sliding window scanning is the dominant paradigm in object recognition research today. But while much success has been reported in detecting several rectangular-shaped object classes (i.e. faces, cars, pedestrians), results have been much less impressive for more general types of objects. Several researchers have advocated the use of image segmentation as a way to get better spatial support for objects. In this paper, our aim is to address this issue by studying the following two questions: 1) How important is good spatial support for recognition? 2) Can segmentation provide better spatial support for objects? To answer the first, we compare recognition performance using ground-truth segmentation vs. bounding boxes. To answer the second, we use the multiple segmentation approach to evaluate how closely real segments can approach the ground truth for real objects, and at what cost. Our results demonstrate the importance of finding the right spatial support for objects, and the feasibility of doing so without excessive computational burden.
In this paper, our central goal was to carefully examine the issues involved in obtaining good spatial support for objects. With segmentation (and multiple segmentation approaches in particular) becoming popular in object recognition, we felt it was high time to do a quantitative evaluation of the benefits and the trade-offs compared to traditional sliding window methods. The results of this evaluation can be summarized in terms of the following “take-home” lessons:
Correct spatial support is important for recognition: We confirm that knowing the right spatial support leads to substantially better recognition performance for a large number of object categories, especially those that are not well approximated by a rectangle. This should give pause to researchers who feel that recognition can be solved by training Viola-Jones detectors for all the world’s objects.
Multiple segmentations are better than one: We empirically confirm the intuition of [6, 11] that multiple segmentations (even naively produced) substantially improve spatial support estimation compared to a single segmentation.
Mean-Shift is better than FH or NCuts, but together they do best: On average, Mean-Shift segmentation appeared to outperform FH and NCuts in finding good spatial support for objects. However, for some object categories, the other algorithms did a better job, suggesting that different segmentation strategies are beneficial for different object types. As a result, combining the "segment soups" from all three methods together produced by far the best performance.
Segment merging can benefit any segmentation: Our results show that enlarging the segment soup by merging 2 or 3 adjacent segments together improves the spatial support, regardless of the segmentation algorithm. This is because objects may contain parts that are very different photometrically (skin and hair on a face) and would never form a coherent segment using bottom-up strategies. Merging appears to be an effective way to address this issue without doing a full exhaustive search.
“Segment soup” is large, but not catastrophically large: The size of the segment soup required to obtain extremely good spatial support can be quite large (around 10,000 segments). However, this is still an order of magnitude less than the number of sliding windows that a Viola-Jones-style approach must examine. Moreover, it appears that by using a number of different segmentation strategies together, we can get reasonable performance with as few as 100 segments per image!
In conclusion, this work takes the first steps towards understanding the importance of providing good spatial support for recognition algorithms, as well as offering the practitioner a set of concrete strategies for using existing segmentation algorithms to get the best object support they can.
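The segment-merging strategy from the take-home lessons can be sketched as follows. The representation (segment ids plus an adjacency set) is an assumption made for illustration, not the paper's data structure:

```python
from itertools import combinations

def segment_soup(ids, adjacency):
    # Grow a "segment soup": every original segment, plus unions of
    # 2 or 3 adjacent segments. `ids` lists segment ids; `adjacency`
    # is a set of frozenset id pairs from the region adjacency graph.
    soup = [frozenset([i]) for i in ids]
    # Pairwise merges of adjacent segments.
    for a, b in combinations(ids, 2):
        if frozenset((a, b)) in adjacency:
            soup.append(frozenset((a, b)))
    # Triple merges: extend each pair by any segment adjacent to it.
    for pair in [s for s in soup if len(s) == 2]:
        for c in ids:
            if c not in pair and any(frozenset((c, p)) in adjacency for p in pair):
                merged = pair | {c}
                if merged not in soup:
                    soup.append(merged)
    return soup
```

Restricting merges to adjacent segments is what keeps the soup around 10,000 candidates rather than the combinatorial number a full exhaustive search would face.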
Wednesday, September 05, 2007
(1) A virtual wall with an infrared tower gives it the ability to clean a room completely before moving on to the next room.
(2) Soft-touch bumpers slow the robot down when it approaches a barrier.
(3) Anti-tangle technology lets the Roomba operate on different surfaces, even carpet.
(4) A home base for recharging and docking.
3. Easier to use and upgrade thanks to a modular design
5. Wireless command center for more convenient control
Virtual wall for room-to-room cleaning: