Sunday, August 30, 2009
Saturday, August 29, 2009
Title: Maximum Entropy Inverse Reinforcement Learning
B. D. Ziebart, A. Maas, J. A. Bagnell, and A. K. Dey.
AAAI Conference on Artificial Intelligence (AAAI 2008)
In this work, we develop a probabilistic approach based on the principle of maximum entropy. Our approach provides a well-defined, globally normalized distribution over decision sequences, while providing the same performance guarantees as existing methods.
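As a toy illustration of the globally normalized distribution the abstract describes: each candidate decision sequence gets probability proportional to the exponentiated reward of its feature counts, normalized by the partition function over all sequences. The feature vectors and weights below are made up for illustration, not taken from the paper.

```python
import numpy as np

# Hypothetical feature counts for three candidate decision sequences
# (paths); in the paper these come from summing state features along
# each trajectory.
path_features = np.array([
    [1.0, 0.0],   # path A
    [0.5, 0.5],   # path B
    [0.0, 1.0],   # path C
])
theta = np.array([0.8, 0.2])  # reward weights (assumed known here)

# Maximum-entropy model: P(path) proportional to exp(theta . f_path),
# normalized over all paths (the partition function Z).
scores = path_features @ theta
probs = np.exp(scores) / np.exp(scores).sum()

print(probs)        # globally normalized distribution over paths
print(probs.sum())  # sums to 1 by construction
```

In the actual algorithm the partition function is computed by dynamic programming rather than enumeration, since the number of paths is exponential.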
Thursday, August 27, 2009
Lab Meeting August 31, 2009 (Jim): Acquisition of probabilistic behavior decision model based on the interactive teaching method
In Proceedings of the Ninth International Conference on Advanced Robotics (ICAR'99), 1999
In this paper, we propose a novel method for mobile robots to acquire new autonomous behaviors gradually, based on interaction between humans and robots. In this method, behavior decision models are constructed by statistically processing experiences of interaction and teaching, and the robot expresses the sureness of its own decisions using stochastic reasoning.
The robot not only decides behavior using this sureness, but also uses it to make suggestions and ask questions of the user.
Speaker: Professor Yi Ma, ECE Department, UIUC and Microsoft Research Asia
Time: 2:20pm, Aug 28 (Fri), 2009
Place: Room 101, CSIE building
In the past few years, sparse representation and compressive sensing have arisen as a very powerful and popular framework for signal and image processing. It has armed people with new mathematical principles and computational tools that can effectively and efficiently harness sparse, low-dimensional structures of high-dimensional data such as images and videos. In this talk, we contend that the same principles and tools are equally important for analyzing the meaning and semantics of images and help solve many outstanding problems in computer vision.
As an example, we will focus on the recent success of sparse representation in human face recognition. On one hand, tools from sparse representation such as L1-minimization have seen great empirical success in enhancing the robustness of face recognition with occlusion, illumination change, and registration error, leading to striking recognition performance far exceeding human expectation or capability. On the other hand, the peculiar structures of face images have led to new mathematical discovery of remarkable properties of L1 minimization that far exceed the existing sparse representation theory.
We will also illustrate with many other examples in computer vision the importance of sparsity as a guiding principle for extracting and harnessing the structures of high-dimensional visual data. In return, we will see that overwhelming empirical evidences from those examples suggest that an even richer set of new mathematical results can be developed if we systematically extend the theory of sparse representation to clustering or classification of high-dimensional visual data. The confluence of sparse representation and computer vision is leading us to a brand new mathematical foundation for high-dimensional pattern analysis and recognition.
This is joint work with my former PhD students John Wright, Allen Yang, and Shankar Rao.
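The L1-minimization at the heart of the sparse-representation face classifier mentioned in the talk can be sketched in a few lines: express the test sample as a sparse combination of training samples, then classify by which class's columns best reconstruct it. The sketch below uses ISTA (iterative soft thresholding), a standard solver for the lasso problem, on a synthetic random dictionary standing in for face images; none of the data comes from the talk.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy dictionary: columns are training samples from 3 "classes",
# 2 columns each (purely illustrative stand-ins for face images).
A = rng.standard_normal((20, 6))
A /= np.linalg.norm(A, axis=0)

# Test sample generated from class 1 (columns 2 and 3).
x_true = np.array([0.0, 0.0, 0.7, 0.3, 0.0, 0.0])
y = A @ x_true

# ISTA: simple solver for min ||A x - y||^2 + lam * ||x||_1,
# a stand-in for the L1-minimization discussed in the talk.
lam, L = 0.01, np.linalg.norm(A, 2) ** 2
x = np.zeros(6)
for _ in range(500):
    g = x - (A.T @ (A @ x - y)) / L
    x = np.sign(g) * np.maximum(np.abs(g) - lam / L, 0.0)

# Classify by smallest class-wise reconstruction residual.
residuals = [np.linalg.norm(y - A[:, c:c + 2] @ x[c:c + 2]) for c in (0, 2, 4)]
print(int(np.argmin(residuals)))  # class index with best fit
```

The sparse coefficients concentrate on the columns of the true class, which is the robustness mechanism the talk attributes to L1-minimization.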
Yi Ma is an associate professor at the Electrical & Computer Engineering Department of the University of Illinois at Urbana-Champaign. He is currently on leave as research manager of the Visual Computing group at Microsoft Research Asia in Beijing. His research interests include computer vision, image processing, and systems theory. Yi Ma received two Bachelors' degrees in Automation and Applied Mathematics from Tsinghua University (Beijing, China) in 1995, a Master of Science degree in EECS in 1997, a Master of Arts degree in Mathematics in 2000, and a PhD degree in EECS in 2000, all from the University of California at Berkeley. Yi Ma received the David Marr Best Paper Prize at the International Conference on Computer Vision 1999 and the Longuet-Higgins Best Paper Prize at the European Conference on Computer Vision 2004. He also received the CAREER Award from the National Science Foundation in 2004 and the Young Investigator Award from the Office of Naval Research in 2005. He is an associate editor of IEEE Transactions on Pattern Analysis and Machine Intelligence. He is a senior member of IEEE and a member of ACM, SIAM, and ASEE.
Thursday, August 20, 2009
IEEE Intelligent Systems, July/August 2009, pp. 14–20
Since their codification in 1947 in the collection of short stories I, Robot, Isaac Asimov’s three laws of robotics have been a staple of science fiction. Most of the stories assumed that the robot had complex perception and reasoning skills equivalent to a child and that robots were subservient to humans. Although the laws were simple and few, the stories attempted to demonstrate just how difficult they were to apply in various real-world situations. In most situations, although the robots usually behaved "logically," they often failed to do the "right" thing, typically because the particular context of application required subtle adjustments of judgment on the part of the robot (for example, determining which law took priority in a given situation, or what constituted helpful or harmful behavior).
[The full article]
Lab Meeting August 24, 2009 (Chung-Han): Monitoring an intersection using a network of laser scanners
Huijing Zhao; Jinshi Cui; Hongbin Zha; Katabira, K.; Xiaowei Shao; Shibasaki, R.
Intelligent Transportation Systems, 2008 (ITSC 2008), 11th International IEEE Conference on, 12-15 Oct. 2008, pp. 428-433
Abstract: In this research, a novel system for monitoring an intersection using a network of single-row laser range scanners (subsequently abbreviated as "laser scanners") is proposed. Laser scanners are set on the road side to profile an intersection horizontally from different viewpoints, so that cross sections of the intersection are captured at a high scanning rate (e.g., 37 Hz) and contain the contour points of the moving objects entering the intersection. Data from the different laser scanners are integrated into a common spatial-temporal coordinate system and processed, so that the moving objects inside the intersection are detected and tracked to estimate their state parameters, such as location, speed, and direction, at each time instant. An experiment was conducted in central Beijing, where six laser scanners were used to cover a three-way intersection. A digital copy of the dynamic intersection was measured, and, through data processing, a large quantity of traffic data, including objects' physical dimensions, was obtained.
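Integrating returns from several roadside scanners into a common coordinate system, as the abstract describes, amounts to applying one rigid transform per scanner. A minimal sketch, with made-up scanner poses (in the paper these come from calibration):

```python
import math

def to_common_frame(r, theta, pose):
    """Map one laser return (range r, bearing theta, in the scanner's
    own frame) into the shared intersection frame, given the scanner's
    pose (x, y, heading). The poses here are illustrative."""
    x0, y0, heading = pose
    x = x0 + r * math.cos(theta + heading)
    y = y0 + r * math.sin(theta + heading)
    return x, y

# Two scanners observing the same object from opposite road sides.
p1 = to_common_frame(10.0, 0.0, (0.0, 0.0, 0.0))       # scanner at origin
p2 = to_common_frame(10.0, 0.0, (20.0, 0.0, math.pi))  # scanner facing back
print(p1, p2)  # both land on the same point in the common frame
```

Once all contour points live in one frame, clustering and tracking can proceed scanner-independently.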
Wednesday, August 19, 2009
Thursday, August 13, 2009
Title: Moving Obstacle Detection in Highly Dynamic Scenes (ICRA 2009)
Authors: A. Ess, B. Leibe, K. Schindler, and L. van Gool.
We address the problem of vision-based multi-person tracking in busy pedestrian zones using a stereo rig mounted on a mobile platform. Specifically, we are interested in the application of such a system for supporting path-planning algorithms in the avoidance of dynamic obstacles.
The complexity of the problem calls for an integrated solution, which extracts as much visual information as possible and combines it through cognitive feedback.
We propose such an approach, which jointly estimates camera position, stereo depth, object detections, and trajectories based only on visual information. The interplay between these components is represented in a graphical model. For each frame, we first estimate the ground surface together with a set of object detections. Based on these results, we then address object interactions and estimate trajectories. Finally, we employ the tracking results to predict future motion for dynamic objects and fuse this information with a static occupancy map estimated from dense stereo.
The approach is experimentally evaluated on several long and challenging video sequences from busy inner-city locations recorded with different mobile setups. The results show that the proposed integration makes stable tracking and motion prediction possible, and thereby enables path planning in complex and highly dynamic scenes.
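The final fusion step (predicted motion of dynamic objects merged with a static occupancy map) can be sketched on a toy grid. The constant-velocity model and grid below are our simplification for illustration, not the paper's actual representation:

```python
import numpy as np

# Static occupancy map from dense stereo (1 = occupied), assumed given.
static_map = np.zeros((10, 10), dtype=float)
static_map[4, 4] = 1.0

def predict_cells(track, horizon, dt=0.5):
    """Constant-velocity prediction for one tracked object: return the
    grid cells its predicted future positions fall into."""
    (x, y), (vx, vy) = track
    return [(int(round(x + vx * k * dt)), int(round(y + vy * k * dt)))
            for k in range(1, horizon + 1)]

# Fuse: mark predicted dynamic cells on top of the static map so a
# path planner can avoid both static and moving obstacles.
fused = static_map.copy()
for cx, cy in predict_cells(((1.0, 1.0), (2.0, 0.0)), horizon=3):
    fused[cx, cy] = 1.0

print(fused[2, 1], fused[3, 1], fused[4, 1])  # predicted cells now occupied
```

A planner querying `fused` then treats the pedestrian's predicted path as blocked, which is the point of the fusion.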
Magazine issue 2721.
This is an interesting article mentioning that human beings do not perform metric-SLAM and may perform topological SLAM poorly. Take a look.
Monday, August 10, 2009
IJCAI 2009 Talk: From Low-level Sensors to High-level Intelligence: Activity Recognition Links the Knowledge Food Chain
Author: Qiang Yang, The Hong Kong University of Science and Technology
Sensors provide computer systems with a window to the outside world. Activity recognition "sees" what is in the window to predict the locations, trajectories, actions, goals and plans of humans and objects. Building an activity recognition system requires a full range of interaction from statistical inference on lower level sensor data to symbolic AI at higher levels, where prediction results and acquired knowledge are passed up each level to form a knowledge food chain. In this talk, I will give an overview of activity recognition and explore its relation to other fields, including planning and knowledge acquisition, machine learning and Web search. I will also describe its applications in assistive technologies, security monitoring and mobile commerce.
Sunday, August 09, 2009
Title: Learning Sound Location from a Single Microphone (ICRA 2009)
Authors: Ashutosh Saxena and Andrew Y. Ng
We consider the problem of estimating the incident angle of a sound, using only a single microphone. The ability to perform monaural (single-ear) localization is important to many animals; indeed, monaural cues are also the primary method by which humans decide if a sound comes from the front or back, as well as estimate its elevation. Such monaural localization is made possible by the structure of the pinna (outer ear), which modifies sound in a way that is dependent on its incident angle. In this paper, we propose a machine learning approach to monaural localization, using only a single microphone and an "artificial pinna" (that distorts sound in a direction-dependent way). Our approach models the typical distribution of natural and artificial sounds, as well as the direction-dependent changes to sounds induced by the pinna. Our experimental results also show that the algorithm is able to fairly accurately localize a wide range of sounds, such as human speech, dog barking, waterfall, thunder, and so on. In contrast to microphone arrays, this approach also offers the potential of significantly more compact, lower-cost, and lower-power devices for sound localization.
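A minimal sketch of the core idea: because the (artificial) pinna filters sound differently for each incident angle, matching an observed spectrum against angle-dependent filtered versions of a known sound recovers the angle. The filters and sounds below are synthetic; the paper's actual method is a learned probabilistic model over natural sounds, not this nearest-neighbor toy.

```python
import numpy as np

rng = np.random.default_rng(1)

# Assumed direction-dependent "pinna" filters: one spectral gain
# profile per incident angle (purely synthetic stand-ins).
angles = [0, 90, 180, 270]
pinna = {a: rng.uniform(0.2, 1.0, size=32) for a in angles}

def spectrum(sound, angle):
    """Magnitude spectrum of a sound after the angle-dependent filter."""
    return np.abs(np.fft.rfft(sound, n=62))[:32] * pinna[angle]

def localize(observed, probe):
    """Nearest neighbor over angles: pick the angle whose filtered
    spectrum of a known probe sound best matches the observation."""
    return min(angles, key=lambda a: np.linalg.norm(observed - spectrum(probe, a)))

probe = rng.standard_normal(62)   # a known broadband sound
observed = spectrum(probe, 90)    # "recorded" arriving from 90 degrees
print(localize(observed, probe))
```

The hard part the paper addresses is doing this without knowing the clean source signal, by modeling the prior distribution of natural sounds.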
Friday, August 07, 2009
IEEE Computer Graphics and Applications, July/August 2009, pp. 54–63
Crowd simulation is enjoying considerable success in a number of applied domains, most notably in evacuation scenarios in which simulated crowd behaviors can help improve the safety of interior building designs. However, not all applications involving a virtual populace have the overarching goal of realistic simulation. In many cases, it's necessary only that viewers perceive the crowd as realistic. In many of the latest movies or video games involving large numbers of virtual actors, liberties can be taken in displaying those far away or otherwise obscured from the eye, provided this doesn't noticeably diminish the viewing experience. For example, such simulations can reduce the level of detail or forgo collision-avoidance calculations in order to simulate a larger crowd, or to enhance the behaviors of individuals deemed most likely to occupy viewers' attention. (Full PDF)
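The level-of-detail idea described above can be sketched as a simple distance-based policy; the thresholds and level names are illustrative, not from the article:

```python
import math

def lod_for_agent(agent_pos, camera_pos, near=20.0, far=60.0):
    """Pick a detail level from viewing distance: nearby agents get the
    full simulation, distant ones get progressively cheaper treatment."""
    d = math.dist(agent_pos, camera_pos)
    if d < near:
        return "full"      # full animation + collision avoidance
    if d < far:
        return "reduced"   # coarse animation, keep collision avoidance
    return "minimal"       # skip collision avoidance entirely

camera = (0.0, 0.0)
for agent in [(5.0, 0.0), (30.0, 0.0), (100.0, 0.0)]:
    print(lod_for_agent(agent, camera))
```

A real system would also account for occlusion and predicted viewer attention, per the article, not distance alone.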
Wednesday, August 05, 2009
Tuesday, August 04, 2009
Authors: Jingu Heo, Marios Savvides
In this paper we propose a novel method of generating 3D Morphable Models (3DMMs) from 2D images. We develop algorithms for 3D face reconstruction from a sparse set of points acquired from 2D images. In order to establish correspondence between images precisely, we combined Active Shape Models (ASMs) and Active Appearance Models (AAMs) (CASAAMs) in an intelligent way, showing improved performance on pixel-level accuracy and generalization to unseen faces. The CASAAMs are applied to images of different views of the same person to extract facial shapes across pose. These 2D shapes are combined for reconstructing a sparse 3D model. The point density of the model is increased by the Loop subdivision method, which generates new vertices as a weighted sum of the existing vertices. Then, the depth of the dense 3D model is modified with an average 3D depth-map in order to preserve facial structure more realistically. Finally, all 249 3D models with expression changes are combined to generate a 3DMM for a compact representation. The first session of the Multi-PIE database, consisting of 249 persons with expression and illumination changes, is used for the modeling. Unlike typical 3DMMs, our model can generate 3D human faces more realistically and efficiently (2-3 seconds on a P4 machine) under diverse illumination conditions.
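The Loop subdivision step mentioned in the abstract computes each new vertex on an interior edge as a fixed weighted sum of four existing vertices: 3/8 for the edge's two endpoints and 1/8 for the opposite vertices of the two triangles sharing the edge. A minimal sketch of that rule:

```python
def loop_edge_vertex(v1, v2, left, right):
    """Loop-subdivision rule for a new vertex on an interior edge:
    3/8 * (edge endpoints) + 1/8 * (opposite vertices of the two
    adjacent triangles)."""
    return tuple(0.375 * (a + b) + 0.125 * (c + d)
                 for a, b, c, d in zip(v1, v2, left, right))

# An edge shared by two triangles in a toy mesh.
v = loop_edge_vertex((0.0, 0.0, 0.0), (1.0, 0.0, 0.0),
                     (0.5, 1.0, 0.0), (0.5, -1.0, 0.0))
print(v)  # midpoint here, since the opposite vertices are symmetric
```

Applying this rule to every edge (plus the companion rule that repositions old vertices) roughly quadruples the triangle count per pass, which is how the sparse reconstructed face model is densified.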
Sunday, August 02, 2009
And we try to propose some methods to solve this problem.