This blog is maintained by the Robot Perception and Learning lab at CSIE, NTU, Taiwan. Our scientific interests are driven by the desire to build intelligent robots and computers that are capable of serving people more efficiently than equivalent manned systems in a wide variety of dynamic and unstructured environments.
Saturday, April 29, 2006
CMU RI FRC Seminar: Teaching a Robot to Avoid Obstacles
Speaker: Bradley Hamner, FRC Staff / Masters Student, Robotics Institute, Carnegie Mellon University
Date: Thursday, May 4, 2006
Time: Noon
Location: NSH 1109
Abstract:
Many obstacle avoidance methods have been presented in the literature, all of which rely on tuning a set of parameters to a control function. Frequently, programmers tune gains by hand until the robot behaves as desired, a nonintuitive and frustrating process. In this talk I will present a method of learning the gains of an obstacle avoidance system automatically by observing how a human operator manually drives the vehicle. I will present an obstacle avoidance algorithm, and its parameters, and show how parameters learned by our method outperform parameters which were hand-tuned. I will also show preliminary results from learning the parameters for multiple vehicles which perform in different environments.
Speaker Bio:
Brad Hamner received a B.S. in Mathematics from Carnegie Mellon University in 2002. Since then he has worked as a staff member in the Field Robotics Center. He entered the Robotics Institute masters program in 2005. His research interests include mobile robot navigation and obstacle avoidance.
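To make the gain-learning idea in the abstract concrete, here is a minimal sketch in Python (my own illustration, not the speaker's actual algorithm): the steering command is modeled as a weighted sum of hand-designed control features, and the gains are fit by least squares to the commands recorded while a human operator drives. The feature definitions and function names below are hypothetical.

import numpy as np

def control_features(goal_bearing, obstacle_bearings, obstacle_ranges):
    # Hypothetical features: attraction toward the goal plus a summed
    # repulsion away from nearby obstacles (inverse-square falloff).
    attract = goal_bearing
    repel = sum(-np.sign(b) / (r ** 2 + 1e-6)
                for b, r in zip(obstacle_bearings, obstacle_ranges))
    return np.array([attract, repel])

def learn_gains(demonstrations):
    # demonstrations: list of (goal_bearing, obstacle_bearings, obstacle_ranges, steering)
    # tuples recorded while a human drives. Returns the gains that minimize the
    # squared error between the controller output and the human steering command.
    X = np.array([control_features(g, ob, orng) for g, ob, orng, _ in demonstrations])
    y = np.array([s for *_, s in demonstrations])
    gains, *_ = np.linalg.lstsq(X, y, rcond=None)
    return gains

def steer(gains, goal_bearing, obstacle_bearings, obstacle_ranges):
    # The learned controller: same features, weighted by the learned gains.
    return float(gains @ control_features(goal_bearing, obstacle_bearings, obstacle_ranges))

In practice the feature set would mirror whatever parameterized control function the obstacle avoidance system actually uses; the point is only that the gains can be regressed from logged human commands instead of being hand-tuned.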
CMU RI FRC Seminar: Navigation Autonomy for Legged Machines
Speaker: James Kuffner, Assistant Professor, Robotics Institute, Carnegie Mellon University
Date: Thursday, April 27, 2006
Time: Noon
Location: NSH 1109
Abstract:
Legged robots are complex dynamic systems whose technology has evolved rapidly during the past decade. Presently, several companies are developing commercial prototype biped and quadruped robots. In this talk, I will present research aimed at improving the autonomy of legged robots through the development of practical motion planning algorithms that can be applied in dynamic unstructured environments. Specifically, I will discuss footstep placement planning over rough terrain, our "intelligent joystick" design for semi-autonomous control, and navigation among movable obstacles (NAMO). Experimental results obtained by implementations running on Honda's ASIMO, the AIST HRP2 humanoid, the H7 Humanoid (U. Tokyo), and the Boston Dynamics Little Dog quadruped robot will be shown.
Speaker Bio:
James Kuffner is an Assistant Professor at the Robotics Institute, Computer Science Dept., Carnegie Mellon University. He received a B.S. and M.S. in Computer Science from Stanford University in 1993 and 1995, and a Ph.D. from the Stanford University Dept. of Computer Science Robotics Laboratory in 1999. He was a Japan Society for the Promotion of Science (JSPS) Postdoctoral Research Fellow at the University of Tokyo from 1999 to 2001. He joined the faculty at Carnegie Mellon University in May 2002. His research interests include robotics, motion planning, and computer graphics and animation.
Friday, April 28, 2006
Computer scientists at Sheffield Hallam University, UK, have developed new face recognition software that can produce an exact 3D image of a face within 40 milliseconds. A pattern of light is projected onto your face, creating a 2D image from which an accurate 3D representation is generated. This technology should speed airport check-ins, but it could also be used in banks or for checking ID cards, as it allows full identification in less than one second.
This technology was developed at Sheffield Hallam University by the Geometric Modelling and Pattern Recognition Research Group of the Materials and Engineering Research Institute (MERI).
Here is what MERI Professor Marcos Rodrigues says about this new technology.
"This technology could be used anywhere there is a need for heightened security. It is well suited to a range of applications including person identification from national databases, access control to public and private locations, matching 3D poses to 2D photographs in criminal cases, and 3D facial biometric data for smart cards such as ID and bank cards. We have developed a viable, working system at the cutting edge of 3D technology."
Below are two screenshots showing the technology at work. (Credit: MERI)
These two screenshots have been extracted from a short movie available in different formats from this page about 3D Imaging at MERI.
But why have similar systems failed until now? The answer is provided by an article from Vision Systems Design, "Imaging technology may speed airport check-in."
Other 3-D systems, requiring 16 shots of the face, have proved unworkable because of the time it takes to construct a picture. The chance of movement during such a multishot process is extremely high, and if the face moves even a fraction then the 2-D to 3-D image is unworkable.
This is where MERI's technology brings something new, including its accuracy -- and its low cost.
MERI also claims several other advantages for its technology. Hardware requirements are a projector and a single camera, making setup inexpensive -- a few hundred pounds, compared with up to £40,000 for older systems. Those systems need at least three or four cameras to capture an image, which means time-consuming parameter setting and complex calibration.
Besides airports and banks, this technology could be used for industrial applications.
"Objects can go on a conveyor belt, and, instead of using a flat image, a 3-D image can help locate defects in them. Although we are focusing on security applications now, there is great potential in the future," said Rodrigues.
I sure hope that this system will go through extensive tests before being adopted.
Sources: Sheffield Hallam University news release, February 20, 2006; Vision Systems Design, February 27, 2006; and various web sites
Here is the link to the demo video:
http://www.shu.ac.uk/research/meri/gmpr/projects/projects1a.html
Glasses that hear well
If you live in the Netherlands and don't hear well, you'll soon be able to buy a new kind of hearing aid: a pair of Varibel glasses. These special glasses, originally developed at Delft University of Technology, have four small interconnected microphones in each leg of their frames. These microphones can "selectively intensify the sounds that come from the front, while dampening the surrounding noise," so the glasses offer better sound quality than other hearing aids.
Before going further, here is a picture of these hearing glasses with their tiny microphones (Credit: Varibel).
Now, why are people using current hearing aids not satisfied?
Many hearing aids intensify sounds from all directions. The result is that people hear noise, but not the people they are speaking to. Because people have such difficulty understanding what others are saying, many people -- in spite of their hearing aid -- have less social contact with others or must retire from their jobs earlier than desired. The hearing-glasses can provide a solution to this problem, say the experts and users who have tried and tested the Varibel.
So what is the solution brought by Varibel?
The Varibel cannot be compared to traditional hearing aids. In each leg of the glasses' frame there is a row of four tiny, interconnected microphones, which selectively intensify the sounds that come from the front while dampening the surrounding noise. The result is a directional sensitivity of +8.2 dB. In comparison, regular hearing aids have a maximum sensitivity of +4 dB. With this solution, the user can separate the desired sounds from the undesired background noise.
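For readers curious how a row of microphones becomes directional, below is a rough delay-and-sum beamforming sketch in Python (Varibel's actual signal processing is not described here, and the function and parameter names are my own). Signals are time-aligned for the look direction before being summed, so frontal sound adds coherently while off-axis noise does not.

import numpy as np

def delay_and_sum(mic_signals, mic_positions, look_angle_deg, fs, c=343.0):
    # mic_signals: (n_mics, n_samples) array of recordings.
    # mic_positions: x-coordinates of the microphones along the frame, in metres.
    # look_angle_deg: 0 means straight ahead (broadside).
    mic_positions = np.asarray(mic_positions, dtype=float)
    theta = np.deg2rad(look_angle_deg)
    delays = mic_positions * np.sin(theta) / c        # steering delay per microphone, seconds
    n = mic_signals.shape[1]
    freqs = np.fft.rfftfreq(n, d=1.0 / fs)
    out = np.zeros(n)
    for sig, tau in zip(mic_signals, delays):
        # apply a fractional delay in the frequency domain, then sum
        spectrum = np.fft.rfft(sig) * np.exp(-2j * np.pi * freqs * tau)
        out += np.fft.irfft(spectrum, n)
    return out / len(mic_signals)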
Below is a picture of a full Varibel package, which you'll be able to buy before the end of April 2006 (Credit: Varibel).
And will it help in public places?
Martin de Jong, an audio technician, says: "With the Varibel, the natural sounds that people enjoy are retained. This works surprisingly well. People can hear well and at the same time clearly -- especially in rooms such as a cafe or at a birthday party."
For more information, you can visit the Varibel web site -- if you read Dutch.
Sources: Delft University of Technology news release, April 7, 2006; and various web sites
Finding a Better Way to Quiet Noisy Environments
“Noise cancellation is a hidden technology that most consumers aren’t aware of, but vehicles made by BMW, Mercedes, Honda, and other companies are now using it,” said Raymond de Callafon, co-author of the paper and a professor of mechanical and aerospace engineering at UCSD’s Jacobs School of Engineering. “Our new technique should greatly expand the potential of active noise-cancellation technologies.”
Basic active noise cancellation is composed of four inter-related parts: a microphone that measures incoming noise and feeds that information to a computer; a computer processor that converts the noise information into anti-noise instructions; an audio speaker that is driven by the anti-noise signal to broadcast sound waves that are exactly 180 degrees out of phase with the unwanted signal and of the same magnitude; and a downstream microphone that monitors the residual noise and signals the computer as part of a process to optimize the anti-noise signal.
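As background, here is a textbook feedforward LMS canceller sketched in Python. It is not the UCSD algorithm (which additionally models the acoustic coupling between the loudspeaker and the reference microphone), and the names and parameters are illustrative only. An adaptive filter turns the upstream reference measurement into an anti-noise signal and is updated from the downstream error microphone.

import numpy as np

def feedforward_lms(reference, disturbance, n_taps=64, mu=1e-3):
    # reference: noise measured by the upstream microphone.
    # disturbance: the same noise as it arrives at the downstream (error) microphone.
    # Returns the residual signal left after the anti-noise is subtracted.
    w = np.zeros(n_taps)                      # adaptive filter weights
    residual = np.zeros(len(reference))
    for i in range(n_taps, len(reference)):
        x = reference[i - n_taps:i][::-1]     # most recent reference samples first
        anti_noise = w @ x                    # speaker output (ideally 180 degrees out of phase)
        residual[i] = disturbance[i] - anti_noise
        w += mu * residual[i] * x             # LMS update driven by the error microphone
    return residual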
This “feedforward” active-noise control can reduce unwanted helicopter and cabin noise or the steady roar of industrial air handling systems by 40 decibels or more. However, most commercial systems suffer from acoustic feedback because the anti-noise signal produced by the noise-cancellation speakers can feed back into the microphone and become amplified repeatedly until the resulting sound becomes an ear-splitting squeal or whistle.
“Most people ignore this acoustic coupling but we took it into account and designed the feedforward noise cancellation knowing that the acoustic coupling is there,” said de Callafon.
Some makers of active noise cancellation avoid acoustic coupling by shielding microphones from speakers, or by using directional microphones and speakers that are pointed away from each other. “This works fine in the case of noise-reduction headphones and air-conditioning ducts, but it’s impractical in hundreds of other applications,” de Callafon said.
For example, the algorithm developed by de Callafon and Ph.D. candidate J. Zeng may be adapted to cancel unwanted complex signals that are moving, such as the sound of bustling urban traffic coming through a ventilation opening.
“We think we’ve developed a totally new approach that works by generating the ‘feedforward’ noise cancellation signals and adaptively changing them in the presence of acoustic coupling,” de Callafon said. “This has been a really complicated problem to solve and we think the approach we’ve taken will have a significant impact on the field.”
Source: University of California, San Diego
Link
Center for the Foundations of Robotics Seminar, April 26, 2006: Methodology for Design and Analysis of Physically Cooperating Mobile Robots
Ashish D Deshpande
Time and Place:
Newell Simon Hall 1507
Refreshments 4:45 pm
Talk 5:00 pm
Abstract:
A team of small, low-cost robots instead of a single large, complex robot is useful in operations such as search and rescue, urban exploration, etc. However, the performance of such a team is limited due to the restricted mobility of the team members. The first part of my talk will present the results obtained toward the goal of enhancing the mobility of a team of mobile robots by physical cooperation among the robots. We have carried out static as well as dynamic analysis of a cooperating mobile robot system and developed 2-robot hardware to demonstrate cooperative behaviors.
There is a need to develop a methodology to design and analyze cooperative maneuvers involving multiple mobile robots. The second part of my talk will present our efforts toward the development of such a methodology. Our approach is to treat the linked mobile robots as a multiple degree-of-freedom object, comprising an articulated open kinematic chain, which is being manipulated by pseudo-robots (p-robots) at the ground interaction points. Such a rearrangement of the problem facilitates the adaptation of ideas from the cooperative manipulation literature. We present the new methodology by carrying out static as well as dynamic analysis for a 2-robot cooperation case. We have also demonstrated that the introduction of redundant actuation, by an additional (third) robot, can help in improving the friction requirements. We also present our ideas for employing this newly designed methodology to analyze other interesting multi-body robotic systems.
Bio
Ashish Deshpande is a doctoral candidate under Dr. Jonathan Luntz in the Mechanical Engineering Dept. at the University of Michigan, Ann Arbor. His areas of interest include mobile robotics, multi-body dynamics, controls, and engineering design. Ashish received a B.E. from VNIT, Nagpur, India in 1999 and an M.S. from the University of Massachusetts, Amherst in 2002.
http://www.cs.cmu.edu/~cfr/talks/2006-Apr-26.html
Thursday, April 27, 2006
Detecting and tracking multiple interacting objects without class-specific models
Bose, Biswajit
Wang, Xiaogang
Grimson, Eric
Abstract:
We propose a framework for detecting and tracking multiple interacting objects from a single, static, uncalibrated camera. The number of objects is variable and unknown, and object-class-specific models are not available. We use background subtraction results as measurements for object detection and tracking. Given these constraints, the main challenge is to associate pixel measurements with (possibly interacting) object targets. We first track clusters of pixels, and note when they merge or split. We then build an inference graph, representing relations between the tracked clusters. Using this graph and a generic object model based on spatial connectedness and coherent motion, we label the tracked clusters as whole objects, fragments of objects or groups of interacting objects. The outputs of our algorithm are entire tracks of objects, which may include corresponding tracks from groups of objects during interactions. Experimental results on multiple video sequences are shown.
Link
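As a rough illustration of the measurement side of such a tracker (a much simplified stand-in for the authors' inference-graph method), foreground pixels from background subtraction can be grouped into connected components and associated across frames by bounding-box overlap, with merges and splits flagged when the association is not one-to-one. All names below are my own.

import numpy as np
from scipy import ndimage

def components(foreground_mask):
    # Label connected foreground regions; returns one bounding box per region.
    labels, n = ndimage.label(foreground_mask)
    return ndimage.find_objects(labels)

def overlaps(box_a, box_b):
    # True if two (row_slice, col_slice) bounding boxes intersect.
    (ya, xa), (yb, xb) = box_a, box_b
    return (ya.start < yb.stop and yb.start < ya.stop and
            xa.start < xb.stop and xb.start < xa.stop)

def associate(prev_boxes, curr_boxes):
    # Link boxes across frames by overlap. More than one previous box linked to a
    # current box suggests a merge; more than one current box linked to a previous
    # box suggests a split.
    links = [(i, j) for i, a in enumerate(prev_boxes)
                    for j, b in enumerate(curr_boxes) if overlaps(a, b)]
    merges = {j for j in range(len(curr_boxes))
              if sum(1 for _, jj in links if jj == j) > 1}
    splits = {i for i in range(len(prev_boxes))
              if sum(1 for ii, _ in links if ii == i) > 1}
    return links, merges, splits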
Tuesday, April 25, 2006
CMU RI Special FRC Seminar: Optimal Rough Terrain Trajectory Generation for Wheeled Mobile Robots
Date: *Tuesday*, April 25, 2006
Time: **3pm** (not Noon)
Location: NSH 1109
Refreshments will be served
Speaker: Thomas Howard, PhD Candidate, Robotics Institute, Carnegie Mellon University
Abstract:
In order to operate competently in any environment, a mobile robot must understand the effects of its own dynamics and of its interactions with the terrain. It is therefore natural to incorporate models of these effects in a trajectory generator which determines the controls necessary to achieve motion between a prescribed set of boundary states. This talk addresses recent work in developing a general algorithm for continuous motion primitive trajectory generation for arbitrary vehicle models on rough three dimensional terrain. The generality of the method derives from linearizing and inverting forward models of propulsion, suspension, and motion to minimize boundary state error and path cost given a parameterized set of controls. The simulation-based approach can accommodate effects such as rough terrain, wheel slip, and predictable vehicle dynamics. We will present this algorithm for local motion planning and discuss applications in planetary rovers and unmanned ground vehicles.
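Here is a toy, flat-ground sketch of the underlying idea in Python (the actual work handles rough 3D terrain, suspension, and wheel slip; the model and parameterization below are my own simplification): the controls are parameterized, a forward model is simulated, and the parameters are corrected through a linearized, inverted model until the boundary-state error is small.

import numpy as np

def simulate(params, v=1.0, n_steps=200):
    # Unicycle forward model with curvature k(t) = a + b*t over a horizon T.
    # params = [a, b, T]. Returns the final state (x, y, heading).
    a, b, T = params
    dt = T / n_steps
    x = y = th = 0.0
    for i in range(n_steps):
        k = a + b * (i * dt)
        x += v * np.cos(th) * dt
        y += v * np.sin(th) * dt
        th += v * k * dt
    return np.array([x, y, th])

def generate_trajectory(target, params=np.array([0.0, 0.0, 5.0]), iters=50):
    # Newton-style correction of the control parameters so that the simulated
    # endpoint matches the target pose (x, y, heading). Assumes the target is
    # reachable and the iterations stay well-behaved.
    for _ in range(iters):
        err = target - simulate(params)
        if np.linalg.norm(err) < 1e-3:
            break
        J = np.zeros((3, 3))           # numerical Jacobian of the endpoint w.r.t. the parameters
        for j in range(3):
            d = np.zeros(3)
            d[j] = 1e-4
            J[:, j] = (simulate(params + d) - simulate(params)) / 1e-4
        params = params + np.linalg.solve(J, err)
    return params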
Related Links:
- Terrain-Adaptive Generation of Optimal Continuous Trajectories for Mobile Robots, T. Howard and A. Kelly, Proceedings of the 8th International Symposium on Artificial Intelligence, Robotics, and Automation in Space (i-SAIRAS '05), September, 2005.
- Trajectory Generation on Rough Terrain Considering Actuator Dynamics, T. Howard and A. Kelly, Proceedings of the 5th International Conference on Field and Service Robotics (FSR '05), July, 2005.
[Robot Perception and Learning] PAL lab meeting, April 27, 2006 (Casey): Robust Real-Time Face Detection
From: International Journal of Computer Vision 2004
Abstract:
This paper describes a face detection framework that is capable of processing images extremely rapidly while achieving high detection rates. There are three key contributions. The first is the introduction of a new image representation called the "Integral Image" which allows the features used by our detector to be computed very quickly. The second is a simple and efficient classifier which is built using the AdaBoost learning algorithm (Freund and Schapire, 1995) to select a small number of critical visual features from a very large set of potential features. The third contribution is a method for combining classifiers in a "cascade" which allows background regions of the image to be quickly discarded while spending more computation on promising face-like regions. A set of experiments in the domain of face detection is presented. The system yields face detection performance comparable to the best previous systems (Sung and Poggio, 1998; Rowley et al., 1998; Schneiderman and Kanade, 2000; Roth et al., 2000). Implemented on a conventional desktop, face detection proceeds at 15 frames per second.
Paper link: http://www.vision.caltech.edu/html-files/EE148-2005-Spring/pprs/viola04ijcv.pdf
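To illustrate the first contribution, here is a small sketch of the integral image and the four-lookup rectangle sum, assuming a grayscale image stored as a 2D NumPy array (my own illustration, not code from the paper).

import numpy as np

def integral_image(img):
    # ii[y, x] = sum of all pixels above and to the left of (x, y), inclusive.
    return img.astype(np.int64).cumsum(axis=0).cumsum(axis=1)

def rect_sum(ii, top, left, height, width):
    # Sum of the pixels inside a rectangle, using four corner lookups.
    bottom, right = top + height - 1, left + width - 1
    total = ii[bottom, right]
    if top > 0:
        total -= ii[top - 1, right]
    if left > 0:
        total -= ii[bottom, left - 1]
    if top > 0 and left > 0:
        total += ii[top - 1, left - 1]
    return total

# A two-rectangle Haar-like feature is then simply a difference of two such sums:
# feature = rect_sum(ii, y, x, h, w) - rect_sum(ii, y, x + w, h, w)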
CMU Thesis Proposal : Face View Synthesis Using A Single Image (3 May 2006)
Robotics Institute
Carnegie Mellon University
Abstract
Face view synthesis involves using one view of a face to artificially render another view. It is an interesting problem in computer vision and computer graphics, and can be applied in the entertainment industry, such as animated movies or video games. The fact that the input is only a single image makes the problem very difficult. Previous approaches perform machine learning on pairs of poses from 2D training data and then predict the unknown pose in the test example. Such 2D approaches are much more practical than approaches requiring 3D data and are more computationally efficient. However, they perform inadequately when dealing with large angles between poses. In this proposal we seek to improve performance through better choices in probabilistic modeling. As a first step, we have implemented a statistical model combining distance in feature space (DIFS) and distance from feature space (DFFS) for such pairs of poses. Such a representation leads to better performance. Furthermore, we have observed that statistical dependency varies among different groupings of pixels. In particular, a given pixel variable is often statistically correlated with only a small number of other pixel variables. We propose to exploit this statistical structure by modeling the synthesis problem using graphical probability models. Such representations concisely describe the synthesis problem, providing a rich model with reduced susceptibility to over-fitting.
More detail: http://www.cs.cmu.edu/~jiangni/thesis/jiang_proposal.pdf
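As a hedged illustration of the DIFS/DFFS terms mentioned in the abstract (the proposal's actual probabilistic model is more elaborate; all names below are my own): PCA defines a feature subspace, DIFS is the Mahalanobis distance of a sample's projection inside that subspace, and DFFS is the residual energy the subspace cannot explain.

import numpy as np

def fit_subspace(X, k):
    # X: (n_samples, n_pixels) training vectors. Returns the mean, the top-k
    # principal directions (as columns), and the corresponding eigenvalues.
    mean = X.mean(axis=0)
    _, s, Vt = np.linalg.svd(X - mean, full_matrices=False)
    eigvals = (s ** 2) / (X.shape[0] - 1)
    return mean, Vt[:k].T, eigvals[:k]

def difs_dffs(x, mean, components, eigvals):
    # DIFS: Mahalanobis distance within the subspace.
    # DFFS: squared norm of the component of x outside the subspace.
    centered = x - mean
    coeffs = components.T @ centered
    difs = float(np.sum(coeffs ** 2 / eigvals))
    dffs = float(np.sum(centered ** 2) - np.sum(coeffs ** 2))
    return difs, dffs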
Monday, April 24, 2006
PAL lab meeting, April 27, 2006 (Any): Solving Partially Observable Markov Decision Processes
Abstract:
This paper describes the POMDP framework and presents some well-known results from the field. It then presents a novel method called the witness algorithm for solving POMDP problems and analyzes its computational complexity. We argue that the witness algorithm is superior to existing algorithms for solving POMDP's in an important complexity-theoretic sense.
Outline:
- Introduction to MDP
- CO(Completely Observable)-MDP vs. POMDP
- Definition of POMDP
- Solving POMDP
- POMDP Value Iteration
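For context, the belief-state update that underlies POMDP value iteration (and hence the witness algorithm) is a simple Bayes filter; the array layout below is an assumption of this illustration, not the paper's notation.

import numpy as np

def belief_update(belief, action, observation, T, O):
    # belief: probability vector over states.
    # T[a][s, s'] = P(s' | s, a); O[a][s', o] = P(o | s', a).
    # Returns the posterior belief after taking `action` and seeing `observation`.
    predicted = T[action].T @ belief                  # predicted state distribution
    updated = O[action][:, observation] * predicted   # weight by observation likelihood
    return updated / updated.sum()                    # normalize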
Tuesday, April 18, 2006
PAL lab meeting, April 20, 2006 (Stanley): CMU RI Thesis Oral: Visual Feedback Manipulation for Hand Rehabilitation in a Robotic Environment
18 April 2006
Abstract: In this thesis, I examine how manipulations of the visual feedback given to a patient can be used to make robotic therapy more effective than traditional human-assisted therapy and previous robotic rehabilitation applications. Patients may not strive for difficult goals in therapy due to entrenched habits or personality variables such as low self-efficacy or a fear of failure. Visual feedback manipulation can be used to encourage patients to move beyond an established level of performance. Specifically, I examine two types of visual feedback manipulation: visual distortion and visual progression. By “visual progression,” I mean veridical visual feedback emphasizing and encouraging gradual improvements in performance; by “visual distortion,” I mean visual feedback that establishes a metric of performance for a given rehabilitation task and then gradually changes this metric such that improved performance is required for the same visual response. For a therapeutic program involving distortion to be most effective, patients must not detect the visual distortions. Thus, the first set of experiments I conducted addressed the limits of imperceptible visual distortion with unimpaired subjects. Further experiments with unimpaired subjects were conducted to show that vision dominates kinesthetic feedback in our robotic rehabilitation environment and that gradual visual distortion can be used to control force production and movement distance within a single experimental session. I also examined the effects of distortion during a difficult two-finger coordination task. Based on this work, I designed paradigms applying visual feedback manipulation to the rehabilitation of chronic stroke and traumatic brain injury patients. I performed initial tests with three patients, each of whom participated in a 6-week rehabilitation protocol. Patients' performances during the initial assessment at each therapeutic session were found to be an underestimate of their actual abilities and a poor metric for setting the difficulty level of therapeutic exercise. All three patients were willing and able to improve their performance by following distortion or progression, and all patients showed functional improvements after participation in the study. Visual feedback manipulation may provide a way to help a patient move beyond his or her self-assessed “best” performance, improving the outcome of robotic rehabilitation.
http://www.cs.cmu.edu/~broberts/Dissertation.pdf
PAL lab meeting, April 20, 2006 (Vincent): Face recognition using eigenfaces
Matthew Turk and A. P. Pentland
Media Lab, MIT, Cambridge, MA, USA
This paper appears in: Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR '91)
Publication Date: 3-6 June 1991
Abstract:
An approach to the detection and identification of human faces is presented, and a working, near-real-time face recognition system which tracks a subject's head and then recognizes the person by comparing characteristics of the face to those of known individuals is described. This approach treats face recognition as a two-dimensional recognition problem, taking advantage of the fact that faces are normally upright and thus may be described by a small set of 2-D characteristic views. Face images are projected onto a feature space (`face space') that best encodes the variation among known face images. The face space is defined by the `eigenfaces', which are the eigenvectors of the set of faces; they do not necessarily correspond to isolated features such as eyes, ears, and noses. The framework provides the ability to learn to recognize new faces in an unsupervised manner.
Here is the link to this paper.
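A compact sketch of the eigenface pipeline described in the abstract, assuming face images flattened into equal-length vectors (my own illustration, not the authors' code):

import numpy as np

def train_eigenfaces(faces, k=20):
    # faces: (n_images, n_pixels) array. Returns the mean face and the top-k
    # eigenfaces (rows), i.e. the principal components of the training set.
    mean = faces.mean(axis=0)
    _, _, Vt = np.linalg.svd(faces - mean, full_matrices=False)
    return mean, Vt[:k]

def project(face, mean, eigenfaces):
    # Coordinates of a face in 'face space'.
    return eigenfaces @ (face - mean)

def recognize(face, mean, eigenfaces, gallery):
    # gallery: {person: face-space coordinates}. Nearest neighbour in face space.
    w = project(face, mean, eigenfaces)
    return min(gallery, key=lambda person: np.linalg.norm(gallery[person] - w))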
Saturday, April 15, 2006
CNN: Snake robots could aid in rescues
'Breadstick' and 'Pepperoni' are being tested
Wednesday, April 12, 2006; Posted: 9:20 p.m. EDT (01:20 GMT)
PITTSBURGH, Pennsylvania (AP) -- For most people, snakes seem unpleasant or even threatening. But Howie Choset sees in their delicate movements a way to save lives.
The 37-year-old Carnegie Mellon University professor has spent years developing snakelike robots he hopes will eventually slither through collapsed buildings in search of victims trapped after natural disasters or other emergencies.
In recent weeks, Choset and some of his students made what he said was an industry breakthrough: enabling the articulated, remote-controlled devices to climb up and around pipes.
Full Article
Friday, April 14, 2006
CMU VASC talk: 3D Photography: Reconstructing Photorealistic 3D Models of Large-Scale Scenes
Monday, April 17, 2006
Abstract:
Recently there has been an increased interest in the photorealistic modeling and rendering of large-scale scenes, such as urban structures. This requires a fusion of range sensing technology and traditional digital photography. A major bottleneck in this process is the automated registration of a large number of geometrically complex 3D range scans and high-resolution 2D images in a common frame of reference. In this talk we will present a novel system that integrates automated 3D registration techniques with multiview geometry for texture mapping 2D images onto 3D range data. Our methods utilize range segmentation and feature extraction algorithms. We will also describe our approach in 3D mesh generation. The produced 3D representations are useful for urban planning, historical preservation, or entertainment applications. We will present results of scanning large urban structures, such as the interior of the Grand Central Terminal in New York.
Bio: Ioannis Stamos is an associate professor of computer science and director of the Vision & Graphics Laboratory at Hunter College of the City University of New York (2001-present). He is also a member of the doctoral faculty of the Graduate Center of CUNY. His research interests include 3D segmentation, range to image registration and 3D modeling. Stamos received a PhD, an MPhil and an MS in computer science from Columbia University. He received an Engineering Diploma in computer engineering & informatics from the University of Patras, Greece. Stamos is a recipient of the Faculty Early Career Development Award (CAREER) by the National Science Foundation.
CMU ML talk: Bayesian Inference for Gaussian Mixed Graph Models
http://www.cs.cmu.edu/~rbas
Date: April 17
Abstract: We introduce priors and algorithms to perform Bayesian inference in Gaussian models defined by acyclic directed mixed graphs. Such a class of graphs, composed of directed and bi-directed edges, is a representation of conditional independencies that is closed under marginalization and arises naturally from causal models which allow for unmeasured confounding. Monte Carlo methods and a variational approximation for such models are presented. Our algorithms for Bayesian inference allow the evaluation of posterior distributions for several quantities of interest, including causal effects that are not identifiable from data alone but could otherwise be inferred where informative prior knowledge about confounding is available.
Joint work with Zoubin Ghahramani
Thursday, April 13, 2006
What's New @ IEEE April 2006
"IEEE Spectrum" has issued its fourth annual list of the top 10 tech cars. The article focuses on production cars now in showrooms or soon to be available, but this year also singles out three concept cars for special mention. Cars on this year's list include the 2006 Chrysler Heritage Edition, whose headlights automatically switch to low beams when the car detects approaching vehicles and the 2007 Mercedes-Benz E 320 Bluetec, which will have the cleanest diesel engine on the planet. Read more: http://www.spectrum.ieee.org/apr06/3173
3. PROJECT SEEKS SOLUTIONS FOR FUTURE OF WIRELESS NETWORKS
With an increasing amount of embedded wireless sensor technology being developed, system designers are now faced with the challenge of deciding the best direction to take for future research so that the full capabilities of the networks can be realized. As a result, the European Commission's Information Society Technologies project Embedded WiseNts is focusing on finding solutions to the problems associated with the production of Wireless Sensor Networks and their applications, particularly in the form of Cooperative Objects. The team's goal is to acquire a general vision of these networks and predict technical progress over the next 10 years. The project will conclude in December 2006, and team members have already identified several key areas of weakness, including the lack of a middleware layer for the adaptation of diverse application software and the need for better energy efficiency in both hardware and software. Read more: http://www.eurekalert.org/pub_releases/2006-03/ir-ptr032706.php
4. SMARTPHONES NOW AND IN THE FUTURE
A new report appearing in "IEEE Distributed Systems Online" (v. 7, no. 3) discusses what makes a cellphone a smartphone and looks at the future of the market. According to the article, smartphones are broken into three categories -- high-end phones, PDAs, and enhanced wireless email devices such as Blackberrys. The components that comprise them, such as internal memory, location-based services, and screen display, are common to all, but differ slightly depending on model. For instance, some use SVGA screens while others still use VGAs. Their operating systems consist mostly of Windows-based and Linux-based systems, with Symbian OS considered the leader. As these technologies improve, and WiFi hot spots increase worldwide, users can expect to find more location-specific services, especially in the realm of commerce programs that will cater to shopping centers. Additionally, M-commerce, the ability to use a phone to pay for items, is something software developers are trying to streamline. Read more: the link
6. TELEPHONY'S NEXT ACT: "IEEE SPECTRUM" REPORTS
Will Voice Over Internet Protocol wreak havoc with the systems of the Internet, or will it make our lives easier and better? Folding traditional telephony into the Internet is tricky, according to an article in this month's issue of "IEEE Spectrum" magazine. Their hardware and software are different and, perhaps hardest of all, today they involve totally different databases. The thing they do most differently is called signaling -- keeping track of all of the potential communicating parties, their equipment, and their services, and selecting the right combination for each contact. The next seven years will be key. Read more:
http://www.spectrum.ieee.org/apr06/3204
Wednesday, April 12, 2006
MIT CSAIL talk: Functional Specificity in the Cortex: Selectivity, Experience, & Generality
Functional MRI has revealed several cortical regions in the ventral visual pathway in humans that exhibit a striking degree of functional specificity: the fusiform face area (FFA), parahippocampal place area (PPA), and extrastriate body area (EBA). I will briefly review this work and then discuss more recent studies that investigate the specificity, origins, and generality of domain specificity in the cortex. In particular, these studies ask i) how specialized is the FFA for faces, and what exactly does it do with faces?, ii) how do cortical responses to visually presented objects change with experience, and is extensive experience ever sufficient to create them?, and iii) are domain-specific regions of cortex found only in the visual system, or can they sometimes be found for very abstract high-level cognitive functions as well?
Monday, April 10, 2006
PAL Lab Meeting 4/13(Eric): 3D Scanner Demo
1. Projector-camera system calibration.
2. How to compute the object depth (see the triangulation sketch below).
3. Demo.
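Regarding item 2, here is a hypothetical triangulation sketch (the calibration and decoding used in the actual demo may differ): with a calibrated projector-camera pair, the depth at a pixel follows from intersecting its camera ray with the projector plane (light stripe) that illuminated it.

import numpy as np

def back_project(pixel, K):
    # Unit ray through an image pixel, given the camera intrinsic matrix K.
    ray = np.linalg.inv(K) @ np.array([pixel[0], pixel[1], 1.0])
    return ray / np.linalg.norm(ray)

def depth_from_plane(cam_ray, plane_normal, plane_d):
    # The projector stripe defines a plane n . X = d in camera coordinates
    # (camera centre at the origin). Intersect the camera ray with that plane.
    t = plane_d / float(np.asarray(plane_normal) @ cam_ray)   # ray parameter at intersection
    return t * cam_ray                                        # 3D point on the object surface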
PAL Lab Meeting 4/13(ChiHao): Robust Speaker's Location Detection in a Vehicle Environment Using GMM Models
Author: Jwu-Sheng Hu, Member, IEEE, Chieh-Cheng Cheng, and Wei-Han Liu
Abstract:
Human–computer interaction (HCI) using speech communication is becoming increasingly important, especially in driving, where safety is the primary concern. Knowing the speaker's location (i.e., speaker localization) not only improves the enhancement results of a corrupted signal, but also provides assistance to speaker identification. Since conventional speech localization algorithms suffer from the uncertainties of environmental complexity and noise, as well as from the microphone mismatch problem, they are frequently not robust in practice. Without high reliability, the acceptance of speech-based HCI would never be realized. This work presents a novel speaker's location detection method and demonstrates high accuracy within a vehicle cabin using a single linear microphone array. The proposed approach utilizes Gaussian mixture models (GMM) to model the distributions of the phase differences among the microphones caused by the complex characteristics of room acoustics and microphone mismatch. The model can be applied both in near-field and far-field situations in a noisy environment. The individual Gaussian components of a GMM represent general location-dependent but content- and speaker-independent phase difference distributions. Moreover, the scheme performs well not only in non-line-of-sight cases, but also when the speakers are aligned toward the microphone array but at different distances from it. This strong performance can be achieved by exploiting the fact that the phase difference distributions at different locations are distinguishable in the environment of a car. The experimental results also show that the proposed method outperforms the conventional multiple signal classification (MUSIC) technique at various SNRs.
Link
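A simplified sketch of the idea (not the authors' implementation): fit one GMM per candidate speaker position to inter-microphone phase-difference features gathered during training, then pick the position whose model best explains a new snapshot. The feature extraction and the scikit-learn usage below are my own choices.

import numpy as np
from sklearn.mixture import GaussianMixture

def phase_differences(frames):
    # frames: (n_mics, n_samples) snapshot. Returns, per FFT bin, the phase of
    # each microphone relative to the first one (one feature vector per bin).
    spectra = np.fft.rfft(frames, axis=1)
    phases = np.angle(spectra * np.conj(spectra[0]))
    return phases[1:].T                        # shape (n_bins, n_mics - 1)

def train_location_models(training_data, n_components=8):
    # training_data: {location: list of (n_mics, n_samples) snapshots}.
    models = {}
    for loc, snaps in training_data.items():
        feats = np.vstack([phase_differences(s) for s in snaps])
        models[loc] = GaussianMixture(n_components).fit(feats)
    return models

def localize(models, snapshot):
    # Pick the location whose GMM gives the new snapshot the highest likelihood.
    feats = phase_differences(snapshot)
    return max(models, key=lambda loc: models[loc].score(feats))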
Saturday, April 08, 2006
Stanford AI talk: Toward a Geometrically Coherent Image Interpretation
April 10, 2006, 3:15PM (NOT 4:15PM)
http://graphics.stanford.edu/ba-colloquium/
Abstract
Image interpretation, the ability to see and understand the three-dimensional world behind a two-dimensional image, goes to the very heart of the computer vision problem. The ultimate objective is, given an image, to automatically produce a coherent interpretation of the depicted scene. This requires not only recognizing specific objects (e.g. people, houses, cars, trees), but understanding the underlying structure of the 3D scene where these objects reside.
In this talk I will describe some of our recent efforts toward this lofty goal. I will present an approach for estimating the coarse geometric properties of a scene by learning appearance-based models of geometric classes. Geometric classes describe the 3D orientation of image regions with respect to the camera. This geometric information is then combined with camera viewpoint estimation and local object detection producing a prototype for a coherent image-interpretation framework.
Joint work with Derek Hoiem and Martial Hebert at CMU.
MIT CSAIL Thesis Oral: Multi-Stream Speech Recognition: Theory and Practice
Date: Monday, April 10 2006
In this thesis, we have focused on improving the acoustic modeling of speech recognition systems to increase the overall recognition performance. We formulate a novel multi-stream speech recognition framework using multi-tape finite-state transducers (FSTs). The multi-dimensional input labels of the multi-tape FST transitions specify the acoustic models to be used for the individual feature streams. An additional auxiliary field is used to model the degree of asynchrony among the feature streams. The individual feature streams can be linear sequences such as fixed-frame-rate features in traditional hidden Markov model (HMM) systems, and the feature streams can also be directed acyclic graphs such as segment features in segment-based systems. In a single-tape mode, this multi-stream framework also unifies the frame-based HMM and the segment-based approach.
Systems using the multi-stream speech recognition framework were evaluated on an audio-only and an audio-visual speech recognition task. On the Wall Street Journal speech recognition task, the multi-stream framework combined a traditional frame-based HMM with segment-based landmark features. The system achieved word error rate (WER) of 8.0%, improved from both the WER of 8.8% of the baseline HMM-only system and the WER of 10.4% of the landmark-only system. On the AV-TIMIT audio-visual speech recognition task, the multi-stream framework combined a landmark model, a segment model, and a visual HMM. The system achieved a WER of 0.9%, which also improved from the baseline systems. These results demonstrate the feasibility and versatility of the multi-stream speech recognition framework.
Thesis Supervisor: James R. Glass
Committee: Victor Zue, Michael Collins, Herb Gish
MIT CSAIL talk: Steps Toward the Creation of a Retinal Implant for the Blind
Date: Monday, April 10 2006
Abstract: This talk describes the efforts at MIT and the Massachusetts Eye and Ear Infirmary over the past 15 years to develop a chronically implantable retinal prosthesis. The goal is to restore some useful level of vision to patients suffering from outer retinal diseases, primarily retinitis pigmentosa and macular degeneration. We initially planned to build an intraocular implant, wirelessly supplied with signal and power, to stimulate the surviving cells of the retina. In this design electrical stimulation is applied through an epiretinal microelectrode array attached to the inner (front) surface of the retina. We have carried out a series of six acute surgical trials on human volunteers (five of whom were blind with retinitis pigmentosa and one with normal vision and cancer of the orbit) to assess electrical thresholds and the perceptions resulting from epiretinal retinal stimulation. The reported perceptions often corresponded poorly to the spatial pattern of the stimulated electrodes. In particular, no patient correctly recognized a letter. We hope that chronically implanted patients will adapt over time to better interpret the abnormal stimuli supplied by such a prosthesis.
Experiences with both animals and humans exposed surgical, biocompatibility, thermal and packaging difficulties with this epiretinal approach. Two years ago we altered our approach to a subretinal design which will, we believe, reduce these difficulties. Our current design places essentially the entire bulk of the implant on the temporal outer wall of the eye, with only a tiny sliver of the 10 micron thick microelectrode array inserted through a scleral flap beneath the retina. In this design the entire implant lies in a sterile area behind the conjunctiva. We plan to have a wireless prototype version of this design ready for chronic animal implantation this Spring.
about posting your talk...
This is regarding posting your talk on the blog. As "my talk this week" does not say much, please use this format "PAL Lab Meeting date(Name): talk title". For instance, PAL Lab Meeting 4/13(ChiHao): Super Microphone Array Localization.
Best,
-Bob
CMU RI Thesis Oral: Visual Feedback Manipulation for Hand Rehabilitation in a Robotic Environment
18 April 2006
Abstract: In this thesis, I examine how manipulations of the visual feedback given to a patient can be used to make robotic therapy more effective than traditional human-assisted therapy and previous robotic rehabilitation applications. Patients may not strive for difficult goals in therapy due to entrenched habits or personality variables such as low self-efficacy or a fear of failure. Visual feedback manipulation can be used to encourage patients to move beyond an established level of performance. Specifically, I examine two types of visual feedback manipulation: visual distortion and visual progression. By “visual progression,” I mean veridical visual feedback emphasizing and encouraging gradual improvements in performance; by “visual distortion,” I mean visual feedback that establishes a metric of performance for a given rehabilitation task and then gradually changes this metric such that improved performance is required for the same visual response. For a therapeutic program involving distortion to be most effective, patients must not detect the visual distortions. Thus, the first set of experiments I conducted addressed the limits of imperceptible visual distortion with unimpaired subjects. Further experiments with unimpaired subjects were conducted to show that vision dominates kinesthetic feedback in our robotic rehabilitation environment and that gradual visual distortion can be used to control force production and movement distance within a single experimental session. I also examined the effects of distortion during a difficult two-finger coordination task. Based on this work, I designed paradigms applying visual feedback manipulation to the rehabilitation of chronic stroke and traumatic brain injury patients. I performed initial tests with three patients, each of whom participated in a 6-week rehabilitation protocol. Patients' performances during the initial assessment at each therapeutic session were found to be an underestimate of their actual abilities and a poor metric for setting the difficulty level of therapeutic exercise. All three patients were willing and able to improve their performance by following distortion or progression, and all patients showed functional improvements after participation in the study. Visual feedback manipulation may provide a way to help a patient move beyond his or her self-assessed “best” performance, improving the outcome of robotic rehabilitation.
Further Details
A copy of the thesis oral document can be found at http://www.cs.cmu.edu/~broberts/Dissertation.pdf.
Wednesday, April 05, 2006
PAL Lab meeting (Tailion 4/5): slammot demo
I'll show my slammot demo tomorrow.
In this demo, I will show you the latest version of the slammot interface and the problems I have met.
-tailion
My talk this week
I will show a program that tracks vehicles at NTU using a Kalman filter.
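For reference, below is a minimal constant-velocity Kalman filter of the kind such a tracker might use (the program shown in the meeting may differ): the state is [x, y, vx, vy] and the measurements are noisy vehicle positions.

import numpy as np

def make_cv_model(dt=0.1, q=1.0, r=2.0):
    # Constant-velocity motion model: positions advance by velocity * dt.
    F = np.array([[1, 0, dt, 0],
                  [0, 1, 0, dt],
                  [0, 0, 1, 0],
                  [0, 0, 0, 1]], dtype=float)
    H = np.array([[1, 0, 0, 0],
                  [0, 1, 0, 0]], dtype=float)   # only position is observed
    Q = q * np.eye(4)                            # process noise covariance
    R = r * np.eye(2)                            # measurement noise covariance
    return F, H, Q, R

def kalman_step(x, P, z, F, H, Q, R):
    # One predict + update cycle: x is the state estimate, P its covariance,
    # z the newest position measurement.
    x_pred = F @ x
    P_pred = F @ P @ F.T + Q
    S = H @ P_pred @ H.T + R
    K = P_pred @ H.T @ np.linalg.inv(S)
    x_new = x_pred + K @ (z - H @ x_pred)
    P_new = (np.eye(4) - K @ H) @ P_pred
    return x_new, P_new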
Tuesday, April 04, 2006
CMU VASC talk: Visual patterns with matching subband statistics and higher order image
Abstract: Statistical representations of visual patterns are commonly used in computer vision. One such representation is a distribution measured from the output of a bank of filters (Gaussian, Laplacian, Gabor, wavelet etc). Both marginal and joint distributions of filter responses have been advocated and effectively used for a variety of vision tasks.
We begin by examining the ability of these representations to discriminate between an arbitrary pair of visual stimuli. Examples of patterns are derived that possess the same statistical properties, yet are "visually distinct." The existence of these patterns suggests the need for more powerful early visual representations.
It has been argued that the primary role of early vision is the modeling of statistical redundancy in natural imagery. One of the most striking properties of images is scale invariance. In the second part of this talk, this property is examined and a novel image representation, the higher order pyramid, is introduced. The representation is tuned to the scale invariant properties of images and constitutes a form of "higher order signal whitening."
BIO: Joshua Gluckman received the BS degree in economics from the University of Virginia (1992), the MS degree in computer science from the College of William and Mary (1995), and the PhD degree in computer science from Columbia University (2000). Since 2001, he has held the position of assistant professor of computer science at Polytechnic University in Brooklyn, NY. His area of research is computer vision.