Tuesday, February 14, 2006

MIT Thesis Defense: Learning a Dictionary of Shape-Components in Visual Cortex: Comparison with Neurons, Humans and Machines

Speaker: Thomas Serre, Dept. of Brain & Cognitive Sciences and McGovern Institute for Brain Research
Date: Wednesday, February 15, 2006
Host: Prof. Tomaso Poggio, McGovern Institute for Brain Research
Relevant URL: http://web.mit.edu/serre/www/

In this talk I will describe a quantitative model that accounts for the circuits and computations of the feedforward path of the ventral stream of visual cortex. This model is consistent with a general theory of visual processing that extends the hierarchical model of Hubel & Wiesel from primary to extrastriate visual areas and attempts to explain the first few hundred milliseconds of visual processing. One of the key elements in the approach I will describe is the learning of a generic dictionary of shape-components from V2 to IT, which provides an invariant representation to task-specific categorization circuits in higher brain areas. This vocabulary of shape-tuned units is learned in an unsupervised manner from natural images, and constitutes a large and redundant set of image features with different complexities and invariances. This theory significantly extends an earlier approach by Riesenhuber & Poggio (1999) and builds upon several existing neurobiological models and conceptual proposals.
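As a rough illustration of the idea in the paragraph above, here is a toy sketch of one tuning-then-pooling stage with a dictionary sampled from unlabeled images. It is not the model presented in the talk. The Gaussian-like tuning and the max pooling over positions follow the general scheme of the Riesenhuber & Poggio (1999) family of models; the patch-sampling rule, the image sizes, and every function name here are assumptions made for the example.

```python
import math
import random

def patches(image, size):
    """Yield every size x size patch of a 2D image (list of rows) as a flat tuple."""
    rows, cols = len(image), len(image[0])
    for r in range(rows - size + 1):
        for c in range(cols - size + 1):
            yield tuple(image[r + i][c + j] for i in range(size) for j in range(size))

def learn_dictionary(images, size, n_templates, rng):
    """Unsupervised step (assumed rule): store randomly sampled patches as templates."""
    pool = [p for img in images for p in patches(img, size)]
    return rng.sample(pool, n_templates)

def tuning(patch, template, sigma=1.0):
    """Gaussian-like tuning: 1.0 for an exact match, decaying with squared distance."""
    d2 = sum((a - b) ** 2 for a, b in zip(patch, template))
    return math.exp(-d2 / (2 * sigma ** 2))

def encode(image, dictionary, size):
    """For each template, take the max tuning response over all image positions."""
    return [max(tuning(p, t) for p in patches(image, size)) for t in dictionary]

def place(shape, row, col, n=8):
    """Put a small shape on a blank n x n background at (row, col)."""
    img = [[0.0] * n for _ in range(n)]
    for i, shape_row in enumerate(shape):
        for j, value in enumerate(shape_row):
            img[row + i][col + j] = value
    return img

rng = random.Random(0)
# Stand-in for "natural images": random 8x8 arrays, used only to sample templates.
training = [[[rng.random() for _ in range(8)] for _ in range(8)] for _ in range(5)]
dictionary = learn_dictionary(training, size=3, n_templates=4, rng=rng)

shape = [[1.0, 0.5], [0.0, 1.0]]
code_a = encode(place(shape, 2, 2), dictionary, size=3)  # shape at one position
code_b = encode(place(shape, 4, 3), dictionary, size=3)  # same shape, shifted
print(code_a == code_b)
```

Because the pooling step takes a maximum over every position, the two codes agree as long as the shifted shape stays far enough from the image border that the same set of patches occurs in both images. That is the sense in which such a dictionary can hand a position-invariant representation to a later classifier.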

I will present evidence that the model not only duplicates the tuning properties of neurons in various brain areas when probed with artificial stimuli (like the ones typically used in physiology), but also handles the recognition of objects in the real world, to the extent of competing with the best computer vision systems. Following this, I will present a comparison between the performance of the model and the performance of human observers in a rapid animal vs. non-animal recognition task for which recognition is fast and cortical back-projections are likely to be inactive. Results indicate that the model predicts human performance extremely well when the delay between the stimulus and the mask is about 50 ms. These results suggest that cortical back-projections may not play a significant role when the time interval is in this range, and the model may therefore provide a satisfactory description of the feedforward path.

Taken together, the evidence I will present shows that we may have the skeleton of a successful theory of visual cortex. In addition, this may be the first time that a neurobiological model, faithful to the physiology and the anatomy of visual cortex, not only competes with some of the best computer vision systems, thus providing a realistic alternative to engineered artificial vision systems, but also achieves performance close to that of humans in a categorization task involving complex natural images.
