Check this out if you want to see what Google is thinking about for computer vision research.
Jay Yagnik,
Head of Computer Vision and Audio Understanding Research
Google, Inc.
When: Wednesday, September 17, 12:00 p.m.
Abstract:
In most recognition and retrieval problems involving large image or video datasets, users want to search and browse by the semantics of the data. Standard computer vision algorithms work with features extracted from pixels and attempt to map them to semantics. Approaches that look only at pixels are inherently limited in this regard and give rise to what we call the "semantic gap", i.e., the disconnect between the semantic concepts natural to users and the pixel-based predictions. One possible solution is to rely on the large collection of public web pages, where images appear with surrounding text that is potentially relevant to the semantics of the image. I'll present a special case of this class of solutions: learning to recognize people. Text parsing in the style of named entity recognition can give us hints about which phrases might be people's names. Retaining all possible associations between names and faces gives us a very weak training set. I'll talk about a machine learning formulation that we refer to as consistency learning, which can effectively train models from such weak training sets and use them for robust recognition. The training procedure is inherently parallel and scales to very large sets. We verify this with large-scale experiments involving more than 86M face models for more than 200K people.
Bio: Jay Yagnik is Head of Computer Vision and Audio Understanding Research at Google Inc. His interests include machine learning, scalable matching, graph information propagation, image representation and recognition, temporal information mining, and statistics. He is an alumnus of the Indian Institute of Science and Nirma Institute of Technology. Prior to Google, he worked on criminal identification through beard- and mustache-invariant facial recognition, machine learning for predicting protein function, and more at the Supercomputer Education and Research Centre at IISc Bangalore.