Wednesday, July 29, 2009

PhD oral exam: Large Scale Scene Matching for Graphics and Vision

July 29, 2009
James H. Hays
11:00 AM, 7220 Wean Hall
Thesis Oral
Title: Large Scale Scene Matching for Graphics and Vision

Abstract:
Our visual experience is extraordinarily varied and complex. The diversity of the visual world makes it difficult for computer vision to understand images and for computer graphics to synthesize visual content. But for all its richness, it turns out that the space of "scenes" might not be astronomically large. With access to imagery on an Internet scale, regularities start to emerge - for most images, there exist numerous examples of semantically and structurally similar scenes. Is it possible to sample the space of scenes so densely that one can use similar scenes to "brute force" otherwise difficult image understanding and manipulation tasks? This thesis is focused on exploiting and refining large scale scene matching to short circuit the typical computer vision and graphics pipelines for image understanding and manipulation.

First, in "Scene Completion" we patch up holes in images by copying content from matching scenes. We find scenes so similar that the manipulations are undetectable to naive viewers and we quantify our success rate with a perceptual study. Second, in "im2gps" we estimate geographic properties and global geolocation for photos using scene matching with a database of 6 million geo-tagged Internet images. We geolocate sequences of photos four times as accurately as the single image case by modelling the global spatiotemporal statistics of photographers. We introduce a range of features for scene matching and use them, together with lazy SVM learning, to dramatically improve scene matching -- doubling the performance of single image geolocation over our baseline method. Third, we study human photo geolocation to gain insights into the geolocation problem, our algorithms, and human scene understanding. This study shows that our algorithms significantly exceed human geolocation performance. Finally, we use our geography estimates, as well as Internet text annotations, to provide context for deeper image understanding, such as object detection.

Thesis Committee:
Alexei A. Efros, Chair
Martial Hebert
Jessica K. Hodgins
Takeo Kanade
Richard Szeliski, Microsoft Research
