Sunday, September 17, 2006

CMU proposal: Volumetric Descriptors for Efficient Video Analysis

Yan Ke

When: Wednesday, September 13, 09:30 a.m.
Where: 3305 Newell-Simon Hall

Abstract: The amount of digital video has grown exponentially in recent years. However, the technology for making intelligent searches on video has failed to keep pace. How to efficiently represent video in a form optimized for retrieval is still an open question. We make the key observation that objects in video span both space and time, and therefore 3D spatio-temporal volumetric features are natural representations for them. The goal of this thesis is to propose efficient volumetric representations for video and evaluate how well these representations perform in a wide range of applications. Example applications include video retrieval and action recognition. Our approach is divided into three main parts: spatio-temporal region extraction, volumetric region representations, and matching/recognition methods in video. We first use unsupervised clustering to extract an over-segmentation of the video volume. The regions loosely correspond to object boundaries in space-time. Next, we construct a volumetric representation for the regions and define a distance metric to match them. Finally, we learn models based on multiple templates of user-specified actions, such as tennis serves, running, and dance moves. We plan to evaluate the proposed method and compare it against existing methods on a large video database.
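
To make the first step of the abstract concrete, here is a minimal sketch of over-segmenting a video volume by unsupervised clustering. It is an illustration only, not the method from the proposal: the abstract does not name a clustering algorithm, so plain k-means is assumed here, and the function names (oversegment, moving_square), the per-voxel feature (x, y, t, weighted intensity), and the intensity_weight parameter are all hypothetical choices made for this example.

```python
import random


def oversegment(video, k, intensity_weight=4.0, iters=10, seed=0):
    """Cluster the voxels of video[t][y][x] into k space-time regions.

    Assumption: k-means on (x, y, t, weighted intensity) stands in for the
    unnamed "unsupervised clustering" of the abstract.
    """
    coords = [(t, y, x)
              for t in range(len(video))
              for y in range(len(video[0]))
              for x in range(len(video[0][0]))]
    # One feature vector per voxel: space-time position plus weighted intensity.
    feats = [(x, y, t, intensity_weight * video[t][y][x]) for t, y, x in coords]

    rng = random.Random(seed)          # fixed seed keeps the sketch repeatable
    centers = rng.sample(feats, k)     # initialise centers from random voxels
    labels = [0] * len(feats)
    for _ in range(iters):
        # Assignment step: nearest center by squared Euclidean distance.
        for i, f in enumerate(feats):
            labels[i] = min(
                range(k),
                key=lambda c: sum((a - b) ** 2 for a, b in zip(f, centers[c])),
            )
        # Update step: move each center to the mean of its members.
        for c in range(k):
            members = [f for f, lab in zip(feats, labels) if lab == c]
            if members:                # an emptied cluster keeps its old center
                centers[c] = tuple(sum(v) / len(members) for v in zip(*members))

    # Reshape the flat label list back into a volume indexed [t][y][x].
    volume = [[[0] * len(video[0][0]) for _ in video[0]] for _ in video]
    for (t, y, x), lab in zip(coords, labels):
        volume[t][y][x] = lab
    return volume


def moving_square(frames=4, size=6):
    """Synthetic clip: a bright 2x2 square moving one pixel right per frame."""
    video = [[[0.0] * size for _ in range(size)] for _ in range(frames)]
    for t in range(frames):
        for y in range(2, 4):
            for x in range(t, t + 2):
                video[t][y][x] = 1.0
    return video


video = moving_square()
labels = oversegment(video, k=6)
```

Each label in the returned volume marks one space-time region. Asking for more clusters than there are objects is what makes this an over-segmentation: an object may be split across several regions, which later stages would describe and match.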

Thesis Summary: the link.
