Jiayong Zhang, Robotics Institute
Carnegie Mellon University
An articulated object can be loosely defined as a structure or mechanical system composed of links and joints. The human body is a good example of a nonrigid, articulated object. Localizing body shapes in still images remains a fundamental problem in computer vision, with potential applications in surveillance, video editing and annotation, human-computer interfaces, and entertainment.
In this thesis, we present a 2D model-based approach to human body localization. We first consider a fixed-viewpoint scenario (side view) by introducing a triangulated model of the nonrigid, articulated body contours. Four types of image cues relate the model configuration to the observed image: edge gradient, silhouette, skin color, and region similarity. The model is arranged into a sequential structure, enabling simple yet effective spatial inference through Sequential Monte Carlo (SMC) sampling.
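To make the idea of SMC inference over a sequentially arranged model concrete, here is a minimal sketch under strong simplifying assumptions. Landmarks are reduced to 1D positions on a chain, the prior between neighbours is a Gaussian step, and `toy_likelihood` with its `TARGETS` is a hypothetical stand-in for the combined image cues. None of these names or choices come from the thesis.

```python
import math
import random

def smc_chain(n_landmarks, likelihood, n_particles=200, step_sigma=1.0, seed=0):
    """Sequential importance resampling over a chain of 1D landmark positions.

    Each particle is a partial configuration (list of positions) that is
    extended one landmark at a time, weighted by the image likelihood of the
    newest landmark, and then resampled.
    """
    rng = random.Random(seed)
    # Assumption: the first landmark is drawn uniformly over the toy "image".
    particles = [[rng.uniform(0.0, 10.0)] for _ in range(n_particles)]
    for k in range(n_landmarks):
        if k > 0:
            # Propose landmark k from a Gaussian prior centred on landmark k-1.
            for p in particles:
                p.append(rng.gauss(p[-1], step_sigma))
        # Weight each particle by the likelihood of its newest landmark.
        weights = [likelihood(k, p[-1]) for p in particles]
        if sum(weights) <= 0.0:
            weights = [1.0] * n_particles  # fall back to uniform weights
        # Resample with replacement; copy lists so particles stay independent.
        chosen = rng.choices(particles, weights=weights, k=n_particles)
        particles = [list(p) for p in chosen]
    return particles

# Hypothetical image evidence: each landmark prefers a different position.
TARGETS = [2.0, 3.0, 4.0, 5.0]

def toy_likelihood(k, x):
    return math.exp(-0.5 * ((x - TARGETS[k]) / 0.5) ** 2)

particles = smc_chain(len(TARGETS), toy_likelihood)
mean_last = sum(p[-1] for p in particles) / len(particles)
print("mean position of last landmark:", round(mean_last, 2))
```

Because the model is ordered, each step only needs the likelihood of the newest landmark, which is what keeps this style of inference simple.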
We then extend the system to situations where the viewpoint of the human target is unknown. To accommodate large viewpoint changes, a mixture of view-dependent models is employed. Each model is decomposed based on the concept of parts, with anthropometric constraints and self-occlusion explicitly treated. Inference is done by direct sampling of the posterior mixture, using SMC enhanced with annealing. The fitting method is independent of the number of mixture components, and does not require the preselection of a “correct” viewpoint.
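The following sketch illustrates one way annealed SMC can sample a mixture posterior directly, so that no viewpoint has to be chosen in advance. It is a toy under stated assumptions: each particle carries a view index and a single 1D pose parameter, the prior is uniform, the annealing schedule `betas` is arbitrary, and `toy_log_lik` is a hypothetical likelihood. The thesis's actual models, schedule, and move steps are not reproduced here.

```python
import math
import random

def annealed_mixture_smc(log_lik, n_views, n_particles=300,
                         betas=(0.1, 0.3, 0.6, 1.0), lo=-5.0, hi=5.0, seed=0):
    """Annealed SMC over (view index, pose) pairs with a uniform prior."""
    rng = random.Random(seed)
    particles = [(rng.randrange(n_views), rng.uniform(lo, hi))
                 for _ in range(n_particles)]
    prev_beta = 0.0
    for beta in betas:
        # Incremental weight: likelihood raised to (beta - prev_beta).
        logw = [(beta - prev_beta) * log_lik(v, x) for v, x in particles]
        top = max(logw)
        weights = [math.exp(l - top) for l in logw]
        particles = rng.choices(particles, weights=weights, k=n_particles)
        # Metropolis move on the pose, targeting prior * likelihood**beta.
        moved = []
        for v, x in particles:
            x_new = x + rng.gauss(0.0, 0.3)
            if lo <= x_new <= hi:
                log_accept = beta * (log_lik(v, x_new) - log_lik(v, x))
                if rng.random() < math.exp(min(0.0, log_accept)):
                    x = x_new
            moved.append((v, x))
        particles = moved
        prev_beta = beta
    return particles

# Hypothetical two-view likelihood: view 1 explains the image better.
CENTERS = [-2.0, 3.0]
BIAS = [0.0, 3.0]

def toy_log_lik(v, x):
    return BIAS[v] - 0.5 * ((x - CENTERS[v]) / 0.5) ** 2

samples = annealed_mixture_smc(toy_log_lik, n_views=2)
frac_view1 = sum(1 for v, _ in samples if v == 1) / len(samples)
print("fraction of particles in view 1:", round(frac_view1, 2))
```

The view index is just another sampled variable, so the same loop runs unchanged for any number of mixture components, and the better-fitting view ends up holding most of the particles.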
Finally, we return to the generic setting of a single image, arbitrary pose, and arbitrary viewpoint. The constraints on body pose and the background subtraction used in previous systems are no longer required. Our proposed solution is a hybrid search facilitated by a 3-level hierarchical decomposition of the model. We first fit a simple tree-structured model, defined on a compact landmark set along the body contours, by Dynamic Programming (DP). The output is a series of proposal maps that encode the probabilities of partial body configurations. Next, we fit a mixture of view-dependent models by SMC, which handles self-occlusion, anthropometric constraints, and large viewpoint changes. DP and SMC are designed to search in opposite directions so that the DP proposals can be used effectively to initialize and guide the SMC inference. This hybrid strategy of deterministic and stochastic search combines the robustness and efficiency of DP with the accuracy of SMC. In the last stage, we fit an expanded mixture model with increased landmark density through local optimization.
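As an illustration of the DP stage, the sketch below runs max-sum dynamic programming on a chain, which is the simplest tree-structured model. It is an assumption-laden toy: `unary` holds hypothetical log-scores for each landmark at each candidate location, `smooth` is a made-up pairwise term, and the function returns only the single best configuration rather than the proposal maps described above.

```python
def dp_chain(unary, pairwise):
    """Max-sum dynamic programming on a chain of landmarks.

    unary[k][s]   : log-score of landmark k at candidate state s
    pairwise(a, b): log-compatibility of adjacent states a and b
    Returns (best_states, best_score).
    """
    n_states = len(unary[0])
    score = [list(unary[0])]  # score[k][s]: best partial score ending in s
    back = []                 # back[k-1][s]: best predecessor of state s
    for k in range(1, len(unary)):
        row, ptr = [], []
        for s in range(n_states):
            prev = max(range(n_states),
                       key=lambda t: score[-1][t] + pairwise(t, s))
            row.append(score[-1][prev] + pairwise(prev, s) + unary[k][s])
            ptr.append(prev)
        score.append(row)
        back.append(ptr)
    # Backtrack from the best final state to recover the full configuration.
    last = max(range(n_states), key=lambda s: score[-1][s])
    path = [last]
    for ptr in reversed(back):
        path.append(ptr[path[-1]])
    path.reverse()
    return path, score[-1][last]

# Hypothetical scores: 3 landmarks, 3 candidate locations each.
unary = [
    [0.0, -2.0, -5.0],
    [-5.0, 0.0, -2.0],
    [-5.0, -2.0, 0.0],
]

def smooth(a, b):
    # Penalise large jumps between neighbouring landmarks.
    return -0.5 * abs(a - b)

best_path, best_score = dp_chain(unary, smooth)
print(best_path, best_score)
```

The per-landmark `score` tables are the kind of intermediate result that a stochastic search could reuse as proposals, although how the thesis builds and consumes its proposal maps is not specified here.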
The models were trained on a large number of gait images. Extensive tests on cluttered images with varying poses, including walking, dancing, and various types of sports activities, demonstrate the feasibility of the proposed approach.