Doctoral thesis (Thèse de DOCTORAT) of the Institut National Polytechnique
Specialty: Mathematics and Computer Science
Mr Daniel WEINLAND (INPG)
will defend his thesis on Monday, 20 October 2008 at 2:00 pm, Grand Amphi, INRIA Rhône-Alpes, Montbonnot
Action Representation and Recognition
This work was carried out under the supervision of Mr Radu HORAUD (DR, INRIA Rhône-Alpes)
Recognizing human actions is an important and challenging topic in computer vision, with many important applications including video surveillance, video indexing and the understanding of social interaction. From a computational perspective, actions can be defined as four-dimensional patterns, in space and in time. Such patterns can be modeled using several
representations, which differ from each other with respect to, among other aspects, the visual information used, e.g. shape or appearance; the representation of dynamics, e.g. implicit or explicit; and the amount of invariance the representation exhibits, e.g. viewpoint invariance, which allows learning and recognition under different camera configurations.
Our goal in this thesis is to develop a set of new techniques for action recognition. In the first part we present "Motion History Volumes", a
free-viewpoint representation for human actions based on 3D visual-hull reconstructions computed from multiple calibrated, background-subtracted
video cameras. Results indicate that this representation can be used to learn and recognize basic human action classes, independently
of gender, body size and viewpoint.
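The announcement gives no implementation details, but the idea of a motion-history volume can be pictured as the classic 2D motion-history image lifted to a voxel grid: occupied voxels are stamped with the current duration value and all others decay over time. The following is a hypothetical minimal sketch (the function name, the grid sizes, and the linear decay rule are assumptions for illustration, not the thesis implementation):

```python
import numpy as np

def update_mhv(mhv, occupancy, tau):
    # Voxels occupied at the current frame are reset to tau;
    # all other voxels decay by one step, clamped at zero.
    return np.where(occupancy, float(tau), np.maximum(mhv - 1.0, 0.0))

# Toy example: a 2x2x2 voxel grid with a single occupied voxel.
mhv = np.zeros((2, 2, 2))
occ = np.zeros((2, 2, 2), dtype=bool)
occ[0, 0, 0] = True
mhv = update_mhv(mhv, occ, tau=10)
```

Accumulated over a sequence, recent motion then appears as high voxel values and older motion as lower ones, yielding a single volumetric summary of the action.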
In the second part we present an approach based on a 3D exemplar-based HMM, which addresses the problem of recognizing actions from arbitrary views, even from a single camera. We thus no longer require a 3D reconstruction during the recognition phase; instead, we use learned 3D models to produce 2D image information, which is compared to the observations.
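The core operation of producing 2D information from a 3D model can be sketched as projecting the occupied voxels of a 3D exemplar through a calibrated camera matrix to obtain a silhouette, which can then be compared to the observed silhouette. This is a hypothetical illustration under simple assumptions (point-cloud voxel centers, a generic 3x4 projection matrix, nearest-pixel rasterization), not the method described in the thesis:

```python
import numpy as np

def project_exemplar(voxel_points, P, img_shape):
    # voxel_points: (N, 3) centers of occupied voxels in world coordinates.
    # P: (3, 4) camera projection matrix. Returns a boolean silhouette image.
    homog = np.hstack([voxel_points, np.ones((len(voxel_points), 1))])
    proj = homog @ P.T                       # (N, 3) homogeneous image points
    uv = (proj[:, :2] / proj[:, 2:3]).astype(int)
    sil = np.zeros(img_shape, dtype=bool)
    inside = ((uv[:, 0] >= 0) & (uv[:, 0] < img_shape[1]) &
              (uv[:, 1] >= 0) & (uv[:, 1] < img_shape[0]))
    sil[uv[inside, 1], uv[inside, 0]] = True
    return sil

# Toy camera: drop the z coordinate (orthographic-style projection).
P = np.array([[1.0, 0.0, 0.0, 0.0],
              [0.0, 1.0, 0.0, 0.0],
              [0.0, 0.0, 0.0, 1.0]])
sil = project_exemplar(np.array([[1.0, 2.0, 5.0]]), P, (4, 4))
```

A projected silhouette like this can be scored against an observed, background-subtracted silhouette, so recognition needs only a single 2D view at test time.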
In the third and last part, we present a compact and efficient exemplar-based representation which, in particular, does not attempt to encode the dynamics of an action through temporal dependencies. Experimental results demonstrate that such a representation can precisely recognize actions, even on cluttered, non-background-segmented sequences.
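A representation with no temporal dependencies can be pictured as reducing a whole sequence to its distances from a set of learned key-pose exemplars, discarding frame order entirely. As a hypothetical sketch (the descriptor layout, the Euclidean distance, and the min-pooling over time are illustrative assumptions):

```python
import numpy as np

def sequence_to_exemplar_features(frames, exemplars):
    # frames: (T, D) per-frame descriptors; exemplars: (K, D) key poses.
    # Pairwise distances between every frame and every exemplar: (T, K).
    d = np.linalg.norm(frames[:, None, :] - exemplars[None, :, :], axis=2)
    # Keep, for each exemplar, its best match anywhere in the sequence.
    # The result is a K-vector that ignores temporal order completely.
    return d.min(axis=0)

# Toy example: two frames, two exemplars in a 2D descriptor space.
feats = sequence_to_exemplar_features(
    np.array([[0.0, 0.0], [1.0, 1.0]]),
    np.array([[0.0, 0.0], [2.0, 2.0]]))
```

The resulting fixed-length vector can be fed to any standard classifier, which makes the representation compact and robust to variations in execution speed.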