26/02/2015 - 14:00 Karteek Alahari (Inria Grenoble, LJK / Lear) Salle 1 - Tour IRMA
One of the goals of computer vision is to interpret a scene semantically from an image or a video. This problem has manifested itself in various forms, including, but not limited to, object recognition, 3D scene recovery, image segmentation, and human pose estimation. Although tremendous progress has been made on all these tasks, many challenges remain, especially in the context of large-scale data. In this talk I will discuss some of our attempts to address these challenges, starting with our energy-based formulation for reasoning about regions, objects, and their attributes, such as object class, location, and spatial extent. We define a global energy function that combines higher-order functions defined on object regions with low-level pixel-based unary and pairwise relations. I will then describe methods for efficiently solving the inference and parameter learning problems that result from optimizing this energy function. The latter part of the talk will present a few more applications of energy-based models to problems such as articulated human pose estimation in videos and scene text recognition.