Learning Image Classification and Retrieval Models


Speciality: Mathematics and Computer Science

26/10/2012 - 10:30 Mr Thomas Mensink (Université de Grenoble) Grand Amphi de l'INRIA Rhône-Alpes, Montbonnot

Keywords :
  • image retrieval
  • structured prediction
  • zero-shot learning
  • interactive label prediction
  • metric learning
  • large-scale classification
We are currently experiencing exceptional growth in visual data; for example, millions of photos are shared daily on social networks. Image understanding methods aim to facilitate access to this visual data in a semantically meaningful manner. In this dissertation, we define several detailed goals of interest for the image understanding tasks of image classification and retrieval, and address them in three main chapters.

First, we aim to exploit the multi-modal nature of many databases, in which documents consist of images accompanied by some form of textual description. To do so, we define similarities between the visual content of one document and the textual description of another. These similarities are computed in two steps: first we find the visually similar neighbors in the multi-modal database, and then we use the textual descriptions of these neighbors to define a similarity to the textual description of any document.
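The two-step computation described above can be illustrated with a minimal sketch. It is not the method of the dissertation: the squared Euclidean distance on visual features, the bag-of-words pooling of neighbor text, the cosine similarity, and all names (`visual_neighbors`, `cross_modal_similarity`, the toy `database`) are assumptions chosen only to make the idea concrete.

```python
from collections import Counter
import math


def visual_neighbors(query_vec, database, k):
    """Step 1: the k documents whose visual features are closest to the query."""
    def sq_dist(doc):
        return sum((a - b) ** 2 for a, b in zip(query_vec, doc["visual"]))
    return sorted(database, key=sq_dist)[:k]


def cosine(c1, c2):
    """Cosine similarity between two bag-of-words Counters."""
    dot = sum(c1[w] * c2[w] for w in c1)
    n1 = math.sqrt(sum(v * v for v in c1.values()))
    n2 = math.sqrt(sum(v * v for v in c2.values()))
    return dot / (n1 * n2) if n1 and n2 else 0.0


def cross_modal_similarity(query_vec, target_text, database, k=2):
    """Step 2: compare the pooled text of the neighbors with the target text."""
    pooled = Counter()
    for doc in visual_neighbors(query_vec, database, k):
        pooled.update(doc["text"].split())
    return cosine(pooled, Counter(target_text.split()))


# Hypothetical toy database: 2-D visual features plus a short description.
database = [
    {"visual": [0.0, 0.1], "text": "beach sea sand"},
    {"visual": [0.1, 0.0], "text": "sea waves surf"},
    {"visual": [5.0, 5.0], "text": "city street night"},
]

# The query image looks like the two seaside photos, so seaside text scores higher.
beach_score = cross_modal_similarity([0.0, 0.0], "sea beach holiday", database)
city_score = cross_modal_similarity([0.0, 0.0], "city traffic", database)
```

In this toy setting the query image never needs a textual description of its own; its similarity to a text is borrowed entirely from its visual neighbors.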

Second, we introduce a series of structured image classification models, which explicitly encode pairwise label interactions. These models are more expressive than independent label predictors and lead to more accurate predictions, especially in an interactive prediction scenario where a user provides the value of some of the image labels. Such an interactive scenario offers an interesting trade-off between accuracy and manual labeling effort. We explore structured models for multi-label image classification, for attribute-based image classification, and for optimizing specific ranking measures.
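As a rough illustration of why pairwise label interactions help in the interactive scenario, the sketch below scores joint labelings with per-label and pairwise terms and holds user-provided labels fixed. The brute-force enumeration, the score tables, and the names `score` and `predict` are assumptions for this toy example and do not reflect the models or inference procedures used in the dissertation.

```python
from itertools import product


def score(assignment, unary, pairwise):
    """Sum of per-label scores plus a score for each interacting label pair."""
    total = sum(unary[label][value] for label, value in assignment.items())
    for (a, b), table in pairwise.items():
        total += table[(assignment[a], assignment[b])]
    return total


def predict(unary, pairwise, observed=None):
    """Best joint labeling by brute force, with user-provided labels held fixed."""
    observed = observed or {}
    free = [label for label in unary if label not in observed]
    best, best_score = None, float("-inf")
    for values in product([0, 1], repeat=len(free)):
        assignment = {**observed, **dict(zip(free, values))}
        s = score(assignment, unary, pairwise)
        if s > best_score:
            best, best_score = assignment, s
    return best


# Hypothetical scores: the image evidence weakly says "no beach, no sea",
# while the pairwise term rewards the two labels agreeing.
unary = {"beach": {0: 0.2, 1: 0.0}, "sea": {0: 0.5, 1: 0.0}}
pairwise = {("beach", "sea"): {(0, 0): 1.0, (1, 1): 1.0, (0, 1): 0.0, (1, 0): 0.0}}

automatic = predict(unary, pairwise)                # no user input
interactive = predict(unary, pairwise, {"sea": 1})  # user confirms "sea"
```

Once the user confirms one label, the pairwise term changes the prediction for the other label as well, which an independent per-label predictor could not do.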

Finally, we explore k-nearest neighbors and nearest-class mean classifiers for large-scale image classification. We propose efficient metric learning methods to improve classification performance, and use these methods to learn on a data set of more than one million training images from one thousand classes. Since both classification methods allow classes not seen during training to be incorporated at near-zero cost, we study their generalization performance. We show that the nearest-class mean classification method can generalize from one thousand to ten thousand classes at negligible cost, and still perform competitively with the state of the art.
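The near-zero cost of adding classes can be seen in a minimal nearest-class mean sketch. It assumes the learned metric takes the form of a linear projection `W` followed by squared Euclidean distance; here `W` is simply fixed to the identity rather than learned, and the function names and toy data are hypothetical.

```python
from collections import defaultdict


def class_means(features, labels):
    """Return {label: mean feature vector} for the given labeled samples."""
    sums = {}
    counts = defaultdict(int)
    for x, y in zip(features, labels):
        if y not in sums:
            sums[y] = [0.0] * len(x)
        sums[y] = [s + v for s, v in zip(sums[y], x)]
        counts[y] += 1
    return {y: [s / counts[y] for s in sums[y]] for y in sums}


def project(W, x):
    """Apply a linear projection W (a list of rows) to the vector x."""
    return [sum(w * v for w, v in zip(row, x)) for row in W]


def ncm_predict(x, means, W):
    """Assign x to the class whose mean is closest after projection by W."""
    px = project(W, x)

    def sq_dist(label):
        pm = project(W, means[label])
        return sum((a - b) ** 2 for a, b in zip(px, pm))

    return min(means, key=sq_dist)


# Toy data: two classes in 2-D. W is the identity here (an assumption);
# a metric learning method would estimate W from training data instead.
W = [[1.0, 0.0], [0.0, 1.0]]
train_x = [[0.0, 0.0], [0.0, 2.0], [10.0, 10.0], [12.0, 10.0]]
train_y = ["cat", "cat", "dog", "dog"]
means = class_means(train_x, train_y)

# Adding an unseen class only requires computing its mean; W is reused as-is.
means.update(class_means([[-10.0, 9.0], [-10.0, 11.0]], ["bird", "bird"]))
```

No retraining happens when "bird" is added: the classifier only gains one more mean to compare against, which is the property the abstract relies on when moving from one thousand to ten thousand classes.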


Mr Frédéric Jurie (Professor - )


  • Mme Cordelia Schmid (Research Director - INRIA)
  • Mme Gabriela Csurka
  • Mr Jakob Verbeek (Research Scientist - INRIA)


  • Mme Barbara Caputo
  • Mr Christophe Lampert