On the evaluation and generalization of visual representations

Specialty: Image, Vision et Robotique

Mert Bulent Sariyildiz, 29/06/2023 at 09:00, Grand Amphi, Inria

Computers are getting better and better at recognizing concepts within data such as textual corpora, video sequences, or image collections, as long as they have been specifically trained to recognize these concepts. In particular, deep learning methods have achieved impressive performance on a large variety of classification tasks, even outperforming human-level recognition on the classification of visual object categories in images. However, this super-human performance is mostly due to the fact that machines easily acquire expert-level knowledge when provided with large-scale, domain-specific annotations. When it comes to applying previously acquired knowledge from one specific task to even a fairly related one, machines show capabilities far inferior to those of the human visual system, failing to generalize to unseen tasks and domains. To tackle this, approaches have been proposed to transfer a model from one task so that it becomes applicable to another. But what if the target task is not known in advance, or if we plan to adapt an existing model to a large number of fairly orthogonal tasks?

The goal of this PhD project is to build good image representations. An image representation is obtained by a function that transforms an image into a representation vector; a good image representation is therefore supported by a good representation function. Intuitively, a good representation function should be a reliable starting point for a large variety of computer vision tasks, from the most common (e.g., classification, detection) to the most challenging (e.g., dense geometry prediction, visual scene reasoning); it should work equally well on new domains, or at least transfer to them easily. Additionally, it should be efficient to compute and produce compact representations.

In the first research direction, we would like to reflect on how one could define a good image representation in a principled way.
In other words: how can we quantitatively measure the quality of an image representation? There are at least three dimensions to explore when evaluating the quality of a representation: i) the amount of training data required for learning the representation function, ii) the capacity of the model required to adapt to a new task, and iii) the robustness of the representation to domain shift. In this research direction, we would like to define a set of tasks that covers the spectrum of data types and tasks, and that could constitute a reasonable benchmark for comparing the quality of image representations.

In a second research direction, we would like to design approaches to train good image representations. One solution is to gather data of different natures, annotated in heterogeneous ways and collected to tackle potentially very different tasks, and leverage them in a unified training process. In this line of research, we would like to explore algorithms suitable for training with these very different sources of data, potentially tackling the different learning stages sequentially. One potential way to order the tasks is curriculum learning, which weights the difficulty of a sample or a task.
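To make the notion of a representation function concrete, here is a toy sketch of a function mapping an image to a compact, normalized vector. The fixed random projection below is purely illustrative and stands in for a trained encoder; the dimensionality and input size are assumptions, not choices made in the thesis.

```python
# Toy sketch of a representation function f: image -> vector.
# A frozen random projection stands in for a learned deep encoder
# (illustrative only; real systems use a trained CNN or transformer).
import numpy as np

rng = np.random.default_rng(0)

DIM = 128                                        # representation size (assumption)
PROJ = rng.standard_normal((DIM, 32 * 32 * 3))   # frozen "encoder" weights

def represent(image: np.ndarray) -> np.ndarray:
    """Map a 32x32 RGB image to an L2-normalized representation vector."""
    v = PROJ @ image.reshape(-1)    # linear feature extraction
    return v / np.linalg.norm(v)    # normalize so vectors are comparable

image = rng.random((32, 32, 3))
vec = represent(image)
print(vec.shape)  # (128,)
```

The normalization step reflects the compactness and comparability desiderata mentioned above: downstream tasks can compare such vectors directly, e.g., with dot products.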
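One common way to quantify representation quality along the "capacity to adapt" dimension is a linear probe: freeze the features and fit only a linear classifier on top, so higher accuracy suggests a more linearly usable representation. The sketch below uses synthetic features and a least-squares probe in pure NumPy; it is a minimal illustration, not the benchmark protocol proposed in the thesis.

```python
# Minimal linear-probe sketch: frozen features + linear classifier.
# Synthetic two-class features stand in for the output of a
# representation function (illustrative assumption).
import numpy as np

rng = np.random.default_rng(0)

# "Frozen features" for two classes with shifted means.
X0 = rng.normal(loc=-2.0, size=(100, 16))
X1 = rng.normal(loc=+2.0, size=(100, 16))
X = np.vstack([X0, X1])
y = np.array([0] * 100 + [1] * 100)

# One-hot targets; fit the linear probe by least squares.
Y = np.eye(2)[y]
W, *_ = np.linalg.lstsq(X, Y, rcond=None)

pred = (X @ W).argmax(axis=1)
acc = (pred == y).mean()
print(f"probe accuracy: {acc:.2f}")
```

Repeating this probe across many tasks and domains would give one concrete axis of the benchmark discussed above; the other two axes (training-data amount, domain-shift robustness) would vary the data fed to the probe rather than the probe itself.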

Directors:

  • Karteek Alahari (Inria)
  • Diane Larlus (NaverLabs Europe)
  • Yannis Kalantidis (NaverLabs Europe)

Rapporteurs:

  • Yannis Avrithis (Institute of Advanced Research on Artificial Intelligence)
  • Matthieu Cord (Sorbonne Université)

Examiners:

  • Jocelyn Chanussot (Grenoble INP)
  • Thomas Mensink (Google)
  • Cordelia Schmid (Inria)