Learning multi-task agents under minimal supervision

Specialty: Image, Vision and Robotics

3/07/2023 - 09:30 Lina Mezghani Inria Grenoble, Grand Amphi

The development of intelligent agents has seen significant progress in the last decade, showing impressive capabilities in various tasks, such as video games or robot navigation. These advances were made possible by the advent of the reinforcement learning (RL) paradigm, which models the interaction between an agent and its environment. However, in practice, the implementation of such agents requires significant human intervention and prior knowledge of the task at hand, both of which can be seen as forms of supervision. In this thesis, we tackle three different aspects of supervision in RL, and propose methods to reduce the amount of human intervention required to train agents.

We first investigate the impact of supervision on the choice of observations seen by the agent. In robot navigation, for example, the modalities of the environment observed by the agent are an important design choice that can have a significant impact on the difficulty of the task. To tackle this question, we focus on image-goal navigation in photo-realistic environments, and propose a method for learning to navigate from raw visual inputs, i.e., without relying on depth or position information.

Second, we target the problem of reward supervision in RL. Standard RL algorithms rely on the availability of a well-shaped reward function to solve a specific task. However, the design of such functions is often a difficult and time-consuming process, which requires a clear understanding of the task and environment at hand. This limits the scalability and generalization capabilities of the designed methods. To address this issue, we tackle the problem of learning state-reaching policies without reward supervision, and design methods that leverage intrinsic reward functions to learn such policies.

Finally, we study the problem of learning agents offline, from pre-collected data, and question the availability of such data. Collecting expert trajectories is often a difficult and time-consuming process, which can be more difficult than the downstream task itself. Offline algorithms should therefore rely on existing data, and we propose a method for learning goal-conditioned agents from tutorial videos, which contain expert demonstrations aligned with natural language captions.

Chair:

- ()

Advisors:

Karteek Alahari (Inria)
Piotr Bojanowski (Meta AI)

Reviewers:

Olivier Sigaud (Sorbonne Univ.)
Abhinav Gupta (CMU)

Examiners:

Pierre-Yves Oudeyer (Inria)
Anne Spalanzani (UGA)