The long-run behavior of first-order training algorithms in machine learning

English

Séminaire Données et Aléatoire Théorie & Applications

7/11/2024 - 14:00 · Panayotis Mertikopoulos · Salle 106

Much of the success of modern deep learning architectures and models is due to the algorithms used to train them. These methods are almost primitive in their simplicity - like stochastic gradient descent (SGD) and its variants - but they nonetheless manage to navigate an extremely complicated optimization landscape and often yield results that define the state of the art in many applications. In this talk, I will attempt to sketch a unified view of the long-run behavior of stochastic gradient algorithms in optimization. We will begin with the more familiar case of ordinary loss minimization problems, and we will analyze the asymptotic convergence and concentration properties of SGD - that is, when and where the method converges, which outcomes are more likely to be observed in the long run, and by how much. We will then venture into the murky waters of min-max optimization - the heart of adversarial machine learning and generative networks - and we will examine the possible limit points and attractors of min-max gradient methods in a zero-sum context. I will do my best to minimize technical prerequisites for this talk by relying on illustrations and intuition instead of rigor and generality.
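For readers who want the two settings of the abstract in symbols, a minimal sketch in standard notation follows; the symbols (loss f, step size γ_n, gradient noise U_{n+1}, zero-sum objective Φ with players x and y) are illustrative choices, not notation taken from the talk itself.

\[
x_{n+1} \;=\; x_n - \gamma_n \big( \nabla f(x_n) + U_{n+1} \big)
\qquad \text{(SGD on a loss } f \text{)}
\]

\[
\min_{x} \max_{y} \;\Phi(x, y),
\qquad
x_{n+1} = x_n - \gamma_n \nabla_x \Phi(x_n, y_n),
\quad
y_{n+1} = y_n + \gamma_n \nabla_y \Phi(x_n, y_n)
\]

The first display is the stochastic gradient iteration whose convergence and concentration properties are discussed in the first part of the talk; the second is the zero-sum min-max problem, here paired with the basic gradient descent-ascent update as one representative min-max gradient method.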