Thèse de DOCTORAT
Spécialité: Mathématiques Appliquées
Mr Federico PIERUCCI
soutiendra le Vendredi 23 Juin 2017 à 14h30 Grand Amphi de l'INRIA RhôneAlpes, Montbonnot
Titre:Nonsmooth Optimization for Statistical Learning with Structured Matrix Regularization
Ces travaux se sont déroulés sous la direction de Mr Anatoli IOUDITSKI (Université Grenoble Alpes), Mr Jérôme MALICK (CNRS) et de Mr Zaid HARCHAOUI (University of Washington)
Résumé:
Training machine learning methods boils down to solving optimization problems whose objective
functions often decomposes into two parts: a) the empirical risk, built upon the loss function, whose
shape is determined by the performance metric and the noise assumptions; b) the regularization
penalty, built upon a norm, or a gauge function, whose structure is determined by the prior information
available for the problem at hand.
Common loss functions, such as the hinge loss for binary classification, or more advanced loss
functions, such as the one arising in classification with reject option, are nonsmooth. Sparse
regularization penalties such as the (vector) L1penalty, or the (matrix) nuclearnorm penalty, are also
nonsmooth. The goal of this thesis is to study doubly nonsmooth learning problems (with nonsmooth
loss functions and nonsmooth regularization penalties) and firstorder optimization algorithms that
leverage the composite structure of nonsmooth objectives.
In the first chapter, we introduce new regularization penalties, called the group Schatten norms, to
generalize the standard Schatten norms to blockstructured matrices. We establish the main
properties of the group Schatten norms using tools from convex analysis and linear algebra; we
retrieve in particular some convex envelope properties. We discuss several potential applications of
the group nuclearnorm, in collaborative filtering, database compression, multilabel image tagging.
In the second chapter, we present a survey of smoothing techniques that allow us to use firstorder
optimization algorithms originally designed for learning problems with nonsmooth loss. We also show
how smoothing can be used on the loss function corresponding to the topk accuracy, used for ranking
and multiclass classification problems. We outline some firstorder algorithms that can be used in
combination with the smoothing technique: i) conditional gradient algorithms; ii) proximal gradient
algorithms; iii) incremental gradient algorithms.
In the third chapter, we study further conditional gradient algorithms for solving doubly nonsmooth
optimization problems. We show that an adaptive smoothing combined with the standard conditional
gradient algorithm gives birth to new conditional gradient algorithms having the expected theoretical
convergence guarantees. We present promising experimental results in collaborative filtering for movie
recommendation and image categorization.
MotsClés:
fi rstorder optimization, conditional gradient, smoothing, nuclearnorm, machine learning, mathematical optimization
Membres du Jury:
Mr MassihReza AMINI (Université Grenoble Alpes), proposé président.
Rapporteurs:
Mr Alexander NAZIN (Institute of Control Sciences RAS, Moscow, Russia)
Mr Stephane CHRETIEN (National Physical Laboratory, Teddington, Middlesex, UK)
Examinateurs:
Mme Nelly PUSTELNIK (CNRS, ENS Lyon)
Mr Joseph SALMON (TELECOM ParisTech, Paris, France,)
