6/11/2019 - 10:30 Mr Arthur Mensch (ENS) Salle 2 - RDC - Batiment IMAG
To facilitate training with gradients, supervised learning methods often replace the selection of a single element within a set of outputs with the prediction of a probability distribution over this set (using e.g. the softmax operator). In this talk, we will understand this transformation as a functional smoothing of the output selection mechanism. Engineering this Nesterov smoothing yields new modelling perspectives. First, we will observe that selecting an output within a combinatorial set (e.g. a sequence of tags) is often solved using dynamic programming (DP) algorithms. Smoothing turns DP algorithms into differentiable operators that may predict sparse probabilities over the output set. Second, we will design a smoothing that takes into account a cost function defined on the output set. This approach transforms the softmax operator into a cost-informed geometric softmax, which has the further capability of predicting distributions over a continuous set.
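As a rough illustration of the smoothing idea in the abstract (not the speaker's own code), the sketch below regularizes the max operator in two standard ways. An entropy regularizer gives a log-sum-exp whose gradient is the softmax, and a squared-norm regularizer gives a smoothed max whose gradient is the Euclidean projection onto the simplex, which can be sparse. The function names and the temperature parameter `gamma` are assumptions chosen for this example.

```python
import numpy as np


def smoothed_max_entropy(x, gamma=1.0):
    """Entropy-smoothed max: gamma * log(sum(exp(x / gamma))), computed stably."""
    m = np.max(x)
    return m + gamma * np.log(np.sum(np.exp((x - m) / gamma)))


def softmax(x, gamma=1.0):
    """Gradient of smoothed_max_entropy: a dense probability vector."""
    z = np.exp((x - np.max(x)) / gamma)
    return z / z.sum()


def sparsemax(x):
    """Euclidean projection of x onto the probability simplex.

    This is the gradient of the max smoothed with a squared-norm regularizer;
    unlike softmax, it can assign exactly zero probability to some outputs.
    """
    u = np.sort(x)[::-1]               # scores in decreasing order
    css = np.cumsum(u)                 # running sums of the sorted scores
    k = np.arange(1, len(x) + 1)
    support = 1.0 + k * u > css        # which sorted entries stay positive
    rho = k[support][-1]               # size of the support
    tau = (css[rho - 1] - 1.0) / rho   # threshold subtracted from every score
    return np.maximum(x - tau, 0.0)


scores = np.array([2.0, 1.0, -1.0])
dense = softmax(scores)       # every entry strictly positive
sparse = sparsemax(scores)    # some entries exactly zero
```

Both outputs sum to one, but only the second contains exact zeros, which is the sense in which a smoothed selection operator "may predict sparse probabilities".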