Momentum Smooths the Path to Gradient Equilibrium
Séminaire Données et Aléatoire Théorie & Applications
26/02/2026, 14:00 · Margaux Zaffran (Laboratoire de Mathématiques d'Orsay, Inria) · Salle 106
Online gradient descent has recently been shown to satisfy gradient equilibrium for a broad class of loss functions, including the quantile loss and the squared loss. This means that the average of the gradients of the losses along the sequence of estimates converges to zero, a property that enables quantile calibration and debiasing of predictions, among other statistically useful properties. A shortcoming of online gradient descent when optimized for gradient equilibrium is that the sequence of estimates is jagged, leading to volatile paths. In this work, we propose a generalized momentum method, in the form of a weighting of past gradients, as a broader algorithmic class with guarantees for smoothly postprocessing (e.g., calibrating or debiasing) predictions from black-box algorithms, yielding estimates that are more meaningful in practice. We prove that it achieves gradient equilibrium at the same convergence rates and under similar sets of assumptions as plain online gradient descent, all the while producing smoother paths that preserve the original signal amplitude. Of particular importance are the consequences for sequential decision-making, where more stable paths translate into less variability in statistical applications. These theoretical insights are corroborated by real-data experiments, showcasing the benefits of adding momentum.
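To make the core ideas concrete, the following is a minimal illustrative sketch, not the authors' exact algorithm: online quantile tracking via the pinball loss, comparing plain online gradient descent with a simple momentum variant that takes an exponentially weighted average of past gradients. The average gradient equals the empirical coverage gap P(y < theta) - alpha, so gradient equilibrium corresponds to quantile calibration; all parameter choices (alpha, eta, beta) are arbitrary and for illustration only.

```python
import numpy as np

rng = np.random.default_rng(0)
T, alpha, eta, beta = 5000, 0.9, 0.1, 0.9
y = rng.normal(size=T)  # synthetic observation stream (assumption: i.i.d. Gaussian)

def pinball_grad(theta, y_t):
    # Subgradient of the pinball (quantile) loss w.r.t. theta:
    # (1 - alpha) if y_t < theta, else -alpha.
    return (1.0 if y_t < theta else 0.0) - alpha

def run(momentum):
    theta, m = 0.0, 0.0
    path, grads = [], []
    for y_t in y:
        g = pinball_grad(theta, y_t)
        # Momentum: replace the raw gradient by an EMA over past gradients.
        m = beta * m + (1 - beta) * g if momentum else g
        theta -= eta * m
        path.append(theta)
        grads.append(g)
    return np.array(path), np.array(grads)

path_ogd, g_ogd = run(momentum=False)
path_mom, g_mom = run(momentum=True)

# Gradient equilibrium: the average gradient along the estimates is near zero
# for both methods, i.e. the empirical coverage approaches alpha.
print("avg grad (OGD):     ", g_ogd.mean())
print("avg grad (momentum):", g_mom.mean())

# Smoothness: the total variation of the estimate path is lower with momentum.
tv = lambda p: np.abs(np.diff(p)).sum()
print("path TV (OGD):     ", tv(path_ogd))
print("path TV (momentum):", tv(path_mom))
```

Averaging past gradients damps the jumps between the two subgradient values of the pinball loss, which is why the momentum path has lower total variation while the running average of gradients still vanishes.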