Statistical and computational perspectives on latent variable models


Séminaire Données et Aléatoire Théorie & Applications

29/09/2022 - 14:00 Nhat Ho (The University of Texas at Austin) Online

The growth in scope and complexity of modern data sets presents the field of statistics and data science with numerous inferential and computational challenges, among them how to deal with various forms of heterogeneity. Latent variable models provide a principled approach to modelling heterogeneous collections of data. However, due to over-parameterization, parameter estimation and latent structures in these models have been observed to exhibit non-standard statistical and computational behaviours. In this talk, we provide new insights into these behaviours under mixture models, a building block of latent variable models.

From the statistical viewpoint, we propose a general framework for studying the convergence rates of parameter estimation in mixture models based on the Wasserstein distance. Our study makes explicit the links between model singularities, convergence rates of parameter estimation, and the algebraic geometry of the parameter space for mixtures of continuous distributions. Based on these Wasserstein convergence rates, we propose a novel Merge-Truncate-Merge procedure that consistently estimates the true number of components in mixture models.

From the computational side, we study the non-asymptotic behaviour of the Expectation-Maximization (EM) algorithm, a stable method, and of some higher-order, unstable algorithms, including Newton's method, under over-specified settings of mixture models in which the likelihood need not be strongly concave or, equivalently, the Fisher information matrix may be singular. Focusing on the simple setting of a two-component, equal-weight mixture fit to data drawn from a single multivariate Gaussian distribution, we demonstrate that the EM updates converge to a fixed point at Euclidean distance O((d/n)^{1/4}) from the true parameter after O((n/d)^{1/2}) steps, where d is the dimension.
On the other hand, Newton's method achieves the same statistical accuracy as the EM algorithm in exponentially fewer steps: the number of iterations is reduced from O((n/d)^{1/2}) to O(log(n/d)). Our result shows that the unstable algorithm is preferable to its stable counterpart in this simple setting of mixture models. As a consequence, it refutes the belief that algorithmic stability is necessary for obtaining statistically efficient estimators.
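The EM-versus-Newton comparison above can be illustrated numerically. The sketch below, a minimal one-dimensional instance of the over-specified setting described in the abstract (not the talk's own code), fits the symmetric two-component mixture 0.5·N(theta, 1) + 0.5·N(-theta, 1) to data drawn from a single standard Gaussian, so the true parameter is theta* = 0 and the Fisher information is singular there. For this model the EM update has the well-known closed form theta ← mean(x·tanh(theta·x)); the log-likelihood per observation is log cosh(theta·x) − theta²/2 up to a constant, which gives Newton's update directly.

```python
import numpy as np

# Over-specified setting in dimension d = 1: data from a single N(0, 1),
# i.e. the true parameter of the two-component fit is theta* = 0.
rng = np.random.default_rng(0)
n = 10_000
x = rng.standard_normal(n)

# EM: for the symmetric equal-weight model the update simplifies to
# theta <- mean(x * tanh(theta * x)). Near the singularity it contracts
# only sublinearly, so many iterations are needed.
theta_em = 0.5
for _ in range(300):
    theta_em = np.mean(x * np.tanh(theta_em * x))

# Newton's method on the sample log-likelihood, using
# d/dtheta  log p(x; theta) = x * tanh(theta * x) - theta
# d2/dtheta log p(x; theta) = x**2 / cosh(theta * x)**2 - 1
theta_nt = 0.5
for _ in range(4):
    grad = np.mean(x * np.tanh(theta_nt * x)) - theta_nt
    hess = np.mean(x**2 / np.cosh(theta_nt * x) ** 2) - 1.0
    theta_nt -= grad / hess

# Both estimates shrink toward the true value 0, but EM takes hundreds
# of slowly contracting steps while Newton needs only a handful.
print(theta_em, theta_nt)
```

Note the contrast with the well-specified case: when the true mixture really has two separated components, EM contracts geometrically; it is the singular Fisher information of the over-specified fit that slows it down, which is exactly the regime the abstract's O((n/d)^{1/2})-versus-O(log(n/d)) comparison addresses.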
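The statistical results in the abstract are stated in terms of Wasserstein distances between mixing measures rather than Euclidean distances between parameter vectors, since in an over-specified fit several fitted atoms may share the mass of one true component. A minimal sketch of that metric, assuming SciPy's one-dimensional first-order Wasserstein distance (the talk's framework uses Wasserstein distances more generally):

```python
from scipy.stats import wasserstein_distance

# True mixing measure: a single atom at 0 (one true component).
# Over-specified fit: two atoms at +/- 0.1, each with weight 0.5.
# scipy computes the first-order (W1) distance between the two
# weighted collections of point masses.
w1 = wasserstein_distance([0.0], [0.1, -0.1], [1.0], [0.5, 0.5])

# Half the mass travels 0.1 to the right and half 0.1 to the left,
# so the distance is 0.5 * 0.1 + 0.5 * 0.1 = 0.1.
print(w1)
```

This is why the Wasserstein metric is the natural yardstick here: the fitted mixture with atoms at ±0.1 is close to the true single-component model as a mixing measure, even though neither fitted atom individually matches the true parameter.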