Sparse and discriminative clustering for high-dimensional data

Séminaire Probabilités & Statistique

6/12/2012 - 14:00 Charles Bouveyron (Laboratoire SAMM, Université Paris 1) Salle 1 - Tour IRMA

The Fisher-EM algorithm was recently proposed for the simultaneous visualization and clustering of high-dimensional data. It is based on a mixture model that fits the data in a latent discriminative subspace of low intrinsic dimension. In practice, the Fisher-EM algorithm outperforms other subspace clustering methods in most situations. The convergence of the Fisher-EM algorithm is also studied. In particular, it is proved that the algorithm converges under weak conditions in the general case. It is also shown that Fisher's criterion can be used as a stopping criterion for the algorithm to improve clustering accuracy, and that the Fisher-EM algorithm usually converges faster than both the EM and CEM algorithms. Finally, a sparse extension of the Fisher-EM algorithm is proposed by adding an L1 constraint in the F step. In particular, this makes it possible to select the original variables that are discriminative.
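
As a rough illustration of the kind of quantity Fisher's criterion measures, the sketch below computes the classical ratio of between-class to within-class variance for data projected onto a single direction. It is a minimal example under stated assumptions, not the speaker's implementation: the function name `fisher_criterion`, the hard class labels, and the one-dimensional projection are all hypothetical simplifications, and the exact criterion used in the Fisher-EM algorithm may be normalized differently.

```python
def fisher_criterion(points, labels, direction):
    """Between-class over within-class variance of `points` projected on `direction`.

    Hypothetical helper for illustration only; assumes hard labels and a
    within-class variance that is not zero.
    """
    # Project each point onto the candidate direction (dot product).
    projected = [sum(x * u for x, u in zip(p, direction)) for p in points]
    overall_mean = sum(projected) / len(projected)

    # Group the projected values by class label.
    groups = {}
    for value, label in zip(projected, labels):
        groups.setdefault(label, []).append(value)

    between = 0.0
    within = 0.0
    for values in groups.values():
        mean = sum(values) / len(values)
        # Spread of the class means around the overall mean.
        between += len(values) * (mean - overall_mean) ** 2
        # Spread of the points around their own class mean.
        within += sum((v - mean) ** 2 for v in values)
    return between / within


# Toy data: the two classes are separated along the first axis only.
points = [(0.0, 0.0), (1.0, 1.0), (4.0, 0.0), (5.0, 1.0)]
labels = [0, 0, 1, 1]
score_first_axis = fisher_criterion(points, labels, (1.0, 0.0))
score_second_axis = fisher_criterion(points, labels, (0.0, 1.0))
```

On this toy data the first axis gets a much higher score than the second, which is the sense in which a direction, or an original variable with a nonzero loading on it, can be called discriminative.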