Challenges raised by Missing Not At Random Data
Seminar Données et Aléatoire Théorie & Applications
21/03/2024 - 14:00 Aude Sportisse Room 106
One of the ironies of increased data collection is that missing data are inevitable: the more data there are, the more missing data there are. The purpose of this presentation is to provide an overview of Missing Not At Random (MNAR) data, that is, data whose unavailability depends on the values of the data themselves. This implies that the observed population is not representative of the general population. Such missing data are widely encountered in real data sets, and they introduce significant biases into the samples, which most existing methods ignore. We will discuss the main difficulties raised by MNAR data, such as the identifiability of the parameters, and we will see examples illustrating how to deal with them in specific contexts: semi-supervised learning, clustering, and low-rank models.
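As a rough illustration of why MNAR data make the observed population unrepresentative, the sketch below simulates a single Gaussian variable whose probability of being missing grows with its own value. This is not taken from the talk: the self-masking logistic mechanism, the function name `simulate_self_masked_mnar`, and the parameters `slope` and `seed` are all assumptions chosen for illustration.

```python
import numpy as np


def simulate_self_masked_mnar(n=100_000, slope=2.0, seed=0):
    """Draw a standard normal sample and a self-masked MNAR missingness mask.

    Hypothetical mechanism (an assumption, not from the talk): the
    probability that a value is missing is a logistic function of the
    value itself, so larger values are more likely to be missing.
    """
    rng = np.random.default_rng(seed)
    x = rng.normal(loc=0.0, scale=1.0, size=n)
    # Missingness probability depends on the (possibly unobserved) value.
    p_missing = 1.0 / (1.0 + np.exp(-slope * x))
    missing = rng.random(n) < p_missing
    return x, missing


x, missing = simulate_self_masked_mnar()

full_mean = x.mean()                # mean over the whole population
observed_mean = x[~missing].mean()  # mean over the observed values only

print(f"fraction missing: {missing.mean():.3f}")
print(f"full-sample mean: {full_mean:.3f}")
print(f"observed mean:    {observed_mean:.3f}")
```

Under this assumed mechanism, the mean computed on the observed values alone falls well below the mean of the full sample, because large values are preferentially removed. A method that ignores the missingness mechanism would inherit this bias.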