Challenges raised by Missing Not At Random Data


Seminar Données et Aléatoire Théorie & Applications

21/03/2024 - 14:00 Aude Sportisse Room 106

One of the ironies of increased data collection is that missing data are inevitable: the more data there are, the more missing data there are. This presentation gives an overview of data Missing Not At Random (MNAR), that is, data whose unavailability depends on the values the data themselves take. As a consequence, the observed population is not representative of the general one. Such missing data are widely encountered in real data sets and introduce significant biases into the samples, which most existing methods ignore. We will discuss the main difficulties raised by MNAR data, such as the identifiability of the parameters, and we will see examples illustrating how to deal with them in specific contexts: semi-supervised learning, clustering and low-rank models.
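
As a rough illustration of the bias described above, the following minimal sketch simulates a self-masked MNAR mechanism, in which the probability that a value is missing grows with the value itself. The standard normal variable, the logistic missingness model, the `slope` parameter, and the function name `simulate_mnar` are all assumptions chosen for illustration; they are not taken from the talk.

```python
import math
import random
import statistics


def simulate_mnar(n=100_000, slope=2.0, seed=0):
    """Draw n standard normal values and hide some of them with a self-masked mechanism.

    Hypothetical setup: P(missing | x) = 1 / (1 + exp(-slope * x)),
    so large values are more likely to be missing than small ones.
    """
    rng = random.Random(seed)
    full = [rng.gauss(0.0, 1.0) for _ in range(n)]
    observed = []
    for x in full:
        p_missing = 1.0 / (1.0 + math.exp(-slope * x))
        # Keep the value only if it is not masked.
        if rng.random() >= p_missing:
            observed.append(x)
    return full, observed


full, observed = simulate_mnar()
full_mean = statistics.fmean(full)          # mean of the complete sample
observed_mean = statistics.fmean(observed)  # naive mean over observed values only
missing_rate = 1 - len(observed) / len(full)

print(f"missing rate:  {missing_rate:.3f}")
print(f"full mean:     {full_mean:.3f}")
print(f"observed mean: {observed_mean:.3f}")
```

In this toy setting the mean of the observed values falls well below the mean of the complete sample, because the largest values are preferentially hidden. An estimator that ignores the missingness mechanism inherits that shift.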