Generative Modelling with Sliced Rank-Statistic f-Divergences

Language: English

Séminaire Données et Aléatoire Théorie & Applications

18/12/2025, 14:00 · José Manuel De Frutos Porras · Salle 106

Quantifying the difference between two probability distributions is a fundamental challenge in machine learning, particularly for training and evaluating generative models. In this talk, we present Rank-Statistic f-Divergences, a novel framework currently under development. 

Standard methods for estimating divergences (such as the Kullback-Leibler divergence or the Total Variation distance) can be unstable or difficult to compute from finite samples. Our approach addresses this by approximating these divergences with rank statistics: specifically, by analysing the relative ordering of data points projected onto a basis of Bernstein polynomials. We discuss three key properties of this proposed estimator (an illustrative sketch follows the list):

1. Monotonicity: We introduce a "degree of approximation" parameter, K. We show that the approximation quality strictly improves as K increases, giving us control over the trade-off between precision and complexity.  

2. Robustness (Distribution-Free): Because our method relies on ranks rather than raw values, it requires no assumptions about the shape of the underlying distributions and is naturally robust to extreme outliers.

3. Scalability: By combining this with a "slicing" technique (projecting high-dimensional data onto one-dimensional lines, as sketched below), we can effectively apply the method to complex, multivariate generative modelling tasks.
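
To make the construction concrete, here is a minimal sketch of one possible reading of the one-dimensional estimator. It is an illustration under stated assumptions, not the implementation presented in the talk: it computes the normalised ranks of one sample within the other, smooths the resulting rank distribution with a Bernstein-polynomial density estimate of degree K (a Vitale-type estimator, substituted here for the unspecified Bernstein construction), and plugs the result into the classical identity D_f(P||Q) = \int_0^1 f(r(u)) du, where r is the density of F_Q(X) for X ~ P. All function names and the default choice f(t) = 0.5|t - 1| (Total Variation) are ours.

```python
import numpy as np
from scipy.stats import beta


def relative_ranks(x, y):
    """Normalised ranks of sample x within reference sample y.

    u_i = #{ y_j <= x_i } / m depends only on the ordering of the pooled
    data, so the statistic is distribution-free and invariant to monotone
    transformations (hence robust to extreme outliers).
    """
    return np.searchsorted(np.sort(y), x, side="right") / len(y)


def bernstein_rank_density(u, K, grid):
    """Degree-K Bernstein-polynomial density estimate of the ranks u.

    If both samples come from the same distribution, the ranks are
    asymptotically uniform and this estimate is close to 1 on [0, 1].
    """
    knots = np.linspace(0.0, 1.0, K + 1)
    ecdf = np.searchsorted(np.sort(u), knots, side="right") / len(u)
    weights = np.diff(ecdf)  # empirical mass in each bin [k/K, (k+1)/K]
    dens = np.zeros_like(grid)
    for k in range(K):
        dens += weights[k] * beta.pdf(grid, k + 1, K - k)
    return dens


def rank_f_divergence(x, y, K=20, f=lambda t: 0.5 * np.abs(t - 1.0), n_grid=512):
    """Plug-in estimate of D_f(P||Q) = \\int_0^1 f(r(u)) du, where r is the
    density of F_Q(X), X ~ P (the likelihood ratio in rank coordinates).
    The default f yields the Total Variation distance.
    """
    grid = np.linspace(0.0, 1.0, n_grid)
    r_hat = bernstein_rank_density(relative_ranks(x, y), K, grid)
    return float(np.mean(f(r_hat)))  # Riemann average over the uniform grid
```

In this sketch, K plays exactly the role of the degree-of-approximation knob from item 1: a small K oversmooths the rank density and underestimates the divergence, while a larger K tracks it more closely at the price of higher variance and computational cost.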
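The slicing step of item 3 then reduces the multivariate problem to averaged one-dimensional comparisons. The sketch below, again an illustration rather than the speaker's method, draws random directions uniformly on the unit sphere, projects both samples onto each direction, and averages the one-dimensional estimates; the toy check at the end uses hypothetical Gaussian data.

```python
def sliced_rank_f_divergence(x, y, n_slices=50, K=20, seed=0):
    """Average the 1-D rank-statistic estimate over random projections.

    x, y: arrays of shape (n, d) and (m, d).
    """
    rng = np.random.default_rng(seed)
    d = x.shape[1]
    estimates = []
    for _ in range(n_slices):
        theta = rng.standard_normal(d)
        theta /= np.linalg.norm(theta)  # uniform direction on the sphere
        estimates.append(rank_f_divergence(x @ theta, y @ theta, K=K))
    return float(np.mean(estimates))


# Toy check: a mean shift in 5 dimensions should yield a clearly larger
# value than two samples drawn from the same distribution.
rng = np.random.default_rng(1)
x = rng.normal(0.0, 1.0, size=(2000, 5))
y = rng.normal(0.5, 1.0, size=(2000, 5))
z = rng.normal(0.0, 1.0, size=(2000, 5))
print(sliced_rank_f_divergence(x, y))  # noticeably positive
print(sliced_rank_f_divergence(x, z))  # small (plug-in bias only)
```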

We conclude by sharing preliminary findings that suggest this framework offers a promising, mathematically sound alternative for two-sample testing and model training.
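
As a usage note for the two-sample-testing application, a permutation test is one standard way to calibrate a statistic of this kind without distributional assumptions; the sketch below simply reuses the illustrative sliced_rank_f_divergence defined earlier and is not part of the presented framework.

```python
def permutation_test(x, y, stat=sliced_rank_f_divergence, n_perm=200, seed=0):
    """Permutation p-value for H0: P = Q, for any two-sample statistic.

    Under H0 the pooled observations are exchangeable, so the permutation
    null is exact regardless of the data-generating distributions.
    """
    rng = np.random.default_rng(seed)
    observed = stat(x, y)
    pooled = np.vstack([x, y])
    n = len(x)
    count = 0
    for _ in range(n_perm):
        perm = rng.permutation(len(pooled))
        if stat(pooled[perm[:n]], pooled[perm[n:]]) >= observed:
            count += 1
    return (count + 1) / (n_perm + 1)  # standard add-one correction
```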