A Complex Wavelet Approach for Shift-Invariant Convolutional Neural Networks

Speciality: Mathematics and Computer Science

14/06/2023 - 14:00 Hubert Leterme (Université Grenoble Alpes) IMAG, Salles de séminaire 1 & 2 (ground floor)

Keywords:

computer vision
image processing
wavelets

Despite significant advancements in computer vision over the past decade, convolutional neural networks (CNNs) still suffer from a lack of mathematical understanding. In particular, stability properties with respect to small transformations such as translations, rotations, scaling or deformations are only partially understood. While there is a broad literature on this topic, some gaps remain, specifically with regard to the combined effect of convolution and max pooling layers in producing near shift-invariant feature representations. This property is of utmost importance for classification, since two shifted versions of a single input image are expected to receive the same label.

It is well known that subsampled convolutions with band-pass filters are prone to producing unstable image representations when inputs are shifted by a few pixels. The first contribution of this thesis consists in proving that a nonlinear max pooling operator can partially restore shift invariance. By applying results from wavelet theory and adopting a probabilistic point of view, we reveal a similarity between the max pooling of real-valued convolutions, as implemented in conventional architectures, and the modulus of complex-valued convolutions, for which a measure of shift invariance is established.
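The effect described above can be illustrated numerically. The following is a minimal 1-D sketch, not the construction used in the thesis: the Gabor filter, its parameters (`sigma`, `omega`), the stride of 4, and all function names are assumptions chosen for illustration. It compares how much three representations of a random signal change when the input is shifted by one sample: a subsampled real-valued convolution, the same convolution followed by max pooling, and the subsampled modulus of the complex-valued convolution.

```python
import numpy as np

def gabor(n, sigma, omega):
    # Complex Gabor filter (illustrative choice): Gaussian window times a complex exponential.
    t = np.arange(n) - n // 2
    return np.exp(-t**2 / (2 * sigma**2)) * np.exp(1j * omega * t)

def circ_conv(x, h):
    # Circular convolution via the FFT; the filter is re-centred at index 0 first.
    return np.fft.ifft(np.fft.fft(x) * np.fft.fft(np.fft.ifftshift(h)))

def relative_change(a, b):
    # Relative L2 distance between two feature vectors.
    return np.linalg.norm(a - b) / np.linalg.norm(a)

def shift_instability(x, sigma=6.0, omega=2.0, stride=4):
    # len(x) is assumed to be a multiple of `stride`.
    h = gabor(len(x), sigma, omega)
    feats = {}
    for name, sig in (("ref", x), ("shifted", np.roll(x, 1))):
        y = circ_conv(sig, h)
        feats[name] = {
            # Real-valued convolution followed by plain subsampling.
            "real_subsampled": y.real[::stride],
            # Real-valued convolution followed by non-overlapping max pooling.
            "max_pooled": y.real.reshape(-1, stride).max(axis=1),
            # Modulus of the complex-valued convolution, then subsampling.
            "modulus": np.abs(y)[::stride],
        }
    return {k: relative_change(feats["ref"][k], feats["shifted"][k])
            for k in feats["ref"]}

rng = np.random.default_rng(0)
x = rng.standard_normal(1024)
scores = shift_instability(x)
for name, value in scores.items():
    print(f"{name}: {value:.3f}")
```

Under these assumptions, the subsampled real-valued output is expected to change far more under the one-sample shift than either the max-pooled output or the modulus, which is the qualitative behaviour the abstract refers to. The sketch says nothing about the filter frequencies for which the similarity breaks down.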

However, for specific filter frequencies, this similarity is lost, and CNNs become unstable to translations. This phenomenon, known as aliasing, can be avoided by employing additional low-pass filters in strategic locations of the network architecture, as several authors have done in recent years. While their methods effectively increase both shift invariance and prediction accuracy, they come at the cost of a significant loss of high-frequency information. As a second contribution, we present a novel antialiasing method which, unlike previous methods, preserves this information. Relying on our theoretical study, the key idea is to exploit the properties of complex convolutions to guarantee near shift invariance for any filter frequency. By adding an imaginary part to high-frequency kernels and replacing the max pooling layer with a simple modulus operator, we empirically demonstrate an increase in the network's stability and a lower error rate compared to previous approaches based on low-pass filtering.
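The abstract does not say how the imaginary part is constructed. One standard possibility, shown here purely as an assumption, is to take the discrete Hilbert transform of the real kernel so that the resulting complex kernel is analytic (no negative-frequency content). The function name `add_imaginary_part` and the example kernel are hypothetical.

```python
import numpy as np

def add_imaginary_part(kernel):
    # Build an analytic (one-sided spectrum) version of a real 1-D kernel.
    # The real part of the result equals the input kernel; the imaginary
    # part is its discrete Hilbert transform.
    n = len(kernel)
    spectrum = np.fft.fft(kernel)
    mask = np.zeros(n)
    mask[0] = 1.0                      # keep the DC bin
    if n % 2 == 0:
        mask[n // 2] = 1.0             # keep the Nyquist bin
        mask[1:n // 2] = 2.0           # double positive frequencies
    else:
        mask[1:(n + 1) // 2] = 2.0
    return np.fft.ifft(spectrum * mask)  # negative frequencies are zeroed

# Hypothetical high-frequency real kernel: a Gaussian-windowed cosine.
t = np.arange(-16, 16)
real_kernel = np.exp(-t**2 / 32.0) * np.cos(2.0 * t)
complex_kernel = add_imaginary_part(real_kernel)

# In a modified layer, the modulus of the complex response would take the
# place of max pooling applied to the real response.
signal = np.random.default_rng(0).standard_normal(256)
response = np.abs(np.convolve(signal, complex_kernel, mode="same"))
```

The original real kernel is left unchanged as the real part, so a construction of this kind only adds information to the layer rather than filtering anything out.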

In conclusion, the aim of this thesis is twofold: improving the mathematical understanding of CNNs from the perspective of shift invariance, and improving the tradeoff between stability and information preservation, based on our theoretical contribution, which is grounded in wavelet theory. Our findings thus have the potential to positively impact various applications of computer vision, especially in fields that require theoretical guarantees.

President:

Prof. Massih-Reza Amini (Université Grenoble Alpes)

Directors:

Prof. Valérie Perrier (Grenoble INP - UGA)
Dr. Karteek Alahari (Inria, Université Grenoble Alpes)
Dr. Kévin Polisano (CNRS, Université Grenoble Alpes)

Reviewers:

Prof. François Malgouyres (Université Toulouse III - Paul Sabatier)
Dr. Nelly Pustelnik (CNRS, ENS de Lyon)

Examiners:

Prof. Joan Bruna (New York University)
Dr. Gabriel Peyré (CNRS, ENS Paris)