A transductive bound of the voted classifier: an application to image annotation and multi-lingual document classification


Séminaire Probabilités & Statistique

18/09/2014 - 14:00 Massih-Reza AMINI (Université Grenoble I / LIG / AMA) Salle 1 - Tour IRMA

In semi-supervised learning, the margin, taken as an indicator of confidence, is the working hypothesis behind discriminant algorithms that search for the decision boundary in low-density regions. Following this assumption, we propose to bound the error probability of the voted classifier on the examples whose margins are above a fixed threshold. This bound serves to tune the threshold hyperparameter of the self-training algorithm. As an application, we propose a multiview self-learning strategy that trains different voting classifiers on different views. The margin distributions over the unlabeled training data, obtained with each view-specific classifier, are then used to estimate an upper bound on their transductive Bayes error. Minimizing this upper bound provides an automatic margin threshold, which is used to assign pseudo-labels to unlabeled examples. Final class labels are then assigned to these examples by taking a vote over the pool of previous pseudo-labels. New view-specific classifiers are then trained using the original labeled and the pseudo-labeled training data. We consider applications to image-text and multilingual document classification, and present experimental results on the NUS-WIDE collection and on the multi-lingual Reuters RCV1-RCV2 dataset.
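The pseudo-labeling and voting steps described in the abstract can be illustrated with a minimal Python sketch. It is not the speaker's implementation. It assumes a binary problem with labels +1/-1 and signed margins, and it takes the per-view thresholds as given inputs rather than deriving them by minimizing the transductive bound, which the abstract does not spell out. The function names, the `fit` callback, the tie handling, and the repetition over several rounds are all hypothetical choices made for illustration.

```python
def pseudo_label_by_vote(view_margins, thresholds):
    """Assign final labels to unlabeled examples by voting across views.

    view_margins: one list per view, holding a signed margin per example.
    thresholds:   one margin threshold per view (assumed given here).
    Returns {example_index: +1 or -1} for examples that get a label.
    """
    n_examples = len(view_margins[0])
    labels = {}
    for i in range(n_examples):
        votes = []
        for margins, theta in zip(view_margins, thresholds):
            m = margins[i]
            # A view casts a pseudo-label only if its margin exceeds the threshold.
            if abs(m) > theta:
                votes.append(1 if m > 0 else -1)
        total = sum(votes)
        # Assumption: examples with no votes or a tied vote stay unlabeled.
        if total != 0:
            labels[i] = 1 if total > 0 else -1
    return labels


def multiview_self_learning(fit, X_lab, y_lab, X_unl, thresholds, max_rounds=10):
    """Hypothetical self-learning loop over several views.

    fit(view, xs, ys) must return a function mapping one view-specific
    input to a signed margin. Each example is a tuple with one entry per view.
    """
    n_views = len(thresholds)
    X_train, y_train = list(X_lab), list(y_lab)
    remaining = list(X_unl)

    def train_all():
        # One classifier per view, trained on labeled plus pseudo-labeled data.
        return [fit(v, [x[v] for x in X_train], y_train) for v in range(n_views)]

    scorers = train_all()
    for _ in range(max_rounds):
        if not remaining:
            break
        view_margins = [[scorers[v](x[v]) for x in remaining] for v in range(n_views)]
        new_labels = pseudo_label_by_vote(view_margins, thresholds)
        if not new_labels:
            break  # no example passed the thresholds, so nothing changes
        for i, label in new_labels.items():
            X_train.append(remaining[i])
            y_train.append(label)
        remaining = [x for i, x in enumerate(remaining) if i not in new_labels]
        scorers = train_all()
    return scorers, X_train, y_train
```

In this sketch an example is pseudo-labeled by a view only when its margin magnitude exceeds that view's threshold, so low-confidence examples near the decision boundary remain unlabeled until a retrained classifier becomes confident about them.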