Stochastic Bandit Optimization: Theoretical vs. (More) Realistic Worlds

Seminar Probabilités & Statistique

1/10/2015 - 14:00 Mr Vianney PERCHET (Université Denis Diderot) Room 1 - Tour IRMA

The talk will be divided into two main parts.
In the first, I will recall the classical theoretical setup of "multi-armed bandits", a standard problem in online learning. I will present the celebrated UCB (Upper Confidence Bound) algorithm and give intuition for how it works, as well as a more recent algorithm, ETC (Explore Then Commit), which is simpler, more intuitive, and comes with better guarantees.
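To make the two algorithms concrete, here is a minimal Python sketch, assuming rewards in [0, 1] drawn from simulated Bernoulli arms. All names (`bernoulli_arm`, `ucb1`, `explore_then_commit`, the exploration length `m`) are hypothetical. The UCB index uses the standard UCB1 bonus sqrt(2 log t / n_i); the ETC shown is the plain textbook version with a fixed, hand-chosen `m`, whereas the variant discussed in the talk may choose its exploration length differently (for instance adaptively).

```python
import math
import random
from typing import Callable, List, Tuple

Arm = Callable[[random.Random], float]  # draws one reward in [0, 1]


def bernoulli_arm(p: float) -> Arm:
    """Hypothetical helper: an arm paying 1 with probability p, else 0."""
    return lambda rng: 1.0 if rng.random() < p else 0.0


def ucb1(arms: List[Arm], horizon: int, rng: random.Random) -> Tuple[float, List[int]]:
    """UCB: play the arm with the largest optimistic index
    mean_i + sqrt(2 * log(t) / n_i) (the UCB1 choice of bonus)."""
    k = len(arms)
    counts = [0] * k
    sums = [0.0] * k
    total = 0.0
    for t in range(1, horizon + 1):
        if t <= k:
            i = t - 1  # initialisation: play every arm once
        else:
            i = max(range(k), key=lambda a: sums[a] / counts[a]
                    + math.sqrt(2.0 * math.log(t) / counts[a]))
        reward = arms[i](rng)
        counts[i] += 1
        sums[i] += reward
        total += reward
    return total, counts


def explore_then_commit(arms: List[Arm], horizon: int, m: int,
                        rng: random.Random) -> Tuple[float, int]:
    """ETC with a fixed exploration length: play each arm m times in
    round robin, then commit to the empirically best arm until the end."""
    k = len(arms)
    if m * k > horizon:
        raise ValueError("exploration phase longer than the horizon")
    sums = [0.0] * k
    total = 0.0
    for t in range(m * k):  # explore: round robin over the arms
        i = t % k
        reward = arms[i](rng)
        sums[i] += reward
        total += reward
    # every arm was pulled exactly m times, so the sums rank the empirical means
    best = max(range(k), key=lambda a: sums[a])
    for _ in range(horizon - m * k):  # commit: never switch again
        total += arms[best](rng)
    return total, best


if __name__ == "__main__":
    rng = random.Random(0)
    arms = [bernoulli_arm(0.6), bernoulli_arm(0.5)]
    print(ucb1(arms, 10_000, rng))
    print(explore_then_commit(arms, 10_000, 200, rng))
```

Unlike UCB, which may keep alternating between arms until the last round, this ETC sketch switches arms only during its exploration phase and then plays a single arm.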

In the second part of the talk, I will apply small variants of ETC and/or UCB to more realistic settings: for instance, under time constraints (when actions may not be switched erratically and unpredictably), with prior information on the rewards, or in repeated auctions.