Sub-sampling for multi-armed bandits.

Discussing articles

Akram Baransi, Odalric-Ambrym Maillard, Shie Mannor.
In European conference on Machine Learning, 2014.


The stochastic multi-armed bandit problem is a popular model of the exploration/exploitation trade-off in sequential decision problems. We introduce a novel algorithm that is based on sub-sampling. Despite its simplicity, we show that the algorithm demonstrates excellent empirical performances against state-of-the-art algorithms, including Thompson sampling and KL-UCB. The algorithm is very flexible, it does need to know a set of reward distributions in advance nor the range of the rewards. It is not restricted to Bernoulli distributions and is also invariant under rescaling of the rewards. We provide a detailed experimental study comparing the algorithm to the state of the art, the main intuition that explains the striking results, and conclude with a finite-time regret analysis for this algorithm in the simplified two-arm bandit setting.

You can dowload the paper from the ECML website (here) or from the HAL online open depository* (here).

title={Sub-sampling for Multi-armed Bandits},
author={Baransi, Akram and Maillard, Odalric-Ambrym and Mannor, Shie},
booktitle={Machine Learning and Knowledge Discovery in Databases},

Concentration inequalities for sampling without replacement.

2014, Discussing articles

Rémi Bardenet, Odalric-Ambrym Maillard.
In Bernoulli Journal, 2014.



Concentration inequalities quantify the deviation of a random variable from a fixed value. In spite of numerous applications, such as opinion surveys or ecological counting procedures, few concentration results are known for the setting of sampling without replacement from a finite population. Until now, the best general concentration inequality has been a Hoeffding inequality due to Serfling (1974). In this paper, we first improve on the fundamental result of Serfling (1974), and further extend it to obtain a Bernstein concentration bound for sampling without replacement. We then derive an empirical version of our bound that does not require the variance to be known to the user.

You can dowload the paper from the Bernoulli website (here) or from the HAL online open depository* (here).