Sub-sampling for multi-armed bandits.


Akram Baransi, Odalric-Ambrym Maillard, Shie Mannor.
In European Conference on Machine Learning (ECML), 2014.


Abstract:
The stochastic multi-armed bandit problem is a popular model of the exploration/exploitation trade-off in sequential decision problems. We introduce a novel algorithm that is based on sub-sampling. Despite its simplicity, we show that the algorithm demonstrates excellent empirical performance against state-of-the-art algorithms, including Thompson sampling and KL-UCB. The algorithm is very flexible: it does not need to know a set of reward distributions in advance, nor the range of the rewards. It is not restricted to Bernoulli distributions and is also invariant under rescaling of the rewards. We provide a detailed experimental study comparing the algorithm to the state of the art, give the main intuition that explains the striking results, and conclude with a finite-time regret analysis for this algorithm in the simplified two-arm bandit setting.
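
For intuition, here is a minimal Python sketch of the two-arm sub-sampling duel described in the abstract (my reading of the paper's BESA algorithm, not the authors' code): the arm with the longer history is judged on a random sub-sample of the same size as the other arm's history, and the arm with the larger sub-sampled mean is played. The function name, the Bernoulli means in the toy run, and the tie-breaking detail are my own illustrative choices.

import random

def besa_choice(hist_a, hist_b, rng):
    """One round of the two-arm sub-sampling duel (a sketch of BESA).
    hist_a, hist_b are lists of observed rewards; returns 0 or 1."""
    n_a, n_b = len(hist_a), len(hist_b)
    # Sub-sample the arm with more observations down to the size of the
    # other arm's history, uniformly without replacement, so both
    # empirical means are computed from the same number of rewards.
    n = min(n_a, n_b)
    mean_a = sum(rng.sample(hist_a, n)) / n
    mean_b = sum(rng.sample(hist_b, n)) / n
    if mean_a == mean_b:
        # Break ties in favor of the less-sampled arm.
        return 0 if n_a <= n_b else 1
    return 0 if mean_a > mean_b else 1

# Toy run on two Bernoulli arms (the means 0.5 and 0.6 are made up).
rng = random.Random(0)
hists, means = ([], []), (0.5, 0.6)
for t in range(1000):
    if not hists[0]:
        arm = 0  # play each arm once before dueling
    elif not hists[1]:
        arm = 1
    else:
        arm = besa_choice(hists[0], hists[1], rng)
    hists[arm].append(1 if rng.random() < means[arm] else 0)
print(len(hists[0]), len(hists[1]))  # the better arm should dominate

Note how the comparison only ever ranks rewards from the two arms against each other: this is why, as the abstract says, no knowledge of the reward range or distribution family is needed, and why the decision is invariant under rescaling of the rewards.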

You can download the paper from the ECML website (here) or from the HAL open-access online repository (here).

Bibtex:
@incollection{baransi2014sub,
  title     = {Sub-sampling for Multi-armed Bandits},
  author    = {Baransi, Akram and Maillard, Odalric-Ambrym and Mannor, Shie},
  booktitle = {Machine Learning and Knowledge Discovery in Databases},
  pages     = {115--131},
  year      = {2014},
  publisher = {Springer}
}
