Finite-Time Analysis of Multi-armed Bandits Problems with Kullback-Leibler Divergences.

2011, Discussing articles

Odalric-Ambrym Maillard, Gilles Stoltz, Rémi Munos.
In Proceedings of the 24th annual Conference On Learning Theory,
COLT 2011.

[Download]

Abstract:

We consider a Kullback-Leibler-based algorithm for the stochastic multi-armed bandit problem in the case of distributions with finite supports (not necessarily known beforehand), whose asymptotic regret matches the lower bound of \cite{Burnetas96}. Our contribution is to provide a finite-time analysis of this algorithm; we get bounds whose main terms are smaller than the ones of previously known algorithms with finite-time analyses (like UCB-type algorithms).

You can dowload the paper from the JMLR website (here) or from the HAL online open depository* (here).

Bibtex:
@INPROCEEDINGS{MaiMunSto2011KL,
author = {Odalric-Ambrym Maillard and R{\’e}mi Munos and Gilles Stoltz},
title = {Finite-Time Analysis of Multi-armed Bandits Problems with Kullback-Leibler
Divergences},
booktitle = {Proceedings of the 24th annual Conference On Learning Theory},
year = {2011},
series = {COLT ’11},
keywords = {Bandit Theory},
location = {Budapest, Hungary},
numpages = {13}
}

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s