Odalric-Ambrym Maillard, Rémi Munos, Gilles Stoltz.
In Proceedings of the 24th Annual Conference on Learning Theory,
COLT 2011.
[Download]
Abstract: |
We consider a Kullback-Leibler-based algorithm for the stochastic multi-armed bandit problem in the case of distributions with finite supports (not necessarily known beforehand), whose asymptotic regret matches the lower bound of Burnetas and Katehakis (1996). Our contribution is to provide a finite-time analysis of this algorithm; we obtain bounds whose main terms are smaller than those of previously known algorithms with finite-time analyses (such as UCB-type algorithms). |
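As a rough illustration only, and not the exact procedure analyzed in the paper (which works with empirical distributions of finite support), the sketch below shows a KL-UCB-style index policy for Bernoulli rewards: each arm's index is the largest mean consistent with its empirical average up to a KL-divergence budget, found by bisection. All names (kl_ucb_index, the exploration constant c) are hypothetical and chosen for this sketch.

import math
import random


def bernoulli_kl(p, q, eps=1e-12):
    # Kullback-Leibler divergence between Bernoulli(p) and Bernoulli(q).
    p = min(max(p, eps), 1 - eps)
    q = min(max(q, eps), 1 - eps)
    return p * math.log(p / q) + (1 - p) * math.log((1 - p) / (1 - q))


def kl_ucb_index(mean, pulls, t, c=0.0, tol=1e-6):
    # Largest q >= mean such that pulls * KL(mean, q) <= log(t) + c*log(log(t)),
    # computed by bisection (KL(mean, q) is increasing in q for q >= mean).
    threshold = (math.log(t) + c * math.log(max(math.log(t), 1.0))) / pulls
    lo, hi = mean, 1.0
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if bernoulli_kl(mean, mid) <= threshold:
            lo = mid
        else:
            hi = mid
    return lo


def kl_ucb(arm_means, horizon, seed=0):
    # Run the index policy on simulated Bernoulli arms; return total reward.
    rng = random.Random(seed)
    k = len(arm_means)
    pulls = [0] * k
    sums = [0.0] * k
    total = 0.0
    for t in range(1, horizon + 1):
        if t <= k:  # pull each arm once to initialize the estimates
            a = t - 1
        else:       # otherwise pick the arm with the highest KL-based index
            a = max(range(k),
                    key=lambda i: kl_ucb_index(sums[i] / pulls[i], pulls[i], t))
        r = 1.0 if rng.random() < arm_means[a] else 0.0
        pulls[a] += 1
        sums[a] += r
        total += r
    return total


if __name__ == "__main__":
    print(kl_ucb([0.3, 0.5, 0.7], horizon=5000))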
You can download the paper from the JMLR website (here) or from the HAL online open repository (here).
Bibtex: |
@INPROCEEDINGS{MaiMunSto2011KL,
  author    = {Odalric-Ambrym Maillard and R{\'e}mi Munos and Gilles Stoltz},
  title     = {Finite-Time Analysis of Multi-armed Bandits Problems with Kullback-Leibler Divergences},
  booktitle = {Proceedings of the 24th Annual Conference on Learning Theory},
  year      = {2011},
  series    = {COLT '11},
  keywords  = {Bandit Theory},
  location  = {Budapest, Hungary},
  numpages  = {13}
} |