Kullback-Leibler Upper Confidence Bounds for Optimal Sequential Allocation.

2013, Discussing articles

Olivier Cappé, Aurélien Garivier,
Odalric-Ambrym Maillard,
Rémi Munos, Gilles Stoltz.
In The Annals of Statistics, 2013.

[Download]

Abstract:

We consider optimal sequential allocation in the context of the so-called stochastic multi-armed bandit model. We describe a generic index policy, in the sense of  Gittins (1979), based on upper confidence bounds of the arm payoffs computed using the Kullback-Leibler divergence. We consider two classes of distributions for which instances of this general idea are analyzed: The kl-UCB algorithm is designed for one-parameter exponential families and the empirical KL-UCB algorithm for bounded and finitely supported distributions. Our main contribution is a unified finite-time analysis of the regret of these algorithms that asymptotically matches the lower bounds of Lai and Robbins (1985) and Burnetas et Katehakis (1996), respectively. We also investigate the behavior of these algorithms when used with general bounded rewards, showing in particular that they provide significant improvements over the state-of-the-art.

You can dowload the paper from the Annals of Statistics (here) or from the HAL online open depository* (here).

Bibtex:
@article{CaGaMaMuSt2013,
AUTHOR = {Olivier Capp\'{e}, Aur\'{e}lien Garivier, Odalric-Ambrym Maillard, R\'{e}mi Munos, Gilles Stoltz},
TITLE = {Kullback–Leibler upper confidence bounds for optimal sequential allocation},
JOURNAL = {Ann. Statist.},
FJOURNAL = {Annals of Statistics},
YEAR = {2013},
VOLUME = {41},
NUMBER = {3},
PAGES = {1516-1541},
ISSN = {0090-5364},
DOI = {10.1214/13-AOS1119} }
Related Publications:
Finite-Time Analysis of Multi-armed Bandits Problems with Kullback-Leibler Divergences.
Odalric-Ambrym Maillard, Gilles Stoltz, Rémi Munos.
In Proceedings of the 24th annual Conference On Learning Theory, COLT, 2011.
Advertisements

Finite-Time Analysis of Multi-armed Bandits Problems with Kullback-Leibler Divergences.

2011, Discussing articles

Odalric-Ambrym Maillard, Gilles Stoltz, Rémi Munos.
In Proceedings of the 24th annual Conference On Learning Theory,
COLT 2011.

[Download]

Abstract:

We consider a Kullback-Leibler-based algorithm for the stochastic multi-armed bandit problem in the case of distributions with finite supports (not necessarily known beforehand), whose asymptotic regret matches the lower bound of \cite{Burnetas96}. Our contribution is to provide a finite-time analysis of this algorithm; we get bounds whose main terms are smaller than the ones of previously known algorithms with finite-time analyses (like UCB-type algorithms).

You can dowload the paper from the JMLR website (here) or from the HAL online open depository* (here).

Bibtex:
@INPROCEEDINGS{MaiMunSto2011KL,
author = {Odalric-Ambrym Maillard and R{\’e}mi Munos and Gilles Stoltz},
title = {Finite-Time Analysis of Multi-armed Bandits Problems with Kullback-Leibler
Divergences},
booktitle = {Proceedings of the 24th annual Conference On Learning Theory},
year = {2011},
series = {COLT ’11},
keywords = {Bandit Theory},
location = {Budapest, Hungary},
numpages = {13}
}