Robust Risk-Averse Stochastic Multi-armed Bandits.
Odalric-Ambrym Maillard.
In Algorithmic Learning Theory, 2013.
[Download]
Abstract: |
We study a variant of the standard stochastic multi-armed bandit problem in which one is interested not in the arm with the best mean, but in the arm maximizing a coherent risk measure criterion. Further, we study the deviations of the regret, rather than the less informative expected regret. We provide an algorithm, called RA-UCB, that solves this problem, together with a high-probability bound on its regret. |
You can download the paper from the ALT website (here) or from the HAL online open-access repository (here).
Bibtex: |
@incollection{Maillard2013,
  title={Robust Risk-Averse Stochastic Multi-armed Bandits},
  author={Maillard, Odalric-Ambrym},
  booktitle={Algorithmic Learning Theory},
  series={Lecture Notes in Computer Science},
  volume={8139},
  editor={Jain, Sanjay and Munos, Rémi and Stephan, Frank and Zeugmann, Thomas},
  publisher={Springer Berlin Heidelberg},
  year={2013},
  isbn={978-3-642-40934-9},
  pages={218--233}
} |
Related Publications: |
Kullback-Leibler Upper Confidence Bounds for Optimal Sequential Allocation. Olivier Cappé, Aurélien Garivier, Odalric-Ambrym Maillard, Rémi Munos, Gilles Stoltz. In The Annals of Statistics, 2013.
Finite-Time Analysis of Multi-armed Bandits Problems with Kullback-Leibler Divergences. |
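To give a feel for the risk-averse bandit setting described in the abstract, here is a minimal, purely illustrative sketch. It is not RA-UCB from the paper; it is a generic UCB-style index policy that ranks arms by their empirical conditional value-at-risk (CVaR, a standard example of a coherent risk measure) plus an exploration bonus. The function names, the bonus constant `c`, and the risk level `alpha` are all assumptions made for this sketch.

```python
import math
import random

def empirical_cvar(samples, alpha=0.1):
    """Empirical CVaR at level alpha: mean of the worst
    alpha-fraction of observed rewards (lower tail)."""
    k = max(1, math.ceil(alpha * len(samples)))
    worst = sorted(samples)[:k]
    return sum(worst) / k

def cvar_index_policy(arms, horizon, alpha=0.1, c=1.0):
    """Illustrative index policy (not the paper's RA-UCB):
    pull the arm maximizing empirical CVaR + exploration bonus.
    `arms` is a list of zero-argument reward samplers."""
    history = [[a()] for a in arms]  # pull each arm once
    for t in range(len(arms), horizon):
        def index(i):
            n = len(history[i])
            bonus = c * math.sqrt(math.log(t + 1) / n)
            return empirical_cvar(history[i], alpha) + bonus
        best = max(range(len(arms)), key=index)
        history[best].append(arms[best]())
    return [len(h) for h in history]  # pull counts per arm

random.seed(0)
arms = [
    lambda: random.gauss(0.5, 0.1),  # slightly lower mean, low risk
    lambda: random.gauss(0.6, 1.0),  # higher mean, heavy deviations
]
counts = cvar_index_policy(arms, horizon=2000, alpha=0.1)
```

A mean-maximizing policy would favor the second arm; a risk-averse criterion such as CVaR instead concentrates pulls on the first arm, whose lower tail is far better, which is the kind of trade-off the paper's setting formalizes.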