Odalric-Ambrym Maillard, Rémi Munos.

*In Proceedings of the 14th international conference on Artificial Intelligence and Statistics,
AI&Statistics 2011, volume 15 of JMLR W&CP, 2011.
*

[Download]

Abstract: |

tight algorithms achieving a not tractable regret (at time T) bounded by Õ(\sqrt{TAC}), where C is the number of classes of theta. Now, when many models are available, all known algorithms achieving a nice regret O(\sqrt{T}) are unfortunately tractable and scale poorly with the number of models |Theta| . Our contribution here is to provide algorithms with regret bounded by T^{2/3}C^{1/3}log(|Theta|)^{1/2}. |

You can dowload the paper from the JMLR webiste (here) or from the HAL online open depository* (here).

Bibtex: |

@inproceedings{MaillardM11, author = {Odalric{-}Ambrym Maillard and R{\'{e}}mi Munos}, title = {Adaptive Bandits: Towards the best history-dependent strategy}, booktitle = {Proceedings of the Fourteenth International Conference on Artificial Intelligence and Statistics, {AISTATS} 2011, Fort Lauderdale, USA, April 11-13, 2011}, year = {2011}, pages = {570–578} editor = {Geoffrey J. Gordon and David B. Dunson and Miroslav Dud{\'{\i}}k}, series = {{JMLR} Proceedings}, year = {2011}, volume = {15} } |