Latent Bandits.

2013, Discussing articles

Odalric-Ambrym Maillard, Shie Mannor
In International Conference on Machine Learning, 2014.

[Download]

Abstract:

We consider a multi-armed bandit problem where the reward distributions are indexed by two sets –one for arms, one for type– and can be partitioned into a small number of clusters according to the type. First,we consider the setting where all reward distributions are known and all types have the same underlying cluster, the type’s identity is, however, unknown. Second, we study the case where types may come from different classes, which is significantly more challenging. Finally, we tackle the case where the reward distributions are completely unknown. In each setting, we introduce specific algorithms and derive non-trivial regret performance. Numerical experiments show that, in the most challenging agnostic case, the proposed algorithm achieves excellent performance in several difficult scenarios.

You can dowload the paper from the JMLR website (here) or from the HAL online open depository* (here).

You can download the Java code used to generate the experiments here.

Bibtex:
@inproceedings{maillard2014latent,
  title={Latent Bandits.},
  author={Maillard, Odalric-Ambrym and Mannor, Shie},
  booktitle={Proceedings of The 31st International Conference on Machine Learning},
  pages={136--144},
  year={2014}
}

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s