“How hard is my MDP?” Distribution-norm to the rescue.

2014, Discussing articles

Odalric-Ambrym Maillard, Timothy A. Mann, Shie Mannor.
In advances in Neural Information Processing Systems, 2014.

[Download]

Abstract:
In Reinforcement Learning (RL), state-of-the-art algorithms require a large number of samples per state-action pair to estimate the transition kernel p. In many problems, a good approximation of p is not needed. For instance, if from one state-action pair (s,a), one can only transit to states with the same value, learning p(|s,a) accurately is irrelevant (only its support matters). This paper aims at capturing such behavior by defining a novel hardness measure for Markov Decision Processes (MDPs) we call the distribution-norm. The distribution-norm w.r.t. a measure ν is defined on zero ν-mean functions f by the standard variation of f with respect to ν. We first provide a concentration inequality for the dual of the distribution-norm. This allows us to replace the generic but loose ||||1 concentration inequalities used in most previous analysis of RL algorithms, to benefit from this new hardness measure. We then show that several common RL benchmarks have low hardness when measured using the new norm. The distribution-norm captures finer properties than the number of states or the diameter and can be used to assess the difficulty of MDPs.

You can dowload the paper from the NIPS website (here) or from the HAL online open depository* (soon).

Bibtex:
@incollection{MaiManMan14,
title = {How hard is my MDP?” The distribution-norm to the rescue”},
author = {Maillard, Odalric-Ambrym and Mann, Timothy A and Mannor, Shie},
booktitle = {Advances in Neural Information Processing Systems 27},
editor = {Z. Ghahramani and M. Welling and C. Cortes and N.D. Lawrence and K.Q. Weinberger},
pages = {1835–1843},
year = {2014},
publisher = {Curran Associates, Inc.},
url = {http://papers.nips.cc/paper/5441-how-hard-is-my-mdp-the-distribution-norm-to-the-rescue.pdf}
}
Advertisements

Sub-sampling for multi-armed bandits.

Discussing articles

Akram Baransi, Odalric-Ambrym Maillard, Shie Mannor.
In European conference on Machine Learning, 2014.

[Download]

Abstract:
The stochastic multi-armed bandit problem is a popular model of the exploration/exploitation trade-off in sequential decision problems. We introduce a novel algorithm that is based on sub-sampling. Despite its simplicity, we show that the algorithm demonstrates excellent empirical performances against state-of-the-art algorithms, including Thompson sampling and KL-UCB. The algorithm is very flexible, it does need to know a set of reward distributions in advance nor the range of the rewards. It is not restricted to Bernoulli distributions and is also invariant under rescaling of the rewards. We provide a detailed experimental study comparing the algorithm to the state of the art, the main intuition that explains the striking results, and conclude with a finite-time regret analysis for this algorithm in the simplified two-arm bandit setting.

You can dowload the paper from the ECML website (here) or from the HAL online open depository* (here).

Bibtex:
@incollection{baransi2014sub,
title={Sub-sampling for Multi-armed Bandits},
author={Baransi, Akram and Maillard, Odalric-Ambrym and Mannor, Shie},
booktitle={Machine Learning and Knowledge Discovery in Databases},
pages={115–131},
year={2014},
publisher={Springer}
}

Concentration inequalities for sampling without replacement.

2014, Discussing articles

Rémi Bardenet, Odalric-Ambrym Maillard.
In Bernoulli Journal, 2014.

[Download]

Abstract:

Concentration inequalities quantify the deviation of a random variable from a fixed value. In spite of numerous applications, such as opinion surveys or ecological counting procedures, few concentration results are known for the setting of sampling without replacement from a finite population. Until now, the best general concentration inequality has been a Hoeffding inequality due to Serfling (1974). In this paper, we first improve on the fundamental result of Serfling (1974), and further extend it to obtain a Bernstein concentration bound for sampling without replacement. We then derive an empirical version of our bound that does not require the variance to be known to the user.

You can dowload the paper from the Bernoulli website (here) or from the HAL online open depository* (here).

Bibtex:
(soon)