“How hard is my MDP?” Distribution-norm to the rescue.

2014, Discussing articles

Odalric-Ambrym Maillard, Timothy A. Mann, Shie Mannor.
In advances in Neural Information Processing Systems, 2014.


In Reinforcement Learning (RL), state-of-the-art algorithms require a large number of samples per state-action pair to estimate the transition kernel p. In many problems, a good approximation of p is not needed. For instance, if from one state-action pair (s,a), one can only transit to states with the same value, learning p(|s,a) accurately is irrelevant (only its support matters). This paper aims at capturing such behavior by defining a novel hardness measure for Markov Decision Processes (MDPs) we call the distribution-norm. The distribution-norm w.r.t. a measure ν is defined on zero ν-mean functions f by the standard variation of f with respect to ν. We first provide a concentration inequality for the dual of the distribution-norm. This allows us to replace the generic but loose ||||1 concentration inequalities used in most previous analysis of RL algorithms, to benefit from this new hardness measure. We then show that several common RL benchmarks have low hardness when measured using the new norm. The distribution-norm captures finer properties than the number of states or the diameter and can be used to assess the difficulty of MDPs.

You can dowload the paper from the NIPS website (here) or from the HAL online open depository* (soon).

title = {How hard is my MDP?” The distribution-norm to the rescue”},
author = {Maillard, Odalric-Ambrym and Mann, Timothy A and Mannor, Shie},
booktitle = {Advances in Neural Information Processing Systems 27},
editor = {Z. Ghahramani and M. Welling and C. Cortes and N.D. Lawrence and K.Q. Weinberger},
pages = {1835–1843},
year = {2014},
publisher = {Curran Associates, Inc.},
url = {http://papers.nips.cc/paper/5441-how-hard-is-my-mdp-the-distribution-norm-to-the-rescue.pdf}

Selecting Near-Optimal Approximate State Representations in Reinforcement Learning.

2014, Discussing articles

Ronald Ortner, Odalric-Ambrym Maillard, Daniil Ryabko.
In Algorithmic Learning Theory, 2014.


We consider a reinforcement learning setting introduced in (Maillard et al., NIPS 2011) where the learner does not have explicit access to the states of the underlying Markov decision process (MDP). Instead, she has access to several models that map histories of past interactions to states. Here we improve over known regret bounds in this setting, and more importantly generalize to the case where the models given to the learner do not contain a true model resulting in an MDP representation but only approximations of it. We also give improved error bounds for state aggregation

You can dowload the paper from the  ALT Website (here) or from the HAL online open depository* (here).

  title={Selecting near-optimal approximate state representations in reinforcement learning},
  author={Ortner, Ronald and Maillard, Odalric-Ambrym and Ryabko, Daniil},
  booktitle={Algorithmic Learning Theory},
Related Publications:
Competing with an infinite set of models in reinforcement learning, Phuong Nguyen, Odalric-Ambrym Maillard, Daniil Ryabko, and Ronald Ortner, in Proceedings of the International Conference on Artificial Intelligence and Statistics (AI&STATS), volume 31 of JMLR W&CP , pages 463–471, Arizona, USA, 2013.

Optimal regret bounds for selecting the state representation in reinforcement learning.
Odalric-Ambrym Maillard, Phuong Nguyen, Ronald Ortner, Daniil Ryabko.
In Proceedings of the 30th international conference on machine learning, ICML 2013, 2013.

Selecting the State-Representation in Reinforcement Learning.
Odalric-Ambrym Maillard, Daniil Ryabko, Rémi Munos.
In Proceedings of the 24th conference on advances in Neural Information Processing Systems, NIPS ’11, pages 2627–2635, 2011.

Concentration inequalities for sampling without replacement.

2014, Discussing articles

Rémi Bardenet, Odalric-Ambrym Maillard.
In Bernoulli Journal, 2014.



Concentration inequalities quantify the deviation of a random variable from a fixed value. In spite of numerous applications, such as opinion surveys or ecological counting procedures, few concentration results are known for the setting of sampling without replacement from a finite population. Until now, the best general concentration inequality has been a Hoeffding inequality due to Serfling (1974). In this paper, we first improve on the fundamental result of Serfling (1974), and further extend it to obtain a Bernstein concentration bound for sampling without replacement. We then derive an empirical version of our bound that does not require the variance to be known to the user.

You can dowload the paper from the Bernoulli website (here) or from the HAL online open depository* (here).