Finite-Sample Analysis of Bellman Residual Minimization

2010, Discussing articles

Odalric-Ambrym Maillard, Rémi Munos, Alessandro Lazaric,
Mohammad Ghavamzadeh.
In ACML 2010, volume 13 of JMLR W&CP, pages 299-314, 2010.

[Download]

Abstract:

We consider the Bellman residual minimization approach for solving discounted Markov decision problems, where we assume that a generative model of the dynamics and rewards is available. At each policy iteration step, an approximation of the value function for the current policy is obtained by minimizing an empirical Bellman residual defined on a set of n states drawn i.i.d. from a distribution \mu, the immediate rewards, and the next states sampled from the model. Our main result is a generalization bound for the Bellman residual in linear approximation spaces. In particular, we prove that the empirical Bellman residual approaches the true (quadratic) Bellman residual in \mu-norm at a rate of order O(1/\sqrt{n}). This result implies that minimizing the empirical residual is indeed a sound approach for the minimization of the true Bellman residual, which guarantees a good approximation of the value function for each policy. Finally, we derive performance bounds for the resulting approximate policy iteration algorithm in terms of the number of samples n and a measure of how well the function space is able to approximate the sequence of value functions.
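
As an illustration of the approach (not the paper's exact estimator), the following sketch fits a linear value function for a fixed policy by least-squares minimization of an empirical Bellman residual built from one sampled next state per state; the function name and the NumPy interface are assumptions made for the example.

import numpy as np

def bellman_residual_weights(phi, phi_next, rewards, gamma):
    # phi      : (n, d) features of the n states drawn i.i.d. from mu
    # phi_next : (n, d) features of the next states sampled from the generative model
    # rewards  : (n,)   immediate rewards
    # Returns w such that V(s) ~ phi(s) @ w minimizes
    #   sum_i (phi_i @ w - r_i - gamma * phi_next_i @ w)^2.
    A = phi - gamma * phi_next          # Bellman residual applied to each basis function
    w, *_ = np.linalg.lstsq(A, rewards, rcond=None)
    return w

In a full approximate policy iteration loop, this fit would be recomputed for the current policy at each step, and the policy then updated with respect to the resulting value estimate.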

You can download the paper from the JMLR website (here) or from the HAL online open repository* (here).

Bibtex:
@inproceedings{MaillardMLG10,
author = {O. A. Maillard and R. Munos and A. Lazaric and M. Ghavamzadeh},
title = {Finite Sample Analysis of {B}ellman Residual Minimization},
booktitle = {Asian Conference on Machine Learning. JMLR: Workshop and Conference Proceedings},
volume = 13,
editor = {Masashi Sugiyama and Qiang Yang},
pages = {299--314},
year = {2010}
}

Scrambled Objects for Least-Squares Regression.

2010, Discussing articles

Odalric-Ambrym Maillard, Rémi Munos.
In NIPS ’10, pages 1549–1557, 2010.

[Download]

Abstract:

We consider least-squares regression using a randomly generated subspace G_P \subset F of finite dimension P, where F is a function space of infinite dimension, e.g. L_2([0, 1]^d). G_P is defined as the span of P random features that are linear combinations of the basis functions of F weighted by random Gaussian i.i.d. coefficients. In particular, we consider multi-resolution random combinations at all scales of a given mother function, such as a hat function or a wavelet. In this latter case, the resulting Gaussian objects are called scrambled wavelets and we show that they make it possible to approximate functions in Sobolev spaces H_s([0, 1]^d). As a result, given N data points, the least-squares estimate \hat{g} built from P scrambled wavelets has excess risk ||f^* - \hat{g}||^2_P = O(||f^*||^2_{H_s([0,1]^d)} (\log N)/P + P (\log N)/N) for target functions f^* \in H_s([0, 1]^d) of smoothness order s > d/2. An interesting aspect of the resulting bounds is that they do not depend on the distribution P from which the data are generated, which is important in the statistical regression setting considered here. Randomization makes it possible to adapt to any possible distribution. We conclude by describing an efficient numerical implementation using lazy expansions with numerical complexity \tilde{O}(2^d N^{3/2} \log N + N^2), where d is the dimension of the input space.
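
To make the construction concrete, here is a minimal sketch, under assumed names and interfaces, of least-squares regression in a random subspace G_P: each of the P random features is a Gaussian linear combination of a (truncated) deterministic basis, as in the scrambled construction. The lazy-expansion implementation mentioned in the abstract is not reproduced here.

import numpy as np

def scrambled_least_squares(basis, X, y, P, seed=0):
    # basis : callable mapping an (N, d) input array to an (N, D) matrix of
    #         deterministic basis-function evaluations (e.g. a truncated
    #         multi-resolution hat-function or wavelet family)
    rng = np.random.default_rng(seed)
    B = basis(X)                                            # (N, D) basis evaluations
    W = rng.standard_normal((B.shape[1], P)) / np.sqrt(P)   # Gaussian mixing weights
    Z = B @ W                                               # (N, P) random features spanning G_P
    theta, *_ = np.linalg.lstsq(Z, y, rcond=None)           # least squares in G_P
    return W, theta                                         # predictor: x -> basis(x) @ W @ theta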

You can download the paper from the NIPS website (here) or from the HAL online open repository* (here).

Bibtex:
@incollection{MaillardM10b,
title = {Scrambled Objects for Least-Squares Regression},
author = {Odalric-Ambrym Maillard and R\'{e}mi Munos},
booktitle = {Advances in Neural Information Processing Systems 23},
editor = {J.D. Lafferty and C.K.I. Williams and J. Shawe-Taylor and R.S. Zemel and A. Culotta},
pages = {1549--1557},
year = {2010},
publisher = {Curran Associates, Inc.},
url = {http://papers.nips.cc/paper/3973-scrambled-objects-for-least-squares-regression.pdf}
}

LSTD with Random Projections.

2010, Discussing articles

Mohammad Ghavamzadeh, Alessandro Lazaric,
Odalric-Ambrym Maillard, Rémi Munos.
In NIPS’10, pages 721–729, 2010.

[Download]

Abstract:

We consider the problem of reinforcement learning in high-dimensional spaces when the number of features is larger than the number of samples. In particular, we study the least-squares temporal difference (LSTD) learning algorithm when a space of low dimension is generated with a random projection from a high-dimensional space. We provide a thorough theoretical analysis of LSTD with random projections and derive performance bounds for the resulting algorithm. We also show how the error of LSTD with random projections is propagated through the iterations of a policy iteration algorithm and provide a performance bound for the resulting least-squares policy iteration (LSPI) algorithm.
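
A minimal sketch of the idea, with assumed names and a plain Gaussian projection matrix, is given below; it is not the paper's exact algorithm or analysis.

import numpy as np

def lstd_random_projection(phi, phi_next, rewards, gamma, d, seed=0):
    # phi, phi_next : (n, D) high-dimensional features of visited states and of
    #                 their successor states under the evaluated policy
    # d             : target low dimension (d << D)
    rng = np.random.default_rng(seed)
    Pi = rng.standard_normal((d, phi.shape[1])) / np.sqrt(d)   # random projection
    psi, psi_next = phi @ Pi.T, phi_next @ Pi.T    # (n, d) projected features
    A = psi.T @ (psi - gamma * psi_next)           # LSTD system matrix
    b = psi.T @ rewards
    w, *_ = np.linalg.lstsq(A, b, rcond=None)
    return Pi, w                                   # value estimate: (Pi @ phi(s)) @ w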

You can download the paper from the NIPS website (here) or from the HAL online open repository* (here).

Bibtex:
@incollection{GhavamzadehLMM2010,
title = {LSTD with Random Projections},
author = {Ghavamzadeh, Mohammad and Lazaric, Alessandro and Maillard, Odalric-Ambrym and Munos, R\'{e}mi},
booktitle = {Advances in Neural Information Processing Systems 23},
editor = {J.D. Lafferty and C.K.I. Williams and J. Shawe-Taylor and R.S. Zemel and A. Culotta},
pages = {721--729},
year = {2010},
publisher = {Curran Associates, Inc.},
url = {http://papers.nips.cc/paper/3994-lstd-with-random-projections.pdf}
}

Online Learning in Adversarial Lipschitz Environments

2010, Discussing articles

Odalric-Ambrym Maillard, Rémi Munos.
In ECML-PKDD’10, pages 305–320, 2010.

[Download]

Abstract:

We consider the problem of online learning in an adversarial environment when the reward functions chosen by the adversary are assumed to be Lipschitz. This setting extends previous works on linear and convex online learning. We provide a class of algorithms with cumulative regret upper-bounded by O(\sqrt{dT \ln(\lambda)}), where d is the dimension of the search space, T the time horizon, and \lambda the Lipschitz constant. Efficient numerical implementations using particle methods are discussed. Applications include online supervised learning problems for both full and partial (bandit) information settings, for a large class of non-linear regressors/classifiers, such as neural networks.
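
One natural instantiation in the full-information setting is the exponentially weighted forecaster run on a discretization of the search space; the sketch below uses this simple grid-based variant with assumed names, whereas the paper discusses efficient particle-based implementations.

import numpy as np

def exp_weights_on_grid(reward_fns, grid, eta, seed=0):
    # reward_fns : list of T adversarial reward functions, each mapping a point
    #              of the search space to a reward in [0, 1]
    # grid       : (K, d) array of candidate points covering the search space
    # eta        : learning rate
    rng = np.random.default_rng(seed)
    K = len(grid)
    cum = np.zeros(K)                       # cumulative reward of each grid point
    total = 0.0
    for f in reward_fns:
        p = np.exp(eta * (cum - cum.max()))
        p /= p.sum()                        # exponential-weights distribution
        x = grid[rng.choice(K, p=p)]        # point played this round
        total += f(x)
        cum += np.array([f(g) for g in grid])   # full-information update
    return total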

You can download the paper from the ECML website (here) or from the HAL online open repository* (here).

Bibtex:
@inproceedings{MaillardM10a,
author = {{Odalric-Ambrym} Maillard and
R\'{e}mi Munos},
title = {Online Learning in Adversarial Lipschitz Environments},
booktitle = {Machine Learning and Knowledge Discovery in Databases, European Conference,
{ECML} {PKDD} 2010, Barcelona, Spain, September 20-24, 2010, Proceedings,
Part {II}},
year = {2010},
pages = {305--320},
editor = {Jos\'{e} L. Balc\'{a}zar and
Francesco Bonchi and
Aristides Gionis and
Mich\`{e}le Sebag},
series = {Lecture Notes in Computer Science},
volume = {6322},
publisher = {Springer}
}