Odalric-Ambrym Maillard, Rémi Munos, Alessandro Lazaric,
Mohammad Ghavamzadeh.
In ACML 2010, volume 13 of JMLR W&CP, pages 299-314, 2010.
[Download]
Abstract: |
We consider the Bellman residual minimization approach for solving discounted Markov decision problems, where we assume that a generative model of the dynamics and rewards is available. At each policy iteration step, an approximation of the value function for the current policy is obtained by minimizing an empirical Bellman residual dened on a set of n states drawn i.i.d. from a distribution \mu, the immediate rewards, and the next states sampled from the model. Our main result is a generalization bound for the Bellman residual in linear approximation spaces. In particular, we prove that the empirical Bellman residual approaches the true (quadratic) Bellman residual in \mu-norm with a rate of order O(1/\sqrt{n}). This result implies that minimizing the empirical residual is indeed a sound approach for the minimization of the true Bellman residual which guarantees a good approximation of the value function for each policy. Finally, we derive performance bounds for the resulting approximate policy iteration algorithm in terms of the number of samples n and a measure of how well the function space is able to approximate the sequence of value functions. |
You can dowload the paper from the JMLR webiste (here) or from the HAL online open depository* (here).
Bibtex: |
@inproceedings{MaillardMLG10, author = {O. A. Maillard and R. Munos and A. Lazaric and M. Ghavamzadeh}, title = {Finite Sample Analysis of {B}ellman Residual Minimization}, booktitle = {Asian Conference on Machine Learpning. JMLR: Workshop and Conference Proceedings}, volume = 13, editor = {Masashi Sugiyama and Qiang Yang}, pages = {309-324}, year = {2010} } |