Here are some of my contributions. Other papers are under review or in progress.

Preprints:

- *Active Roll-outs in MDP with Irreversible Dynamics*, with T. A. Mann, S. Mannor and R. Ortner.
- *Optimistic-path based UCRL for average-reward continuous state-action MDPs*, with R. Ortner.
- *Boundary crossing probabilities: A tribute to an old proof*.

**2015**

**A note on replacing uniform subsampling by random projections in MCMC for linear regression of tall datasets.**

Rémi Bardenet, Odalric-Ambrym Maillard.

NIPS workshop, 2015.

**2014**

**“How hard is my MDP?” Distribution-norm to the rescue.**[Bib][Pdf]

Odalric-Ambrym Maillard, Timothy A. Mann, Shie Mannor.

To appear in *Proceedings of the 27th conference on advances in Neural Information Processing Systems*, 2014.

**Selecting Near-Optimal Approximate State Representations in Reinforcement Learning.**[Bib][Pdf]

Ronald Ortner, Odalric-Ambrym Maillard, Daniil Ryabko.

To appear in *Algorithmic Learning Theory*, 2014.

**Sub-sampling for multi-armed bandits.**[Bib][Pdf]

Akram Baransi, Odalric-Ambrym Maillard, Shie Mannor.

To appear in *European conference on Machine Learning*, 2014.

**Concentration inequalities for sampling without replacement.**[Bib][Pdf]

Rémi Bardenet, Odalric-Ambrym Maillard.

To appear in *Bernoulli*, 2014.

**2013**

**Latent bandits.**[Bib][Pdf]

Odalric-Ambrym Maillard, Shie Mannor.

In *International conference on Machine Learning*, 2014.

**Robust risk-averse stochastic multi-armed bandits.**[Bib][Pdf]

Odalric-Ambrym Maillard.

In *Algorithmic Learning Theory*, 2013.

**Kullback–Leibler upper confidence bounds for optimal sequential allocation.**[Bib][Pdf]

Olivier Cappé, Aurélien Garivier, Odalric-Ambrym Maillard, Rémi Munos, Gilles Stoltz.

In *The Annals of Statistics*, 2013.

**Competing with an infinite set of models in reinforcement learning**. [Bib][Pdf]

Phuong Nguyen, Odalric-Ambrym Maillard, Daniil Ryabko, Ronald Ortner.

In *International Conference on Artificial Intelligence and Statistics*, 2013.

**2012**

**Optimal regret bounds for selecting the state representation in reinforcement learning**. [Bib][Pdf]

Odalric-Ambrym Maillard, Phuong Nguyen, Ronald Ortner, Daniil Ryabko.

In *Proceedings of the 30th international conference on machine learning, ICML 2013*, 2013.

**Hierarchical optimistic region selection driven by curiosity.**[Bib][Pdf]

Odalric-Ambrym Maillard.

In *Proceedings of the 25th conference on advances in Neural Information Processing Systems, NIPS ’12*, 2012.

**Online allocation and homogeneous partitioning for piecewise constant mean-approximation.**[Bib][Pdf]

Alexandra Carpentier, Odalric-Ambrym Maillard.

In *Proceedings of the 25th conference on advances in Neural Information Processing Systems, NIPS ’12*, 2012.

**Linear Regression with Random Projections**. [Bib][Pdf]

Odalric-Ambrym Maillard, Rémi Munos.

In *Journal of Machine Learning Research*, 2012.

**2011**

**Apprentissage Séquentiel : Bandits, Statistique et Renforcement**. [Pdf]

Odalric-Ambrym Maillard.

PhD thesis, Université de Lille 1, October 2011. **[AFIA PhD Prize 2012]**

**Selecting the state-representation in reinforcement learning**. [Bib][Pdf]

Odalric-Ambrym Maillard, Daniil Ryabko, Rémi Munos.

In *Proceedings of the 24th conference on advances in Neural Information Processing Systems*, NIPS ’11, pages 2627–2635, 2011.

**Sparse recovery with Brownian sensing**. [Bib][Pdf]

Alexandra Carpentier, Odalric-Ambrym Maillard, Rémi Munos.

In *Proceedings of the 24th conference on advances in Neural Information Processing Systems*, NIPS ’11, 2011.

**Finite-time analysis of multi-armed bandits problems with Kullback-Leibler divergences**. [Bib][Pdf]

Odalric-Ambrym Maillard, Gilles Stoltz, Rémi Munos.

In *Proceedings of the 24th annual Conference On Learning Theory*, COLT ’11, 2011.

**Adaptive bandits: Towards the best history-dependent strategy**. [Bib][Pdf]

Odalric-Ambrym Maillard, Rémi Munos.

In *Proceedings of the 14th international conference on Artificial Intelligence and Statistics*, AI&Statistics 2011, volume 15 of JMLR W&CP, 2011.

**2010**

**Finite sample analysis of Bellman residual minimization**. [Bib][Pdf]

Odalric-Ambrym Maillard, Rémi Munos, Alessandro Lazaric, Mohammad Ghavamzadeh.

In *Proceedings of the Asian Conference on Machine Learning*, ACML 2010, volume 13 of JMLR W&CP, pages 299–314, 2010.

**Scrambled objects for least-squares regression**. [Bib][Pdf]

Odalric-Ambrym Maillard, Rémi Munos.

In J. Lafferty, C. K. I. Williams, J. Shawe-Taylor, R. S. Zemel, and A. Culotta, editors, *Proceedings of the 23rd conference on advances in Neural Information Processing Systems*, NIPS ’10, pages 1549–1557, 2010.

**LSTD with random projections**. [Bib][Pdf]

Mohammad Ghavamzadeh, Alessandro Lazaric, Odalric-Ambrym Maillard, Rémi Munos.

In J. Lafferty, C. K. I. Williams, J. Shawe-Taylor, R. S. Zemel, and A. Culotta, editors, *Proceedings of the 23rd conference on advances in Neural Information Processing Systems*, NIPS ’10, pages 721–729, 2010.

**Online learning in adversarial Lipschitz environments**. [Bib][Pdf]

Odalric-Ambrym Maillard, Rémi Munos.

In *Proceedings of the 2010 European Conference on Machine Learning and Knowledge Discovery in Databases: Part II*, ECML PKDD’10, pages 305–320, Berlin, Heidelberg, 2010. Springer-Verlag.

**2009**

**Compressed least-squares regression**. [see **Linear Regression with Random Projections, 2012** for corrections] [Bib][Pdf, Pdf]

Odalric-Ambrym Maillard, Rémi Munos.

In Y. Bengio, D. Schuurmans, J. Lafferty, C. K. I. Williams, and A. Culotta, editors, *Proceedings of the 22nd conference on advances in Neural Information Processing Systems*, NIPS ’09, pages 1213–1221, 2009.

**Complexity versus Agreement for Many Views**. [Bib][Pdf]

Odalric-Ambrym Maillard, Nicolas Vayatis.

In ALT 2009, pages 232–246, 2009.

**2005**

**Parallelization of the TD(lambda) Learning Algorithm**.

Odalric-Ambrym Maillard, Rémi Coulom, Philippe Preux.

In *Proceedings of the 7th European Workshop on Reinforcement Learning*, EWRL7, 2005.