Nearly two years after starting the blog, we have at last completed a first draft of the book, which is to be published by Cambridge University Press.
The book is available for free as a PDF and will remain so after publication. We’re grateful to Cambridge for allowing this.
Without further ado, here is the link.
Although we still have a few things we want to do, the manuscript is sufficiently polished to be useful. Of course we would greatly appreciate any comments you might have, including typos, errors in the proofs, missing references, confusing explanations or anything else you might notice. We will periodically update the book, so it would be helpful if you could quote the revision number on the cover when sending us your comments (firstname.lastname@example.org).
The manuscript includes a lot of material not in the blog. The last seven chapters are all new, covering combinatorial (semi-)bandits, non-stationary bandits, ranking, pure exploration, Bayesian methods, Thompson sampling, partial monitoring and an introduction to learning in Markov decision processes. The chapters that are based on blog posts have been cleaned up, and in many cases we have added significant depth. There is a lot of literature that we have not covered. Some of these missing topics are discussed very briefly in the introduction to Part VII. It really is amazing how large the bandit literature has become and we’re sorry not to have found space for everything.
The book includes around 250 exercises, some of which have solutions. On average the exercises have been proofread less carefully than the rest of the book, so some caution is advised. The solutions to selected exercises are available here.
Finally, we’re very thankful for all the feedback already received, both on the blog and on early drafts of the book.