A high-bias, low-variance introduction to Machine Learning for physicists.


This website contains Python notebooks that accompany our review entitled A high-bias, low-variance introduction to Machine Learning for physicists. An updated version of the review can be downloaded from the arXiv at arXiv:1803.08823.

The authors of the review are Pankaj Mehta, Marin Bukov, Ching-Hao Wang, Alexandre Day, Clint Richardson, Charles Fisher, and David Schwab. Please help improve the manuscript. Feel free to submit comments and suggestions, and to report typos, here.

Datasets: Most of the examples in the notebooks use the three datasets described below. Details on the datasets can be found in the Appendix of the review.

  • MNIST. MNIST is a dataset of handwritten digits. The dataset consists of a training set of 60,000 examples and a test set of 10,000 examples.
  • SUSY dataset. The SUSY dataset consists of Monte-Carlo simulations of supersymmetric and non-supersymmetric collision events. More information about the dataset can be found in the accompanying paper.
  • Nearest Neighbor Ising Model. This dataset consists of samples generated from the two-dimensional nearest-neighbor coupled Ising model at a range of temperatures above and below the critical point. The dataset can be downloaded here. More information about the dataset can be found in the appendix of the accompanying review.

Python Information: It is recommended that users use Python 3.6 or above (though most notebooks will work with any version of Python 3). Notebooks contain instructions for installing and downloading appropriate packages.
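As a quick check before opening the notebooks, the recommendation above can be tested with a few lines of Python. This is only a minimal sketch: the names RECOMMENDED and meets_recommendation are hypothetical helpers introduced here for illustration and are not part of the notebooks.

```python
import sys

# Recommended minimum version stated above (Python 3.6).
# RECOMMENDED and meets_recommendation are illustrative names, not part of the notebooks.
RECOMMENDED = (3, 6)

def meets_recommendation(version_info=sys.version_info):
    """Return True if the (major, minor) version is at least the recommended one."""
    return tuple(version_info[:2]) >= RECOMMENDED

if __name__ == "__main__":
    if meets_recommendation():
        print("Python version meets the recommendation.")
    else:
        print("Python is older than 3.6; some notebooks may not run.")
```

Comparing version tuples rather than version strings avoids mistakes such as treating 3.10 as older than 3.6.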

Information about notebooks: There are a total of 20 notebooks that accompany the review. Most of these notebooks are new. Others (mostly those based on the MNIST dataset) are modified versions of notebooks/tutorials developed by the makers of commonly used machine learning packages such as Keras, PyTorch, scikit-learn, and TensorFlow, as well as Paysage, a new package for energy-based generative models maintained by Unlearn.AI. All the notebooks make generous use of code from these tutorials as well as the rich ecosystem of publicly available blog posts on Machine Learning by researchers, practitioners, and students. We have included links to all relevant sources within each notebook. For full disclosure, we note that Unlearn.AI is affiliated with two of the authors: Charles Fisher (founder) and Pankaj Mehta (scientific advisor).

The notebooks are named according to the convention NB_CXX-description.ipynb where CXX refers to the corresponding section in the review (e.g. a notebook for Section VII about Random Forests will have a name of the form NB_CVII-Random_Forests.ipynb).
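The naming convention can be used to sort or filter downloaded notebooks by section. The sketch below assumes that XX is always a Roman numeral, as in the example above, and that underscores in the description stand for spaces. The function name parse_notebook_name is hypothetical and does not come from the notebooks.

```python
import re

# Pattern for NB_CXX-description.ipynb.
# Assumption: XX is a Roman numeral, as in the NB_CVII-Random_Forests.ipynb example.
NOTEBOOK_NAME = re.compile(
    r"^NB_C(?P<section>[IVXLCDM]+)-(?P<description>.+)\.ipynb$"
)

def parse_notebook_name(filename):
    """Split a notebook filename into (section, description).

    Returns None if the filename does not follow the convention.
    """
    match = NOTEBOOK_NAME.match(filename)
    if match is None:
        return None
    # Assumption: underscores in the description are word separators.
    description = match.group("description").replace("_", " ")
    return match.group("section"), description

if __name__ == "__main__":
    print(parse_notebook_name("NB_CVII-Random_Forests.ipynb"))
```

For the example filename, the function returns the section "VII" and the description "Random Forests".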

A zip file containing all notebooks can be downloaded here. Individual notebooks can be downloaded below. We also include links to HTML versions of the notebooks. The RBM notebooks were updated on 6/6/18.