PyTorch for Scientific Computing - Quantum Mechanics Example Part 2) Program Before Code Optimizations


In this post we'll put together working code to return the ground state energy and gradient for small atoms using the correlated Gaussian basis functions and Hamiltonian energy operator discussed in part 1. I'm not going to repeat much of the theory so have a look at Part 1) if you have no idea of what I'm talking about (that may or may not help!).

The title of Part 1) is "Doing Quantum Mechanics with a Machine Learning Framework: PyTorch and Correlated Gaussian Wavefunctions: Part 1) Introduction". I changed the name of the series in this post to better reflect what I'm trying to convey.

The important take-away in this post is to see how straight forward it is to write code for a complicated scientific application using PyTorch. We will optimize the code for high performance in later posts.

In Part 1) we established the feasibility of using PyTorch for this problem and confirmed that "Automatic Gradients" using PyTorch autograd works as expected.

The essence of the problem is to minimize the smallest eigenvalue, $E$, of the generalized eigenvalue problem,

$$(H-ES)c = 0$$


Where $H = T + V$ is a matrix of evaluated integrals using our basis functions. There are terms for kinetic $T$ and potential $V$ energy operators, $S$ an "overlap" matrix of integral of the basis functions with themselves. $c$ is a vector (eigenvector) of the linear coefficients of the basis expansion. [... and there is a symmetry projection operator mixed in there to make sure we are computing the right wavefunction for the ground state energy.] The basis functions and energy operator were discussed in Part 1).

The energy is the equivalent of the "loss" function in a typical machine learning problem. We want to minimize the energy.

The formulas we need to compute for the matrix elements look like, (for each $k,l$ basis function)

$$S_{kl}= 2^{3n/2}\left( \frac{\left\| L_k\right\| \left\| L_l\right\| }{\left| A_{kl}\right| }\right) ^{3/2}$$


$L_k$ and $L_l$ are lower triangular matrices of of (independent variable) parameters that will be optimized. $A_{kl} = A_k + \tau_P^{\prime }A_l\tau_P$ with $\tau$ a symmetry projection matrix. $|A|$ means determinate.

$$T_{kl}=6S_{kl}\,\mathrm{tr}\left[ MA_kA_{kl}^{-1}\tau_P^{\prime }A_l\tau_P\right]$$

This formula is implemented efficiently by utilizing $\mathrm{tr}\left[AB\right]=\sum_{i,j}\left(A_{i,j}B_{i,j}^{\prime}\right)$. That is, form the term by term product of $A$ and $B$ transpose, and then add up all of the components of the product. $M$ is a matrix of reduced mass constants.

$$V_{kl}=\frac 2{\sqrt{\pi }}S_{kl}\, \sum_{i>j}Q*R_{ij}$$

$Q$ is a lower triangular matrix of "charge products" and $R_{ij}$ is a lower triangular matrix constructed from elements in $A_{kl}^{-1}$

$$R_{ij}=\,\mathrm{\,\,} \,\left( \left\{ \begin{array}{ll} \left[A_{kl}^{-1}\right]_{ii}^{-1/2} & i=j \\ \left(\left[A_{kl}^{-1}\right]_{ii}+\left[A_{kl}^{-1}\right]_{jj} -2\left[A_{kl}^{-1}\right]_{ij}\right)^{-1/2} & i>j \end{array} \right. \right)$$

Those are very compact matrix equations for what are really quite elaborate integrals derived with matrix calculus. If those formulas were written out with summation signs instead of matrix formulas there would be several pages of ugly formulas to consider. Having formulas in matrix form is a big plus for code implementation.

For the energy (loss) function we will not do complete eigenvalue decomposition, we'll use the Rayleigh quotient,

$$E = \min_c\left(\frac{c'Hc}{c'Sc} \right)$$

The minimum over $c$ of the Rayleigh quotient is the smallest eigenvalue and that minimal $c$ is the corresponding eigenvector. By doing this we can optimize the linear expansion coefficients in $c$ simultaneously with the non-linear parameters that are contained in the matrices $L_k$.

The independent parameter variables for our energy (loss) function $E$ is the collection of all of the lower triangular elements of the $L_k$ matrices for all of the basis functions along with the vector of expansion coefficients $c$. Those optimization parameters will be placed in a vector x. That will give $E$ as a function of x that will return the energy and gradient at the value of x. We will minimize that.

Got it? Let's code it up!

The first thing to do is get working code. We will speed it up by a factor of over 15,000 in later posts! (see near the end of the post for a teaser)

Don't worry about performance yet, we just want to have something working that can be checked for correctness. In the "old-days" I would have used MATLAB for this step. In fact I had old MATLAB code for for this problem that I ran in Octave (open source 'clone' of MATLAB) as a check to be sure I was getting correct results. It turns out that it is nearly as easy to do this with PyTorch as it was with MATLAB! [ note: I really like MATLAB and Octave! ... but, I am becoming a huge fan of PyTorch!]

You should be able to cut and paste the code blocks below into Python Jupyter notebook cells. To get PyTorch setup to work in a notebook you can follow instructions in my post Why You Should Consider PyTorch (includes Install and a few examples).

import torch as th # PyTorch is imported as torch and will be referenced as "th" import time 

For these calculation float32 is not enough precision! You will need double precision i.e. float64 in order to do the optimization runs. The Titan V has great float64 performance, GeForce cards do not! We will use the CPU until we do the code optimizations. (The code as-is will run on the GPU but performance is bad.)

dtype = th.float64 gpuid = 0 #device = th.device("cuda:"+ str(gpuid)) device = th.device("cpu") print("Execution device: ",device) print("PyTorch version: ", th.__version__ ) print("CUDA available: ", th.cuda.is_available()) print("CUDA version: ", th.version.cuda) print("CUDA device:", th.cuda.get_device_name(gpuid)) 
Execution device: cpu PyTorch version: 0.4.1 CUDA available: True CUDA version: 9.0.176 CUDA device: TITAN V 
# Utility functions for working with lower triangular matrices # return the lower triangle of A in column order i.e. vech(A) def vech(A): count = 0 c = A.shape[0] v = th.zeros(c * (c + 1) // 2,) for j in range(c): for i in range(j,c): v[count] = A[i,j] count += 1 return th.tensor(v , device=device, dtype=dtype) # vech2L create lower triangular matrix L from vechA def vech2L(v,n): count = 0 L = th.zeros((n,n)) for j in range(n): for i in range(j,n): L[i,j]=v[count] count += 1 return th.tensor(L , device=device, dtype=dtype) 

The following function matrix_elements() implements the formulas presented in the fist part of this post. The code is mostly a direct translation from the math formulas.

def matrix_elements(n, vechLk, vechLl, Sym, Mass, vecQ): ''' Returns: a dictionary with the skl, tkl, vkl matrix elements n : the size of the Lk matrices vechL(k,l): vector of parameters use for Lk and Ll matrices Sym: a single symmetry projection matrix Mass: a matrix of "reduced mass" values for the particles needed for tkl vecQ: an array of products of particle charges needed for vkl ''' # build Lk and Ll Lk = vech2L(vechLk,n); Ll = vech2L(vechLl,n); # apply symmetry projection on Ll # th.t() is shorthand for th.transpose(X, 0,1) PLl = th.t(Sym) @ Ll; # build Ak, Al, Akl, invAkl Ak = Lk@th.t(Lk); Al = PLl@th.t(PLl); Akl = Ak+Al invAkl = th.inverse(Akl); # Overlap (normalized) skl = 2**(3*n/2) * th.sqrt( th.pow(th.abs(th.det(Lk))*th.abs(th.det(Ll))/th.det(Akl) ,3) ); # Kinetic energy tkl = skl*(6*th.trace(Mass@Ak@invAkl@Al)); # potential energy TWOoSqrtPI = 1.1283791670955126 # 2/sqrt(pi) RIJ = th.zeros((n,n), device=device, dtype=dtype); # 1/rij i~=j for j in range(0,n-1): for i in range(j+1,n): tmp2 = invAkl[i,i] + invAkl[j,j] - 2*invAkl[i,j]; #RIJ[i,j] = TWOoSqrtPI * skl/th.sqrt(tmp2); RIJ[i,j] = 1/th.sqrt(tmp2) # 1/rij i=j for i in range(0,n): #RIJ[i,i] = TWOoSqrtPI * skl/th.sqrt(invAkl[i,i]); RIJ[i,i] = 1/th.sqrt(invAkl[i,i]) RIJ = TWOoSqrtPI*skl*RIJ Q = vech2L(vecQ,n); vkl = th.sum(RIJ*Q) return {'skl':skl, 'tkl':tkl, 'vkl':vkl} 

You can see how close the PyTorch/Python code is to the mathematical matrix formulas. This makes it very quick to get working code running.

def test_matrix_elements(): # test input with a known result n = 3; vechLk = th.tensor([ 1.00000039208682, 0.02548044275764261, 0.3525161612610669, 1.6669144815242515, 0.9630555318946559, 1.8382882034659822 ], device=device, dtype=dtype, requires_grad=True); vechLl = th.tensor([ 1.3353550436464964, 0.9153272033682132, 0.7958636766525028, 1.8326931436447955, 0.3450426931160630, 1.8711839323167831 ], device=device, dtype=dtype, requires_grad=True); Sym = th.tensor([[0,0,1], [0,1,0], [1,0,0]], device=device, dtype=dtype); Mass = th.tensor([[5.446170e-4, 2.723085077e-4, 2.723085077e-4], [2.723085077e-4, .5002723085, 2.723085077e-4], [2.723085077e-4, 2.723085077e-4, .5002723085 ]], device=device, dtype=dtype); vecQ = th.tensor([1, -1, -1, -1, 1, -1], device=device, dtype=dtype); matels = matrix_elements(n, vechLk, vechLl, Sym, Mass, vecQ) print('skl: ',matels['skl']) # should be 0.5334 print('tkl: ',matels['tkl']) # should be 4.3509 print('vkl: ',matels['vkl']) # should be -2.3840 
start_time = time.time() test_matrix_elements() print(" took {:.6f} seconds ".format(time.time() - start_time)) 
skl: tensor(0.5334, dtype=torch.float64, grad_fn=) tkl: tensor(4.3509, dtype=torch.float64, grad_fn=) vkl: tensor(-2.3840, dtype=torch.float64, grad_fn=) took 0.047023 seconds 

Those numbers are the expected output for those parameters so we can move on to computing the energy.

"energy()" loops over the lower triangle of S T and V making calls to the matrix element code. These matrices are then used to create the Hamiltonian matrix H and Overlap S that are then used to compute the energy.

def energy(x,n,nb,Mass,Charge,Sym,symc): ''' Returns: the energy at the point (parameters) x n: number of particles defined in input nb: number of basis functions Mass: matrix of "reduced" mass constants for system Charge: vector of "charge products" for particle pairs Sym: a tensor containing the symmetry projection matrices symc: a vector of coefficients for the symmetry projection terms ''' nx = len(x); nn = int(n*(n+1)/2); nsym = len(symc); # extract linear coefs "eigen vector" from x c = x[-nb:]; # reshape non-linear variables for easier indexing X = th.reshape(x[:nb*nn], (nb,nn)) # Init H S T V H = th.zeros((nb,nb), device=device, dtype=dtype); S = th.zeros((nb,nb), device=device, dtype=dtype); T = th.zeros((nb,nb), device=device, dtype=dtype); V = th.zeros((nb,nb), device=device, dtype=dtype); # outer loop is over symmetry terms for k in range(0,nsym): for j in range(0,nb): for i in range(j,nb): vechLi = X[i,:] vechLj = X[j,:] matels = matrix_elements(n,vechLi,vechLj,Sym[:,:,k],Mass,Charge); S[i,j] += symc[k]*matels['skl']; T[i,j] += symc[k]*matels['tkl']; V[i,j] += symc[k]*matels['vkl']; H = T + V # complete upper triangle of H and S for i in range(0,nb): for j in range(i+1,nb): H[i,j] = H[j,i]; S[i,j] = S[j,i]; # The energy from the Rayleigh quotient cHc = c@H@c; cSc = c@S@c; eng = cHc/cSc; return eng 

The following function defines the constants for Li atom (infinite nuclear mass). There is sample input for an 8 term wavefunction with the non-linear parameters in x1.

def test_energy(): Mass = th.tensor([[0.5, 0.0, 0.0], [0.0, 0.5, 0.0], [0.0, 0.0, 0.5]], device=device, dtype=dtype); Charge = th.tensor([-3, 1, 1, -3, 1, -3], device=device, dtype=dtype); # symmetry projection terms Sym = th.zeros((3,3,6), device=device, dtype=dtype) # (1)(2)(3) Sym[:,:,0] = th.tensor([[1,0,0],[0,1,0],[0,0,1]], device=device, dtype=dtype); # (12) Sym[:,:,1] = th.tensor([[0,1,0],[1,0,0],[0,0,1]], device=device, dtype=dtype); # (13) Sym[:,:,2] = th.tensor([[0,0,1],[0,1,0],[1,0,0]], device=device, dtype=dtype); # (23) Sym[:,:,3] = th.tensor([[1,0,0],[0,0,1],[0,1,0]], device=device, dtype=dtype); # (123) Sym[:,:,4] = th.tensor([[0,1,0],[0,0,1],[1,0,0]], device=device, dtype=dtype); # (132) Sym[:,:,5] = th.tensor([[0,0,1],[1,0,0],[0,1,0]], device=device, dtype=dtype); # coeff's symc = th.tensor([4.0,4.0,-2.0,-2.0,-2.0,-2.0], device=device, dtype=dtype); n=3; nb=8; xvechL=th.tensor([ 1.6210e+00, -2.1504e-01, 9.0755e-01, 9.7866e-01, -2.8418e-01, -3.5286e+00, -3.3045e+00, -4.5036e+00, -3.2116e-01, -7.1901e-02, 1.5167e+00, -8.4489e-01, -2.1377e-01, -3.6127e-03, -5.3774e-03, -2.1263e+00, -2.5191e-01, 2.1235e+00, -2.1396e-01, -1.4084e-03, -1.0092e-02, 4.5349e+00, 9.4837e-03, 1.1225e+00, -2.1315e-01, 5.8451e-02, -4.9410e-03, 5.0853e+00, 7.3332e-01, 5.0672e+00, -2.1589e-01, -6.8986e-03, -1.4310e-02, 1.5979e+00, 3.3946e-02, -8.7965e-01, -1.1121e+00, -2.1903e-03, -4.6925e-02, 2.1457e-01, 3.3045e-03, 4.5120e+00, -2.1423e-01, -1.6493e-02, -2.3429e-03, -8.6715e-01, -6.7070e-02, 1.5998e+00 ], device=device, dtype=dtype, requires_grad=False) evec = th.tensor([ -6.0460e-02, 7.7708e-05, 1.6152e+00, 9.5443e-01, 1.1771e-01, 3.2196e+00, 9.6344e-01, 3.1398e+00 ], device=device, dtype=dtype, requires_grad=False) # join the parameters into x1 x1 = th.tensor(th.cat((xvechL,evec)), device=device, dtype=dtype, requires_grad=True) eng = energy(x1,n,nb,Mass,Charge,Sym,symc) print(eng) # Should return -7.3615 at this point x1 return x1 
start_time = time.time() xrestart = test_energy() print(" took {} seconds ".format(time.time() - start_time)) 
tensor(-7.3615, dtype=torch.float64, grad_fn=) took 0.11813998222351074 seconds 

With those values of the parameters for x1 the the energy is as expected. Note the "requires_grad=True" attribute for x1. That tells PyTorch to keep track of those parameters for computing gradients.

def opt_energy(steps=1, num_basis=8): ############################################################################ # Input constants for Li atom (infinite nuclear mass) ############################################################################ Mass = th.tensor([[0.5, 0.0, 0.0], [0.0, 0.5, 0.0], [0.0, 0.0, 0.5]], device=device, dtype=dtype); Charge = th.tensor([-3, 1, 1, -3, 1, -3], device=device, dtype=dtype); # symmetry projection terms Sym = th.zeros((3,3,6), device=device, dtype=dtype) # (1)(2)(3) Sym[:,:,0] = th.tensor([[1,0,0],[0,1,0],[0,0,1]], device=device, dtype=dtype); # (12) Sym[:,:,1] = th.tensor([[0,1,0],[1,0,0],[0,0,1]], device=device, dtype=dtype); # (13) Sym[:,:,2] = th.tensor([[0,0,1],[0,1,0],[1,0,0]], device=device, dtype=dtype); # (23) Sym[:,:,3] = th.tensor([[1,0,0],[0,0,1],[0,1,0]], device=device, dtype=dtype); # (123) Sym[:,:,4] = th.tensor([[0,1,0],[0,0,1],[1,0,0]], device=device, dtype=dtype); # (132) Sym[:,:,5] = th.tensor([[0,0,1],[1,0,0],[0,1,0]], device=device, dtype=dtype); # coeff's symc = th.tensor([4.0,4.0,-2.0,-2.0,-2.0,-2.0], device=device, dtype=dtype); ############################################################################ n=3 nb=num_basis th.manual_seed(3) x1 = th.empty(int(nb*n*(n+1)/2 + nb), device=device, dtype=dtype, requires_grad=True) th.nn.init.uniform_(x1, a=-.8, b=.8) #x1 = xrestart optimizer = th.optim.Rprop([x1], lr=0.001, etas=(0.5, 1.2), step_sizes=(1e-06, 50)) for i in range(steps): optimizer.zero_grad() loss = energy(x1,n,nb,Mass,Charge,Sym,symc) loss.backward() optimizer.step() if (i<10>

"opt_energy()" is the main program. I have the problem input constants setup here as well as the optimization code. This runs optimization steps on our energy function. The gradient for the optimization is evaluated by "back propagation" in the statement loss.backward(). I used the name loss in honor of standard Machine Learning name for the function being minimized.

I experimented with some of the machine learning optimization routines in PyTorch and found that "Rprop" worked really well for this problem. I used the a utility from the neural network module "nn" for the random parameter initialization. It was great having these kinds of tools in the framework for experimenting with!

You can run this code and it will optimize a wavefunction for the Li atom ground state energy. It will be converging toward a value of -7.478060 Hartrees' (an energy unit typically used in quantum chemistry). Running this for 100 "epochs" i.e. optimization steps gives the following output.

start_time = time.time() xrestart = opt_energy(steps=100, num_basis=8) print(" took {} seconds ".format(time.time() - start_time)) 
step: 0 f: -0.782443089544 gradNorm: 6.657278345 step: 1 f: -0.813405818204 gradNorm: 6.663333179 step: 2 f: -0.850565814089 gradNorm: 6.664066434 step: 3 f: -0.895095668330 gradNorm: 6.655362804 step: 4 f: -0.948326026794 gradNorm: 6.631036613 step: 5 f: -1.011714048225 gradNorm: 6.582179112 step: 6 f: -1.086766887327 gradNorm: 6.496718392 step: 7 f: -1.174915424078 gradNorm: 6.359865888 step: 8 f: -1.277512892162 gradNorm: 6.162854834 step: 9 f: -1.395322269810 gradNorm: 5.891109978 step: 10 f: -1.528494104459 gradNorm: 5.568453572 step: 20 f: -4.012585110663 gradNorm: 2.752546989 step: 30 f: -6.571789033344 gradNorm: 1.543067756 step: 40 f: -7.223244601602 gradNorm: 0.284565245 step: 50 f: -7.289228643587 gradNorm: 0.149059481 step: 60 f: -7.353758373450 gradNorm: 0.142130627 step: 70 f: -7.368628179269 gradNorm: 0.069057239 step: 80 f: -7.378787154917 gradNorm: 0.062784281 step: 90 f: -7.389791432408 gradNorm: 0.047265743 step: 99 f: -7.398805061984 gradNorm: 0.095278731 took 28.317583560943604 seconds 

This is a very small 8 term wavefunction so it wont give a great result and the current code is too slow for large numbers of basis functions.

The next run is for a wavefunction with 512 basis terms. This is one step of the optimization which will give us a time for one energy and gradient evaluation. (this is from a random start point)

start_time = time.time() xrestart = opt_energy(steps=1, num_basis=512) print(" took {} seconds ".format(time.time() - start_time)) 
step: 0 f: -0.720436686340 gradNorm: 4.060212842 took 2768.9312148094177 seconds 

2768 seconds for one step is too long!

I'll tease you and do the same thing with my fast version of the code. After all of the code optimizations in the next few posts and running on a Titan V GPU we'll be able to bring that execution time down to the following,

start_time = time.time() xrestart = opt_energyrc(steps=1,num_basis=512) print(" took {:.4f} seconds ".format(time.time() - start_time)) 
step: 0 f: 82.210051671516 gradNorm: 495.193096479 took 0.1683 seconds 

Yes!, from 2768 seconds down to 0.1683 seconds! That's 16452 Times Faster!! Yes, really, that much. Some of the simple code optimizations will be surprising by how much difference they make.

I'll be taking a vacation next week so it may be two weeks before I start writing up the code optimizations. I will be worth the wait. I promise!

Happy computing --dbk

Tags: PyTorch, Python, Scientific Computing, CUDA, Quantum Mechanics