# Programming in Z3 by learning to think like a compiler

By Marianne Bellotti

Programming in logic can help you produce better optimized, more secure, bug-free code … if only you knew how to do it.

It took me about two years to write my first program in Z3. I had done some tutorials, wrote code to solve other people’s example puzzles, but I couldn’t really figure out how to cross over from very abstract “toy” use cases to applications that had actual relevance.

Z3 is a Satisfiability modulo theories (SMT) solver, a cousin to the satisfiability (SAT) solvers I’ve written about before. Whereas SAT solvers require encoding everything in to a set of True/False statements, SMT solvers allow for greater expressiveness. In plain SAT you can represent addition by representing numbers in binary and performing logic operations to produce the correct sum. After all, that’s the way the CPU processes everything. If this concept is completely foreign to you, I recommend looking at NAND to Tetris as a great way of exploring how logic gates get built up into complex processors.

In SMT, however, you can just write `2+2` as `2+2`, because SMT builds in those abstractions (called “theories”) … basically: just as a programming language like python will let you write things in neat, English looking phrases because it understands how to convert those phrases into the machine code old school programmers had to contend with, SMT solvers let people write neater models because it understands how to convert `2+2` into logic.

There are two main ways theories work in SMT. The first is essentially just compiling to the SAT equivalent and executing it as one large SAT. The second approach encapsulates those non-Boolean specific sections of the model into a SAT problem inside the SAT to be handled separately. The benefit of the second approach is greater efficiency around questions like: `x + y = y + x`

All of that is super cool and completely useless if you can’t figure out how to represent your problem in logic + arithmetic. And much of what programmers do doesn’t look like logic + arithmetic … at least not at first glance.

## Think Like a Compiler

I learn a lot from attempting to learn other things and this was one of those situations. I had given up trying to understand how to write SMT and use Z3. It was simply too hard.

Then I started learning about how compilers work.

If you want to take a piece of code and represent it in Z3, you have to follow the same steps the compiler would. That’s because both SMT/SAT and CPUs think in boolean logic. So in order for a piece of code to get executed it has to be translated into boolean logic. That means the following steps*:

1. In-lining all the functions. Functions being reusable and callable blocks of code is useful for humans for whom less code means better readability. But not useful to the computer which would prefer a nice stream of instructions one after the other, moving only forward and never backwards. So a compiler takes all your neat function calls and replaces them with the function body so that the code is as flat as possible.
2. Unrolling loops/recursion. We want flat code, so those loops have got to go too. This works the same way as in-lining, we just copy and paste the code block in the loop the same number of times as the loop.
3. Make variables immutable. Otherwise known as Single Static Assignment (SSA) `a = a + b` becomes `a1 = a + b` so that once a variable is assigned a value it does not change.

*compiler geeks will butt in here with some “well actually…”s about variation and debates in compiler design, which is interesting but off topic for this post. The point is following these steps will give you code that looks like a Z3 model.

In retrospect this seems really obvious: since the processor thinks in boolean logic, of course you can create models in Z3 by following the same process that the compiler uses to present that code in boolean logic. Why aren’t there more tools to just compile directly to a SMT or SAT friendly format for you?

Well…. that’s because an SMT/SAT model is more than just a set of instructions. It’s a puzzle. You can convert the code automatically but that isn’t super useful. Without assertions (or predicates as they’re typically called) there’s nothing for the solver to solve. Converting the code is one critical piece, then you need to tell Z3 what states to solve for.

And here you have two choices: you either tell the solver what states are required and it returns the various conditions in which those states happen or you tell the solver what states are impossible and check to make sure you’re correct. These two pathways open up a world of potential applications. If you ask the model to generate valid states you can build optimizers (find a solution that is faster or uses less code than a known solution) you can write a fuzzer, you can synthesize code and reverse engineer (although this is probably not practical for large programs)

If you ask the model to find invalid states you can check for bugs (especially concurrency issues), you can verify that two pieces of code do the same thing (helpful if you’re looking to simplify), you can check for security issues…. and on and on and on.

## Bytecode to Z3

Just because we can’t automatically generate SMT/SAT friendly code doesn’t mean we have to do all the work of the compiler in our heads. Many compilers will let you inspect some part of what they are generating in process. In python, for example, the module `dis` will output the bytecode of the program. Using some simple code written in python I can demonstrate how you can reason about the practical code you write everyday in SMT.

Take for example this small function that tests whether a year is a leap year:

`def leap(year):  if year % 4 == 0 and year % 100 != 0:     return true  else:     return false`

You can actually write Z3 models in python, which is useful here because SMT-LIB2 (Z3’s preferred language) uses polish notation… and if you’re not familiar with that these examples will be harder to make sense of.

But otherwise this is pretty easy to convert to Z3. We don’t even need to consult the bytecode for ideas:

`from z3 import *year = Int('year')s = Solver()s.add(year % 4 == 0)s.add(year % 100 != 0)s.check()`

When we run this model, Z3 will determine whether there’s a value for our variable `year`that satisfies the two rules we’ve given it. If the answer is yes we can add the line `s.model()` to output one possible solution. Doing this might return a surprising value (for example -96!) that is not a valid year and could be a potential bug in our program.

But instead of running the model over and over and over again, looking for edge cases we can be a little more pragmatic. If we know what states should be impossible. For example, we know millennial years are leap years and that years that end in an odd number can’t possibly be leap years. By adding these features as assertions in our model we can test whether our algorithm is accurate or not:

`from z3 import *year = Int('year')s = Solver()s.add(year % 4 == 0)s.add(year % 100 != 0)s.push()s.add(year == 2000)if s.check() == unsat:   print("2000 failed")s.pop()s.push() # Reload the original two ruless.add(year != 1953, year != 1800, year != 1969)if s.check() == unsat:   print("Non-leap year not detected")s.check()`

Whoops! We forgot about the millennial exception to the leap year rule! With a simple program we probably didn’t need an SMT solver to spot this problem, but with more complicated programs such bugs can remain hidden for years.

Notice `s.push()` and `s.pop()` allow us to add and remove assertions. Among other things, this will let us incrementally check a model.

But what about a more complicated program?

`def calc(x, y):    if(x < y):        x = x + y        for i in range(0,3):        y = x + next(y)    return x + ydef next(y):    return y + 1`

Although it’s only a couple lines of code, it represents lots of constructs we need to translate to model correctly in SMT. There’s variable reassignment, there are function calls, there are loops…. This program is simple enough that you might be able to translate to a SMT format on your own, but let’s use the bytecode for it as a guide

`>>> dis.dis(calc)  2           0 LOAD_FAST                0 (x)              2 LOAD_FAST                1 (y)              4 COMPARE_OP               0 (<)              6 POP_JUMP_IF_FALSE       163           8 LOAD_FAST                0 (x)             10 LOAD_FAST                1 (y)             12 BINARY_ADD             14 STORE_FAST               0 (x)5     >>   16 SETUP_LOOP              30 (to 48)             18 LOAD_GLOBAL              0 (range)             20 LOAD_CONST               1 (0)             22 LOAD_CONST               2 (3)             24 CALL_FUNCTION            2             26 GET_ITER        >>   28 FOR_ITER                16 (to 46)             30 STORE_FAST               2 (i)6          32 LOAD_FAST                0 (x)             34 LOAD_GLOBAL              1 (next)             36 LOAD_FAST                1 (y)             38 CALL_FUNCTION            1             40 BINARY_ADD             42 STORE_FAST               1 (y)             44 JUMP_ABSOLUTE           28        >>   46 POP_BLOCK7     >>   48 LOAD_FAST                0 (x)             50 LOAD_FAST                1 (y)             52 BINARY_ADD             54 RETURN_VALUE`

The documentation for `dis` contains descriptions of each instruction but essentially what’s happening here is:

`# if(x < y):- Put the value of x on the stack- Put the value of y on the stack- Pop x and y off the stack and put the result of x < y on the stack- If top of stack (TOS) is False, pop it off and jump forward to 16# x = x + y- Put the value of x on the stack- Put the value of y on the stack- Pop x and y off the stack and put the result of x + y on the stack- Pop the TOS and assign it to variable x# for i in range(0,3):- Repeat instructions 30 to 48- Put beginning of range on TOS- Put end of range on TOS- Pop beginning and end off stack and put range call result on TOS- Convert list to iterable and put on TOS- Call next on the iterable, put on TOS- Pop the TOS and assign it to variable i# y = x + next(y)- Put the value of x on the stack- Put instructions for function next on the stack- Put the value of y on the stack- Pop next function and y off the stack and call function next(y)- Pop results of next(y) and value of x off the stack and add- Pop result off the stack and assign to variable y`

Obviously there is more, but you get the point. Laid out like this it is easier to see how to represent this function in Z3. Fortunately our loop only has three iterations, if it had more or was `while True:` we’d have to make a determination about how to handle that because Z3 models are bounded, meaning loops need to have specific endpoints. Also since SMT/SAT is NP-complete a model might take a long time to return an answer or never return one at all…. so it’s generally a good idea to keep the number of iterations down.

Anyway, in Z3 the calc function would look like this:

`x, x1 = Ints('x x1')y, y1, y2, y3 = Ints('y y1 y2 y3')nextv, nextv1, nextv2 = Ints('nextv nextv1 nextv2')result = Int('result')s = Solver()s.add(If(x<y, x1 == x + y, x1 == x))s.add(nextv == y + 1)s.add(y1 == x1 + nextv)s.add(nextv1 == y1 + 1)s.add(y2 == x1 + nextv1)s.add(nextv2 == y2 + 1)s.add(y3 == x1 + nextv2)s.add(result == x1 + y3)s.check()s.model()`

Since we’re using the python bindings for Z3, we can check that it works by running the original function and then adding additional conditions to our solver defining the value of result and asking it to solve for `x, y` Although we’ll probably also discover that there are other possible `x, y` combinations that will produce the same result!

## Learning More…

Hopefully I’ve helped you conceptualize how to represent code you’re working on now in SMT. If you’re ready to start going deeper into the world of Z3, here are some resources I would recommend: