# Notes on Paxos

This is the place were we start discussing real Paxos :-) It starts with a “ballot voting” algorithm. This algorithm, just like the ones we’ve already seen, does not define any messages. Rather, message passing is an implementation detail, so we’ll get to it later.

Recall that rigged voting requires all acceptors to vote for a single values. It is immune to split voting, but is susceptible to getting stuck when the leader goes offline. The idea behind ballot voting is to have many voting rounds, ballots. In each ballot, acceptors can vote only for a single value, so each ballot individually can get stuck. However, as we are running many ballots, some ballots will make progress. The value is chosen in a ballot if it is chosen by some quorum of acceptors. The value is chosen in an overall algorithm if it is chosen in some ballot.

The Turing award question is: how do we make sure that no two ballots choose different values? Note that it is OK if two ballots choose the same value.

Let’s just brute force this question, really. First, assume that the ballots are ordered (for example, by numbering them with natural numbers). And let’s say we want to pick some value `v` to vote for in ballot `b`. When `v` is safe? Well, when no other value `v1` can be chosen by any other ballot. Let’s tighten this up a bit.

Value `v` is safe at ballot `b` if any smaller ballot `b1` (`b1 < b`) did not choose and will not choose any value other than `v`.

So yeah, easy-peasy, we just need to predict which values will be chosen in the future, and we are done! We’ll deal with it in a moment, but let’s first convince ourselves that, if we only select safe values for voting, we won’t violate consensus spec.

So, when we select a safe value `v` to vote for in a particular ballot, it might get chosen in this ballot. We need to check that it won’t conflict with any other value. For smaller ballots that’s easy — it’s the definition of safety condition. What if we conflict with some value `v1` chosen in a future ballot? Well, that value is also safe, so whoever chose `v1`, was sure that it won’t conflict with `v`.

How do we tackle the precognition problem? We’ll ask acceptors to commit to not voting in certain ballots. For example, if you are looking for a safe value for ballot `b` and know that there’s a quorum `q` such that each quorum member never voted in smaller ballots, and promised to never vote in smaller ballots, you can be sure that any value is safe. Indeed, any quorum in smaller ballots will have at least one member which would refuse to vote for any value.

Ok, but what if there’s some quorum member which has already voted for some `v1` in some ballot `b1 < b`? (Take a deep breath, the next sentence is the kernel of the core idea of Paxos). Well, that means that `v1` was safe at `b1`, so, if there will be no votes between `b1` and `b`, `v1` is also safe at `b`! (Exhale). In other words, to pick a safe value at `b` we:

1. Take some quorum `q`.

2. Make everyone in `q` promise to never vote in ballots earlier than `b`.

3. Among all of the votes already cast by the quorum members we pick the one with the highest ballot number.

4. If such vote exists, its value is a safe value.

5. Otherwise, any value is safe.

To implement the “never vote” promise, each acceptor will maintain `maxBal` value. It will never vote in ballots smaller or equal to `maxBal`.

Let’s stop hand-waving and put this algorithm in math. Again, we are not thinking about messages yet, and just assume that each acceptor can observe the state of the whole system.

Ballot Vote

`````` 1 2 3 4 5 6 7 8 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
-- [b1;b2]: inclusive interval of ballots
-- Y^X: set of function from X to Y (f: X -> Y)
-- λ x ∈ X: y: function that maps x to y
-- ¬: "not", negation
--
-- f' = λ x1 ∈ X: if x1 = x then y else f(a):
-- A tedious way to write that f' is the same function as f,
-- except on x, where it returns y instead.
--
-- I am sorry! In my defense, TLA+ notation for this
-- is also horrible :-)
``````

Let’s unwrap this top-down. First, the `chosen` condition says that it is enough for some quorum to cast votes in some ballot for a value to be accepted. It’s trivial to see that, if we fix the ballot, then any two quorums would vote for the same value — quorums intersect. Showing that quorums vote for the same value in different ballots is the tricky bit.

The `Init` condition is simple — no votes, any acceptor can vote in any ballot (= any ballot with number larger than -1).

The `Next` consists of two cases. On each step of the protocol, some acceptor either votes for some value in some ballot `∃ v ∈ 𝕍: Vote(a, b, v)`, or declares that it won’t cast additional vote in small ballots `AdvanceMaxBal(a, b)`. Advancing ballot just sets `maxBal` for this acceptor (but takes care not to rewind older decisions). Casting a vote is more complicated and is predicated on three conditions:

• We haven’t forfeited our right to vote in this ballot.

• If there’s some vote in this ballot already, we are voting for the same value.

• If there are no votes, then the value should be safe.

Note that the last two checks overlap a bit: if the set of votes cast in a ballot is not empty, we immediately know that the value is safe: somebody has proven this before. But it doesn’t harm to check for safety again: a safe value can not become unsafe.

Finally, the safety check. It is done in relation to some quorum — if `q` proves that `v` is safe, than members of this quorum would prevent any other value to be accepted in early ballots. To be able to do this, we first need to make sure that `q` indeed finalized their votes for ballots less than `b` (`maxBall` is at least `b - 1`). Then, we need to find the latest vote of `q`. There are two cases

• No one in `q` ever voted (`b1 = -1`). In this case, there are no additional conditions on `v`, any value would work.

• Someone in `q` voted, and `b1` is the last ballot when someone voted. Then `v` must be the value voted for in `b1`. This implies `Safe(v, b1)`.

If all of these conditions are fulfilled, we cast our vote and advance `maxBall`.

This is the hardest part of the article. Take time to fully understand Ballot Vote.

Rigorously proving that Ballot Voting satisfies Consensus would be tedious — the specification is large, and the proof would necessary use every single piece of the spec! But let’s add some hand-waving. Again, we want to provide homomorphism from Ballot Voting to Consensus. Cases where the image of a step is a stuttering step (the set of chosen values is the same) are obvious. It’s also obvious that the set of chosen values never decreases (we never remove votes, so a value can not become unchosen). It also increases by at most one value with each step.

The complex case is to prove that, if currently only `v1` is chosen, no other `v2` can be chosen as a result of the current step. Suppose the contrary, let `v2` be the newly chosen value, and `v1` be a different value chosen some time ago. `v1` and `v2` can’t belong to the same ballot, because every ballot contains votes only for a single value (this needs proof!). Lets say they belong to `b1` and `b2`, and that `b1 < b2`. Note that `v2` might belong to `b1` — nothing prevents smaller ballot from finishing later. When we chose `v2` for `b2`, it was safe. This means that some quorum either promised not to vote in `b1` (but then `v1` couldn’t have been chosen in `b1`), or someone from the quorum voted for `v2` in `b1` (but then `v1 = v2` (proving this might require repeated application of safety condition)).

Ok, but is this better than Majority Voting? Can Ballot Voting get stuck? No — if at least one quorum of machines is online, they can bump their `maxBall` to a ballot bigger than any existing one. After they do this, there necessary will be a safe value relative to this quorum, which they can then vote on.

However, Ballot Voting is prone to a live lock — if acceptors continue to bump `maxBal` instead of voting, they’ll never select any value. In fact, in the current formulation one needs to be pretty lucky to not get stuck. To finish voting, there needs to be a quorum which can vote in ballot `b`, but not in any smaller ballot, and in the above spec this can only happen by luck.

It is impossible to completely eliminate live locks without assumptions about real time. However, when we implement Ballot Voting with real message passing, we try to reduce the probability of a live lock.