In this article, I’m going to explain how the MVCC (Multi-Version Concurrency Control) mechanism works using PostgreSQL as a reference implementation.
In Concurrency Control theory, there are two ways you can deal with conflicts:
Because MVCC (Multi-Version Concurrency Control) is such a prevalent Concurrency Control technique (not only in relational database systems, in this article, I’m going to explain how it works.
When the ACID transaction properties were first defined, Serializability was assumed. And to provide a Strict Serializable transaction outcome, the 2PL (Two-Phase Locking) mechanism was employed. When using 2PL, every read requires a shared lock acquisition, while a write operation requires taking an exclusive lock.
However, locking incurs contention, and contention affects scalability. The Amdhal’s Law or the Universal Scalability Law demonstrate how contention can affect response Time speedup.
For this reason, database researchers have come up with a different Concurrency Control model which tries to reduce locking to a bare minimum so that:
The only use case that can still generate contention is when two concurrent transactions try to modify the same record since, once modified, a row is always locked until the transaction that modified this record either commits or rolls back.
In order to specify the aforementioned Reader/Writer non-locking behavior, the Concurrency Control mechanism must operate on multiple versions of the same record, hence this mechanism is called Multi-Version Concurrency Control (MVCC).
While 2PL is pretty much standard, there’s no standard MVCC implementation, each database taking a slightly different approach. In this article, we are going to use PostgreSQL since its MVCC implementation is the easiest one to visualize.
While Oracle and MySQL use the undo log to capture uncommitted changes so that rows can be reconstructed to their previously committed version, PostgreSQL stores all row versions in the table data structure.
What’s even more interesting is that every row has two additional columns:
In PostgreSQL, the Transaction Id is a 32-bit integer, and the VACUUM process is responsible (among other things like reclaiming old row versions that are no longer in use) for making sure that the id does not overflow.
For this reason, you should never disable the VACUUM as transaction wraparound can lean to catastrophic situations.
To understand how INSERT works in MVCC, consider the following diagram:
txid_current()
PostgreSQL functionpost
row, the column value is set to Alice’s transaction idIf the transaction id is higher than the value of a committed row, the transaction is allowed to read this record version.
If the transaction id is lower than the value, then it’s up to the isolation level to decide if a record should be visible or not. For READ COMMITTED, the currently executing statement timestamp becomes the lower boundary for row visibility. For REPEATABLE READ or SERIALIZABLE, all reads are relative to the start timestamp of the currently running transaction.
To understand how DELETE works in MVCC, consider the following diagram:
txid_current()
PostgreSQL functionpost
row, the column value is set to Bob’s transaction idWhile in 2PL, Bob’s modification would block Alice read statement, in MVCC Alice is still allowed to see the previous version until Bob manages to commit his transaction.
The DELETE operation does not physically remove a record, it just marks it as ready for deletion, and the VACUUM process will collect it when this row is no longer in use by any current running transaction.
If the transaction id is greater than the value of a committed row, the transaction is not allowed to read this record version anymore.
If the transaction id is lower than the value, then it’s up to the isolation level to decide if a record should be visible or not. For READ COMMITTED, the currently executing statement timestamp becomes the lower boundary for row visibility. For REPEATABLE READ or SERIALIZABLE, all reads are relative to the start timestamp of the currently running transaction.
To understand how UPDATE works in MVCC, consider the following diagram:
txid_current()
PostgreSQL functionpost
record, we can see two operations happening: a DELETE and an INSERT.I'm running an online workshop on the 25th of June about The Best Way to Map Entities with Java Persistence and Hibernate.If you enjoyed this article, I bet you are going to love my Book and Video Courses as well.
By allowing multiple versions of the same record, there is going to be less contention on reading/writing records since Readers will not block writers and Writers will not block Readers as well.
Although not as intuitive as 2PL (Two-Phase Locking), MVCC is not very difficult to understand either. However, it’s very important to understand how it works, especially since data anomalies are treated differently than when locking is being employed.