I find myself discussing and explaining sychronous_commit with many PostgreSQL users, especially with novice users. So, I thought of noting down all the key points as a blog post which will be useful for more users. Recently I got an opportunity to talk about some related topics in our PostgreSQL Percona Tech Days.
What Is sychronous_commit All About?
This is the parameter by which we can decide when a transaction-commit can be acknowledged back to the client as successful.
So this parameter is not just about synchronous standbys, but it has a wider meaning and implication which is useful for standalone PostgreSQL instances as well. To better understand, we should look at the overall WAL record propagation and various stages from which a commit confirmation is acceptable. This allows us to opt for varying levels of durability for each transaction. The lesser the durability selection, the faster the acknowledgment, which improves the overall throughput and performance of the system.
PostgreSQL WAL (Write Ahead Log) is the record of changes/activities on the Primary side and can be considered as a journal/ledger of the changes happening in the database. The following diagram shows the flow of WAL propagation in a local primary PostgreSQL instance and a remote hot standby instance.
The above diagram shows all the 5 major stages.
- WAL Record Inserts (local): WAL records are first created in WAL buffers. Since multiple backend processes will be creating the WAL records at a time, it is properly protected by locks. The writing of WAL records in wal_buffers is gets continuously written to WAL segments by different background processes. If the sychronous_commit is completely off, the flush won’t be happening immediately but relies on wal_writer_delay settings, which we discuss in the below section.
- WAL Writes and WAL Flush (local): This flush to WAL “segment files” on the local disk is considered as one of the heavy operations. PostgreSQL has done a lot of optimization in this area to avoid frequent flushes.
- Remote Write: WAL records are written to remote standbys (but not yet flushed). The data may remain on page cache for some time. Unless we want to address the case of both Primary and Standby instances crashing at the same time, this level of durability protection can be considered.
- Remote Flush: At this stage, the data is really written and flushed to disk at the remote standby side. So we have the guarantee that data is available at the standby side, even if it also crashes.
- Remote Apply: In this stage, the WAL records are replayed at the remote/standby side and it is available to sessions running there.
Corresponding accepted values for synchronous_commit are as follows:
- off: You may use values off, 0 (zero), false, or no to turn off the synchronous_commit. As the name indicates, the commit acknowledgment can come before flushing the records to disk. This is generally called as an asynchronous commit. If the PostgreSQL instance crashes, the last few asynchronous commits might be lost.
- local: WAL records are written and flushed to local disks. In this case, the commit will be acknowledged after the local WAL Write and WAL flush completes.
- remote_write: WAL records are successfully handed over to remote instances which acknowledged back about the write (not flush).
- on: This is the default value and you may use values on, true, yes, or 1 to set the value to “on”. But the meaning may change based on whether you have a synchronous standby or not. If there is a synchronous standby, setting the value to on will result in waiting till “remote flush”.
- remote_apply: This will result in commits waiting until replies from the current synchronous standby(s) indicate they have received the commit record of the transaction and applied it so that it has become visible to queries on the standby(s).
By selecting the appropriate values for the parameter, we can choose when the acknowledgment should come back. If there are no synchronous standbys ( synchronous_standby_names is empty) the settings of synchronous_commit to on, remote_apply, remote_write, and local all provide the same synchronization level: transaction commits only wait for local flush to disk.
One of the frequently asked questions in this area is:
“How much data we lose if we opt for full asynchronous commit (synchronous_commit = off)?”
The answer is slightly complex, and it depends on wal_writer_delay settings. By default it is 200ms. That means WALs will be flushed in every wal_writer_delay to disk. The WAL writer periodically wakes up and calls XLogBackgroundFlush(). This checks for completely filled WAL pages. If they are available, it writes all the buffers up to that point. So under good load condition, WAL writer writes whole buffers. On a low-load condition where full pages are not found, everything up to last asynchronous commit will be flushed.
If more than wal_writer_delay has passed, or more than wal_writer_flush_after blocks have been written since the last flush, WAL is flushed up to the current location. This arrangement guarantees that an async commit record reaches disk after at most two times wal_writer_delay after the transaction completes. However, PostgreSQL writes/flushes full buffers in a flexible way, and this is to reduce the number of writes issued under high load when multiple WAL pages are filled per WAL writer cycle. In concept, this makes the worst-case delay up to three wal_writer_delay cycles.
So the answer, in a simple format, is:
The loss will be less than two times the wal_writer_delay in most cases. But it can be up to three times in a worst-case.
Scope of Settings
When we discuss parameters and their values, many users think about setting sychronous_commit globally at the instance level. But the real power and usage come when it is scoped properly at different levels. Changing it at the instance level is not desirable.
PostgreSQL allows us to have this setting at varied scope in addition to various values.
- At each transaction level In a perfectly tunned application design, specific transaction can opt-in for specific sychronous_commit levels for each transaction, for example:SET LOCAL synchronous_commit = 'remote_write';
Please note the “LOCAL” specification. Once the transaction block completes (commit or rollback) the setting will be back to what is applicable at the session-level. This allows the architect to opt-in for extra overhead for specific critical transactions.
- At session level The setting can be specified at each session-level so that it is applicable across the session unless overridden by the transaction level setting mentioned above.SET synchronous_commit = 'remote_write';
Additionally one might choose to pass this to PostgreSQL as part of the connection string options from the application connections. for example: "host=hostname user=postgres ... options='-c synchronous_commit=off'". So the requirement of any code modification can be reduced.
- At user level In an ideal system, with well-managed user accounts, each user account will be dealing with specific functionality. It can range from the critical transaction system to the reporting user accounts. For example:ALTER USER trans_user SET synchronous_commit= ON;ALTER USER report_user SET synchronous_commit=OFF;
The sessions created by these users will have these settings by default. This user-level setting can be overridden at the session level or transaction level.
- At database level Specifying at the database level is useful when we have dedicated systems for reporting or temporary staging information.ALTER DATABASE reporting SET synchronous_commit=OFF;
- At instance level This is at the PostgreSQL instance level as follows:ALTER SYSTEM SET synchronous_commit=OFF;
- At each transaction level In a perfectly tunned application design, specific transaction can opt-in for specific sychronous_commit levels for each transaction, for example:
Common Use Cases
Migration: It is very common to see large data movement across systems when there are migrations happening, and the right selection of synchronous_commit or even turning it off will be of great value for reducing the overall migration time.
Data loading: In data warehouse systems/reporting systems, there could be large data load happening, and turning off the synchronous_commit will give a big boost by reducing the overhead of repeated flushes.
Audit and logging: Even in a critical system with critical transactions, only a certain portion of the transaction can be very critical – which a business wants to be available – on the standby side before acknowledging the commit. But there will be associated logging and auditing information recording. Very selective opt-in for synchronous standby commits can yield very high benefits.
A Final Note
A quick test using pgbench could help to verify the overhead of different levels of commit synchronization in a specific environment. Overall, we should expect a performance drop as we increase the level of synchronization requirement. Environmental factors like latency in fsync to local disk, network latency, load on the standby server, contention at the standby, overall replication volume, disk performance at standby server, etc., will be affecting the overhead and performance.
Even completely turning off the synchronous_commit won’t result in database corruption, unlike fsync.
Understanding the overall WAL propagation can help you to understand the replication delays and information from pg_stat_replication view.
The performance of a system is all about eliminating unwanted, avoidable overheads without sacrificing what is really important. I have seen a few power users of PostgreSQL who have a well-tuned system for the PostgreSQL database by making use of synchronous_commit features very effectively and selectively. I hope this article will help those who are still not using it and looking at fine-tuning opportunities for higher performance or durability
Our white paper “Why Choose PostgreSQL?” looks at the features and benefits of PostgreSQL and presents some practical usage examples. We also examine how PostgreSQL can be useful for companies looking to migrate from Oracle.