PostgreSQL 12 has been considered as a major update consisting of major performance boost with partitioning enhancements, indexing improvements, optimized planner logics and several others. One of the major changes is noticeably the removal of
recovery.conf in a standby cluster. For this reason, the procedure to set up a streaming replication clusters has changed, and in this blog, I will demonstrate how to properly setup a streaming replication setup in PG12.
Streaming replication setup requires a master cluster and one or more slave clusters that will replicate the data inserted to the master by streaming the archived WAL files generated by master. The master and slaves can reside on different machines connected via network but in this blog, we will use one master and one slave setup and both will be run on the same machine with different port number.
The procedures illustrated in this blog is based on Postgres version 12 built from source running on Ubuntu 18.04
2. Master Database Cluster Setup
Create a master database cluster using initdb tool:
$ initdb /home/caryh/streaming-replication/db-master $ cd /home/caryh/streaming-replication
/home/caryh/streaming-replication is the root folder to all the database clusters that we will be creating in this blog and db-master directory will be created here as a result of above commands. Let’s modify the default postgreql.conf and enable several important configuration options as shown below for streaming replication setup.
############# db-master/postgresql.conf ############# wal_level = replica archive_mode = on max_wal_senders = 10 wal_keep_segments = 10 hot_standby = on archive_command = 'test ! -f /home/caryh/streaming-replication/archivedir/%f && cp %p /home/caryh/streaming-replication/archivedir/%f' port = 5432 wal_log_hints = on
The configuration above enables Postgres to archive the WAL files in the directory /home/caryh/streaming-replication/archivedir/ when it has completed writing to a full block of WAL file or when pg_basebackup command has been issued. The %f and %p used within
archive_command are internal to Postgres and %f will be replaced with the filename of the target WAL file and %p replaced with path to the targeted WAL file.
It is very important when setting the
archive_command to ensure the WAL files are archived to a location where the slave cluster can access.
Please note that
wal_log_hints must be enabled for pg_rewind tool to work properly. We will discuss more about pg_rewind in the next blog post.
Examine the client authentication file
db-master/pg_hba.conf and make sure the master cluster allows replication connections from a slave cluster remotely. In my case, both my master and slave will be run on the same host, so I will leave the loopback IP address as it is. If your slave cluster is located in another machine, make sure to replace the loopback address with the right one.
############# db-master/pg_hba.conf ############# # Allow replication connections from 127.0.0.1, by a user with the replication privilege. # TYPE DATABASE USER ADDRESS METHOD host replication all 127.0.0.1/32 trust
Let’s go ahead and start the master database cluster with the above configuration files, create a super user with permission to do replication, and a database called clusterdb
$ pg_ctl -D db-master start $ createuser cary -s --replication $ createdb clusterdb
Insert some test data to the master cluster. For simplicity, we will insert 100 integers to
$ psql -d clusterdb -U cary -c "CREATE TABLE test_table(x integer)" CREATE TABLE $ psql -d clusterdb -U cary -c "INSERT INTO test_table(x) SELECT y FROM generate_series(1, 100) a(y)" INSERT 0 100 $ psql -d clusterdb -U cary -c "SELECT count(*) from test_table" count ------- 100 (1 row)
3. Slave Database Cluster Setup
The goal of setting up the slave cluster is to make a backup of the current master and set it up as a standby server, meaning it will stream the WAL file updates from the master and perform replication of the data.
Postgres provides several tools and methods to perform physical database backup. Exclusive methods such as
pg_stop_backup() are quite common in earlier Postgres versions. In this blog, we will use the newer, and simpler non-exclusive
pg_basebackup fronend tool to execute the backup. There are advantages and disadvantaged for both methods and this discussion is not within the scope of this blog. This article here provides very good explaination on both methods:
Let’s use pg_basebackup to create the slave cluster.
$ pg_basebackup -h 127.0.0.1 -U cary -p 5432 -D db-slave -P -Xs -R 31373/31373 kB (100%), 1/1 tablespace
where: -h is the IP of the master cluster -U is the username that is permitted to do replication -p is the port number of the running master cluster -D is the directory where we want to set up the slave database cluster -P to show the progress -Xs to select WAL streaming method
-R to write a recovery.conf file.
This step is where it would differ from the previous PG versions. The -R command will no longer output a recovery.conf file in the db-slave directory.
$ ls db-slave backup_label pg_dynshmem pg_multixact pg_snapshots pg_tblspc pg_xact base pg_hba.conf pg_notify pg_stat pg_twophase postgresql.auto.conf global pg_ident.conf pg_replslot pg_stat_tmp PG_VERSION postgresql.conf pg_commit_ts pg_logical pg_serial pg_subtrans pg_wal standby.signal
The contents of the old recovery.conf file are moved to
db-slave/postgresql.auto.conf first, and we will see that pg_basebackup already created the
primary_conninfo for us. This line used to be located in
recovery.conf and it tells where and how a slave cluster should stream from the master cluster. Make sure this line is present in the
############# db-slave/postgresql.auto.conf ############# # Do not edit this file manually! # It will be overwritten by the ALTER SYSTEM command. primary_conninfo = 'user=cary passfile=''/home/caryh/.pgpass'' host=127.0.0.1 port=5432 sslmode=prefer sslcompression=0 gssencmode=disable target_session_attrs=any'
db-slave/postgresql.conf and update some of the parameters.
############# db-slave/postgresql.conf ############# wal_level = replica archive_mode = on max_wal_senders = 10 wal_keep_segments = 10 hot_standby = on archive_command = 'test ! -f /home/caryh/streaming-replication/archivedir/%f && cp %p /home/caryh/streaming-replication/archivedir/%f' wal_log_hints = on port = 5433 restore_command = 'cp /home/caryh/streaming-replication/archivedir/%f %p' archive_cleanup_command = 'pg_archivecleanup /home/caryh/streaming-replication/archivedir %r'
db-slave/postgresql.conf is directly copied from master cluster via pg_basebackup, we will need to change the
port to some port different (5433 in this case) from the master since both are running on the same machine. We will need to fill the
archive_cleanup_command so the slave cluster knows how to get the archived WAL files for streaming purposes. These two parameters used to be defined in
recovery.conf and are moved to
postgresql.conf in PG12.
In the db-slave directory, please note that a new
standby.signal file is created automatically by
pg_basebackup to indicate that this slave cluster will be run in
standby mode. The
standby.signal file is a new addition in PG12 to replace
standby_mode = 'on' that used to be defined in
recovery.conf. If this file is not present, make sure it is created by:
$ touch db-slave/standby.signal
Now, let’s start the slave cluster:
$ pg_ctl -D db-slave start
4. Verify the Streaming Replication Setup
Once both master and slave clusters are setup and running, we should see from the
ps -ef command that some of the backend processes are started to achieve the replication, namely,
$ ps -ef | grep postgres caryh 12782 2921 0 16:12 ? 00:00:00 /usr/local/pgsql/bin/postgres -D db-master caryh 12784 12782 0 16:12 ? 00:00:00 postgres: checkpointer caryh 12785 12782 0 16:12 ? 00:00:00 postgres: background writer caryh 12786 12782 0 16:12 ? 00:00:00 postgres: walwriter caryh 12787 12782 0 16:12 ? 00:00:00 postgres: autovacuum launcher caryh 12788 12782 0 16:12 ? 00:00:00 postgres: archiver last was 000000010000000000000002.00000028.backup caryh 12789 12782 0 16:12 ? 00:00:00 postgres: stats collector caryh 12790 12782 0 16:12 ? 00:00:00 postgres: logical replication launcher caryh 15702 2921 0 17:06 ? 00:00:00 /usr/local/pgsql/bin/postgres -D db-slave caryh 15703 15702 0 17:06 ? 00:00:00 postgres: startup recovering 000000010000000000000003 caryh 15708 15702 0 17:06 ? 00:00:00 postgres: checkpointer caryh 15709 15702 0 17:06 ? 00:00:00 postgres: background writer caryh 15711 15702 0 17:06 ? 00:00:00 postgres: stats collector caryh 15713 15702 0 17:06 ? 00:00:00 postgres: walreceiver streaming 0/3000148 caryh 15714 12782 0 17:06 ? 00:00:00 postgres: walsender cary 127.0.0.1(59088) streaming 0/3000148 caryh 15728 10962 0 17:06 pts/5 00:00:00 grep --color=auto post
We can also check the replication status in details by issuing a query to the master cluster:
$ psql -d clusterdb -U cary -c "select * from pg_stat_replication;" -x -p 5432 -[ RECORD 1 ]----+------------------------------ pid | 15714 usesysid | 16384 usename | cary application_name | walreceiver client_addr | 127.0.0.1 client_hostname | client_port | 59088 backend_start | 2019-10-29 17:06:49.072082-07 backend_xmin | state | streaming sent_lsn | 0/3000148 write_lsn | 0/3000148 flush_lsn | 0/3000148 replay_lsn | 0/3000148 write_lag | flush_lag | replay_lag | sync_priority | 0 sync_state | async reply_time | 2019-10-29 17:10:09.515563-07
Lastly, we can insert additional data to the master cluster and verify that slave also has the data updated.
# Query slave cluster $ psql -d clusterdb -U cary -c "SELECT count(*) from test_table" -p 5433 count ------- 100 (1 row) # Query master cluster $ psql -d clusterdb -U cary -c "SELECT count(*) from test_table" -p 5432 count ------- 100 (1 row) # Insert more data to master cluster $ psql -d clusterdb -U cary -c "INSERT INTO test_table(x) SELECT y FROM generate_series(1, 100) a(y)" -p 5432 INSERT 0 100 # Query slave cluster again psql -d clusterdb -U cary -c "SELECT count(*) from test_table" -p 5433 count ------- 200 (1 row)
Both master and slave clusters are now in sync.
5. Setup Replication Slots
The previous steps illustrate how to correctly setup streaming replication between a master and slave cluster. However, there may be a case where the slave can be disconnected for some reason for extended period of time and may fail to replicate with the master when some of the un-replicated WAL files are recycled or deleted from the master cluster controlled by
Replication slots ensure master can retain enough WAL segments for all slaves to receive them and prevent the master from removing rows that could cause a recovery conflict on the slaves.
Let’s create a replication slot on the master cluster called
$ psql -d clusterdb -U cary -c "select * from pg_create_physical_replication_slot('slave')" -p 5432 slot_name | lsn -----------+----- slave | (1 row) $ psql -d clusterdb -U cary -c "select * from pg_replication_slots" -x -p 5432 -[ RECORD 1 ]-------+--------- slot_name | slave plugin | slot_type | physical datoid | database | temporary | f active | f active_pid | xmin | catalog_xmin | restart_lsn | confirmed_flush_lsn |
We have just created replication slot on master called
slave and it is currently not active (active = f).
Let’s modify slave’s
postgresql.conf and make it connect to the master’s replication slot
############# db-slave/postgresql.conf ############# primary_slot_name = 'slave'
Please note that this argument
primary_slot_name us also used to be defined in
recovery.conf and moved to
postgresql.conf in PG12. After the change, we are required to restart the slave.
$ pg_ctl -D db-slave stop $ pg_ctl -D db-slave start
If all is good, checking the replication slots on master should have the slot status as active.
$ psql -d clusterdb -U cary -c "select * from pg_replication_slots" -x -p 5432 -[ RECORD 1 ]-------+---------- slot_name | slave plugin | slot_type | physical datoid | database | temporary | f active | t active_pid | 16652 xmin | catalog_xmin | restart_lsn | 0/3003B98 confirmed_flush_lsn |
In this blog, we have discussed the updated procedures to setup streaming replication clusters in PG12, in which several steps have been changed from the older versions, particularly the removal of
Here is a short list of changes related to replication setup that have been moved from
- restore_command => moved to
- recovery_target_timeline => moved to
- standby_mode => replaced by
- primary_conninfo => moved to
- archive_cleanup_command => moved to
- primary_slot_name => moved to
A multi-disciplined software developer specialised in C/C++ Software development, network security, embedded software, firewall, and IT infrastructure