PostgreSQL replication: configuration, validation, basic types

Replication is a technology of copying, distributing, and synchronization data between the main or master server and the slave or replica server. It is a key technology for high-load infrastructures, which enables them to:

➡️ Acquire fault tolerance. Restoring from a backup can take up to several hours, and the backup may not be up to date/faulty. However, database replication will allow you to be sure of workflow continuity.

➡️ Gain performance. It is like horizontal scaling. The load on databases is distributed between servers. Reporting and backup can be done from the backup version of the database, which will relieve the main one.

From this article, you will learn how to configure replication in PostgreSQL DBMS starting from version 10.0.

Types of PostgreSQL replication

There are two main PostgreSQL replication types: streaming (or physical) and logical.

1. Streaming or physical replication

All data about changes on the master server is streaming to the slave server via a transaction log (WAL). This is simple and cost-effective, but the requirements for server compatibility are high.

There are synchronous and asynchronous PostgreSQL replication. These types of streaming replication in PostgreSQL differ in the order of query execution:

● Asynchronous replication. In asynchronous replication, requests first get to the main server, then – to the WAL, and only at the end – to the additional server. The advantage of such replication is fast transaction confirmation. The disadvantage is the danger of losing some data on replicas because there will not be enough time to fully duplicate data if something happens to the master.

● Synchronous replication. In synchronous replication, requests are first sent to the WAL, and from there to the master and slave servers. Pluses – greater data security, minuses – low speed of operations because the transaction is confirmed only after full duplication on the slave server.

2. Logical replication

PostgreSQL logical replication allows synchronizing database copies based on replication identifiers (records in database tables, not pure bits and bytes). For this type of replication, it does not matter whether or not the server configurations, DBMS versions, and platforms used are the same.

Logical replication was introduced in PostgreSQL version 10.0. It allows selectively synchronizing the necessary parts of the database on different slave servers, which is not available with physical replication. It works according to the subscription principle: the master server acts as a publisher publishing changes, and backup servers subscribe to some of them.

Configuring PostgreSQL physical replication

If different physical servers are used, make sure that the servers’ processors have the same architecture, run the same operating systems, and the PostgreSQL versions installed on them are identical (for logical replication, this is not crucial).

However, more often we are talking about virtual servers. If you are using a cloud platform, create two servers in the control panel and name them Master and Slave to avoid confusion. Then set up the OS, configuration, RAM, and disk space identically for both servers, for example – Ubuntu 22.04, 2 vCPUs, 8GB RAM and 10GB disk.

Pay attention to the following points:

➡️ It is important to remember to configure access to the VM from the Internet. Make sure you either set up root access by password or generate an SSH key to connect to the machine.

➡️ Since the VMs in our example are on the same network, private addresses are used. Otherwise, you would need to use a public network, configure a firewall, and open ports. Postgre servers use TCP port 5432. When using a firewall, to open the port you need to run one of the commands that differ depending on which Linux command line utility you are using: Firewalld, Iptables, or UFW.

➡️ By default, Linux distributions such as Red Hat, CentOS, and Fedora have SELinux security enabled: you will need to disable it.

Configuring and preparing the master server

At this stage, you need to give the slave server access to connections on the master server. To do this, edit the config /etc/postgresql/12/main/pg_hba.conf. (the number “12” in the path means the PostgreSQL version number).

After the line “If you want to allow non-local connections, you need to add more” you need to add “host replication postgres REPLICA_INTERNAL_IP/32 md5” to the file.

Then you will need to configure the replication itself. To do this, open the config nano/etc/postgresql/10/main/postgresql.conf and set the necessary parameter values.

listen_addresses = ‘localhost, MASTER_INTERNAL_IP’

wal_level = hot_standby

archive_mode = on

archive_command = ‘cd .’

max_wal_senders = 8

hot_standby = on

After that, restart the server with the command “service postgresql restart” so that they are applied and go to the slave server terminal.

Configuring the replica server

Before configuring, stop the PostgreSQL server with the command “service postgresql stop”.

Then on the additional PostgreSQL server, edit pg_hba.conf. in the same way as on the master server, replacing only the IP address with the master one.

You should also edit the postgresql.conf file, specifying the same settings as on the master, except for the IP address.

In order for the slave server to start replication, transfer the database backup from the master server to it using the pg_basebackup utility, and then start the PostgreSQL server again. To do this, you need to log in using the “su – postgres” command.

Now go to the partition with the database: cd /var/lib/postgresql/12/ and recreate the directory: delete the default one and create a new but empty one with the same name: rm -rf main; mkdir main; chmod go-rwx main .

Next, unload the database from the master server with the command “pg_basebackup -P -R -X stream -c fast -h MASTER_ENTER_IP -U postgres -D ./main”. The “-R” parameter in it means that an empty file is also created, indicating that this server is a replica. Starting with version 12 of PostgreSQL, discussed in this article, this file is called standby.signal, and in earlier versions it was called recovery.conf.

It remains to start the PostgreSQL server back with the command “service postgresql start”. That’s it, replication is configured.

Replication verification

PostgreSQL replication verification is performed in several steps.

1. To check if replication works, create a new table on the master server. See if it is displayed in the backup server terminal.

2. Then create a table on the slave server. If everything is configured correctly, it will not appear on the master server.

3. To simulate a master server crash, disconnect the master server from the KVM console. If everything is configured correctly, you can now put the slave server in write mode, and create, and edit the database directly on it.

Note that if services and applications are connected to the disconnected master server directly, then when roles are changed, queries from them will not be automatically directed to the new server. To solve this problem, load balancers are used that accept requests from applications instead of PostgreSQL and monitor the state of the servers to automatically redistribute the load between different database instances.

Frequent replication errors

If replication does not work after doing all of these steps, you should open the logs on the slave server and review them for errors. There are several common failure scenarios:

➡️ Incoming connections were cut off by the firewall.

➡️ Replication IDs did not match on the master and slave server.

➡️ The required table was not added to the publication on the master server.

➡️ The replication was not labeled as “logical” in the DBMS.

Fixing these problems should cause replication to start automatically. If it does not, you should try to delete the failed database part subscription with the DROP SUBSCRIPTION my_subscription command and recreate it.

However, PostgreSQL replication is a serious task that is best left to an experienced professional. We at «DB-Service» are ready to take PostgreSQL database administration and support ourselves. We will ensure your database management system availability and smooth operation.

PostgreSQL Server Replication