My Favorite PostgreSQL Extensions - Part One


This is in continuation of my previous blog entry wherein I had touched upon a topic of PostgreSQL  Extensions. PostgreSQL Extensions are a plug and play set of enhancements that add an extra feature-set to a PostgreSQL cluster. Some of these features are as simple as reading or writing to an external database while others could be a sophisticated solution to implement database replication, monitoring, etc.

PostgreSQL has evolved over the years from a simple open source ORDBMS to a powerful database system with over 30 years of active development offering reliability,  performance, and all ACID compliant features. With PostgreSQL 12 released a few months ago,  this database software is only getting bigger, better, and faster. 

Occasionally, extensions needed to be added to a PostgreSQL cluster to achieve enhanced functionality that were unavailable in the native code, because they were either not developed due to time constraints or due to insufficient evidence of edge case database problems. I am going to discuss a few of my favourite extensions in no particular order, with some demos that are used by developers and DBAs. 

Some of these extensions may require to be included in the shared_preload_libraries server parameter as a comma separated list to be preloaded at the server start. Although most of the extensions are included in the contrib  module of source code,  some have to be downloaded from an external website dedicated only to PostgreSQL extensions called the PostgreSQL Extension Network

In this two part blog series we will discuss extensions used to access data (postgres_fwd) and shrink or archive databases (pg_partman). Additional extensions will be discussed in the second part.

postgres_fdw

The postgres_fdw is a foreign data wrapper extension that can be used to access data stored in external PostgreSQL servers. This extension is similar to an older extension called dblink but it differs from its predecessor by offering standards-compliant syntax and better performance. 

The important components of postgres_fdw are a server, a user mapping, and a foreign table. There is a minor overhead added to the actual cost of executing queries against remote servers which is the communication overhead. The postgres_fdw extension is also capable of communicating with a remote server having a version all the way up to PostgreSQL 8.3, thus being backward compatible with earlier versions.

Demo

The demo will exhibit a connection from PostgreSQL 12 to a PostgreSQL 11 database.  The pg_hba.conf settings have already been configured for the servers to talk to each other. The extensions control files have to be loaded into the PostgreSQL shared home directory before creating the extension from Inside a  PostgreSQL cluster. 

Remote Server:

$ /usr/local/pgsql-11.3/bin/psql -p 5432 -d db_replica postgres psql (11.3) Type "help" for help. db_replica=# create table t1 (sno integer, emp_id text); CREATE TABLE db_replica=# \dt t1 List of relations Schema | Name | Type | Owner --------+------+-------+---------- public | t1 | table | postgres db_replica=# insert into t1 values (1, 'emp_one'); INSERT 0 1 db_replica=# select * from t1; sno | emp_id -----+--------- 1 | emp_one (1 row)

Source Server:

$ /database/pgsql-12.0/bin/psql -p 5732 postgres psql (12.0) Type "help" for help. postgres=# CREATE EXTENSION postgres_fdw; CREATE EXTENSION postgres=# CREATE SERVER remote_server postgres-# FOREIGN DATA WRAPPER postgres_fdw postgres-# OPTIONS (host '192.168.1.107', port '5432', dbname 'db_replica'); CREATE SERVER postgres=# CREATE USER MAPPING FOR postgres postgres-# SERVER remote_server postgres-# OPTIONS (user 'postgres', password 'admin123'); CREATE USER MAPPING postgres=# CREATE FOREIGN TABLE remote_t1 postgres-# (sno integer, emp_id text) postgres-# server remote_server postgres-# options (schema_name 'public', table_name 't1'); CREATE FOREIGN TABLE postgres=# select * from remote_t1; sno | emp_id -----+--------- 1 | emp_one (1 row) postgres=# insert into remote_t1 values (2,'emp_two'); INSERT 0 1 postgres=# select * from remote_t1; sno | emp_id -----+--------- 1 | emp_one 2 | emp_two (2 rows)

The WRITE operation from the source server reflects the remote server table immediately. A similar extension called oracle_fdw also exists which enables READ and WRITE access between PostgreSQL and Oracle tables. In addition to that, there is another extension called file_fdw which enables data access from flat files on disk. Please refer to the official documentation of postgres_fdw published here, for more information and details.

pg_partman

As databases and tables grow, there is always a need to shrink databases, archive data that is not needed or at least partition tables into various smaller fragments. This is so the query optimizer only visits the parts of the table that satisfy query conditions, instead of scanning the whole heap of tables. 

PostgreSQL has been offering partitioning features for a long time including Range, List, Hash, and Sub-partitioning techniques. However, it requires a lot of administration and management efforts such as defining child tables that inherit properties of a parent table to become its partitions, creating trigger functions to redirect data into a partition and further create triggers to call those functions, etc. This is where pg_partman comes into play, wherein all of these hassles are taken care of automatically.

Demo

I will show a quick demo of setting things up and inserting sample data. You will see how the data inserted into the main table gets automatically redirected to the partitions by just setting up pg_partman. It is important for the partition key column to be not null.

db_replica=# show shared_preload_libraries; shared_preload_libraries -------------------------- pg_partman_bgw (1 row) db_replica=# CREATE SCHEMA partman; CREATE SCHEMA db_replica=# CREATE EXTENSION pg_partman SCHEMA partman; CREATE EXTENSION db_replica=# CREATE ROLE partman WITH LOGIN; CREATE ROLE db_replica=# GRANT ALL ON SCHEMA partman TO partman; GRANT db_replica=# GRANT ALL ON ALL TABLES IN SCHEMA partman TO partman; GRANT db_replica=# GRANT EXECUTE ON ALL FUNCTIONS IN SCHEMA partman TO partman; GRANT db_replica=# GRANT EXECUTE ON ALL PROCEDURES IN SCHEMA partman TO partman; GRANT db_replica=# GRANT ALL ON SCHEMA PUBLIC TO partman; GRANT db_replica=# create table t1 (sno integer, emp_id varchar, date_of_join date not null); db_replica=# \d List of relations Schema | Name | Type | Owner --------+------+-------+---------- public | t1 | table | postgres (1 row) db_replica=# \d t1 Table "public.t1" Column | Type | Collation | Nullable | Default --------------+-------------------+-----------+----------+--------- sno | integer | | | emp_id | character varying | | | date_of_join | date | | not null | db_replica=# SELECT partman.create_parent('public.t1', 'date_of_join', 'partman', 'yearly'); create_parent --------------- t (1 row) db_replica=# \d+ t1 Table "public.t1" Column | Type | Collation | Nullable | Default | Storage | Stats target | Description --------------+-------------------+-----------+----------+---------+----------+--------------+------------- sno | integer | | | | plain | | emp_id | character varying | | | | extended | | date_of_join | date | | not null | | plain | | Triggers: t1_part_trig BEFORE INSERT ON t1 FOR EACH ROW EXECUTE PROCEDURE t1_part_trig_func() Child tables: t1_p2015, t1_p2016, t1_p2017, t1_p2018, t1_p2019, t1_p2020, t1_p2021, t1_p2022, t1_p2023 db_replica=# select * from t1; sno | emp_id | date_of_join -----+--------+-------------- (0 rows) db_replica=# select * from t1_p2019; sno | emp_id | date_of_join -----+--------+-------------- (0 rows) db_replica=# select * from t1_p2020; sno | emp_id | date_of_join -----+--------+-------------- (0 rows) db_replica=# insert into t1 values (1,'emp_one','01-06-2019'); INSERT 0 0 db_replica=# insert into t1 values (2,'emp_two','01-06-2020'); INSERT 0 0 db_replica=# select * from t1; sno | emp_id | date_of_join -----+---------+-------------- 1 | emp_one | 2019-01-06 2 | emp_two | 2020-01-06 (2 rows) db_replica=# select * from t1_p2019; sno | emp_id | date_of_join -----+---------+-------------- 1 | emp_one | 2019-01-06 (1 row) db_replica=# select * from t1_p2020; sno | emp_id | date_of_join -----+---------+-------------- 2 | emp_two | 2020-01-06 (1 row)

This is a simple partitioning technique but each of the above simple partitions can be further divided into sub-partitions. Please check the official documentation of pg_partman published here, for more features and functions it offers.

Conclusion

Part two of this blog will discuss other PostgreSQL extensions like pgAudit, pg_repack and HypoPG.