Table partitioning has been evolving since the feature was added to PostgreSQL in version 10. Version 11 saw some vast improvements, as I mentioned in a previous blog post.
During the PostgreSQL 12 development cycle, there was a big focus on scaling partitioning to make it not only perform better, but perform better with a larger number of partitions. Here I’d like to talk about what has been improved.
Bulk loading data into a partitioned table using COPY is now able to make use of bulk-inserts. Previously only one row was inserted at a time.
The COPY speed does appear to slow with higher numbers of partitions, but in reality, it tails off with fewer rows per partition. In this test, as the partition count grows, the rows per partition shrinks. The reason for the slowdown is due to how the COPY code makes up to 1000 slots for each tuple, per partition. In the fewer partitions case, these slots are reused more often, hence performance is better. In reality, this performance tailing off is likely not to occur since you’re likely to have more than 12.2k rows per partition.
In PostgreSQL 11 when INSERTing records into a partitioned table, every partition was locked, no matter if it received a new record or not. With larger numbers of partitions and fewer rows per INSERT, the overhead of this could become significant.
In PostgreSQL 12, we now lock a partition just before the first time it receives a row. This means if we’re inserting just 1 row, then only 1 partition is locked. This results in much better performance at higher partition counts, especially when inserting just 1 row at a time. This change in the locking behaviour was also teamed up with a complete rewrite of the partition tuple routing code. This rewrite massively reduces the overhead of the setting up of the tuple routing data structures during executor startup.
You can see that the performance in PostgreSQL 12 is fairly consistent no matter how many partitions the partitioned table has.
Back in PostgreSQL 10, the query planner would check the constraint of each partition one-by-one to see if it could possibly be required for the query. This meant a per-partition overhead, resulting in planning times increasing with higher numbers of partitions. PostgreSQL 11 improved this by adding “partition pruning”, an algorithm which can much more quickly identify matching partitions. However, PostgreSQL 11 still did some unnecessary processing and still loaded meta-data for each partition, regardless of if it was pruned or not.
PostgreSQL 12 changes things so this meta-data loading is performed after partition pruning. This results in significant performance improvements in the query planner when many partitions are pruned.
The chart below shows the performance of a SELECT of a single row from a HASH partitioned table partitioned on a BIGINT column, which is also the PRIMARY KEY of the table. Here partition pruning is able to prune all but the one needed partition.
Once again it is fairly clear that PostgreSQL 12 improves things significantly here. Performance does tail off just a little bit still at the higher partition counts, but it’s still light years ahead of PostgreSQL 11 on this test.
Other Partitioning Performance Improvements:
Ordered partition scans:
The planner is now able to make use of the implicit order of LIST and RANGE partitioned tables. This allows the use of the Append operator in place of the MergeAppend operator when the required sort order is the order defined by the partition key. This is not possible for HASH partitioned tables since various out of order values can share the same partition. This optimization reduces useless sort comparisons and provides a good boost for queries that use a LIMIT clause.
Get rid of single sub-plan Append and MergeAppend nodes:
This is a fairly trivial change which eliminates the Append and MergeAppend nodes when the planner sees it’s only got a single sub-node. It was quite useless to keep the Append / MergeAppend node in this case as they’re meant to be for appending multiple subplan results together. There’s not much to do when there’s already just 1 subplan. Removing these does also give a small performance boost to queries as pulling tuples through executor nodes, no matter how trivial they are, is not free. This change also allows some queries to partitioned tables to be parallelized which previously couldn’t be.
Various performance improvements to run-time partition pruning:
A fair bit of optimization work was also done around run-time partition pruning to reduce executor startup overheads. Some work was also done to allow PostgreSQL to make use of Advanced Bit Manipulation instructions which gives PostgreSQL’s Bitmapset type a performance boost. This allows supporting processors to perform various operations 64-bits at a time in a native operation. Previously all these operations trawled through the Bitmapset 1 byte at a time. These Bitmapsets have also changed from 32-bits to 64-bits on 64-bit machines. This effectively doubles the performance of working with larger Bitmapsets.
Some changes were also made to the executor to allow range tables (for storing relation meta-data) to be found in O(1) rather than O(N) time, where N is the number of tables in the range table list. This is particularly useful as each partition in the plan has a range table entry, so looking up the range table data for each partition was costly when the plan contained many partitions.
With these improvements and using a RANGE partitioned table partitioned by a timestamp column, each partition storing 1 month of data, the performance looks like:
You can see that PostgreSQL 12’s gain gets bigger with more partitions. However, those bars taper off at higher partition counts. This is because I formed the query in a way that makes plan-time pruning impossible. The WHERE clause has a STABLE function, which the planner does not know the return value of, so cannot prune any partitions. The return value is evaluated during executor startup and run-time pruning takes care of the partition pruning. Unfortunately, this means the executor must lock all partitions in the plan, even the ones that are about to be run-time pruned. Since this query is fast to execute, the overhead of this locking really shows with higher partition counts. Improving that is going to have to wait for another release.
The good news is that if we change the WHERE clause swapping out the STABLE function call for a constant, the planner is able to take care of pruning:
The planning overhead shows here as with few partitions the performance of PostgreSQL 12 is not as high as with the generic plan and run-time pruning. With larger numbers of partitions, the performance does not tail off as much when the planner is able to perform the pruning. This is because the query plan has is only 1 partition for the executor to lock and unlock.
You can see from the graphs above that we’ve done a lot to improve partitioning in PostgreSQL 12. However, please don’t be too tempted by the graphs above and design all your partitioning strategies to include large numbers of partitions. Be aware that there are still cases where too many partitions can cause the query planner to use more RAM and become slow. When performance matters, and it generally always does, we highly recommend you run workload simulations. This should be done away from production server with various numbers of partitions to see how it affects your performance. Have a read of the best practices section of the documentation for further guidance.
All tests were run on an Amazon AWS m5d.large instance using pgbench. The transactions per seconds tests were measured over 60 seconds.
The following settings where changed:
shared_buffers = 1GB
work_mem = 256MB
checkpoint_timeout = 60min
max_wal_size = 10GB
max_locks_per_transaction = 256
All transactions per second counts were measured using a single PostgreSQL connection.