PostgreSQL comes with a plethora of configuration options, but changing the default settings of some of these options drastically improves the observability of your PostgreSQL server. You’ll want to set and configure these options before issues crop up in production, as they can provide information essential to understanding and resolving those issues.
Read on to learn more about the settings and extensions that expose metrics and information about the inner workings of your PostgreSQL server.
Log Line Prefix
The log_line_prefix configuration option determines what PostgreSQL writes at the beginning of each log line. The default depends on the specific Linux distribution or managed solution that you’re using, but more often than not it does not include a few items that can prove very useful in tracking down misbehaving clients. Try this log_line_prefix:
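```
# one possible value, in postgresql.conf; reorder or trim the escapes to taste
log_line_prefix = '%m [%p] %a %u@%d %h '
```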
It includes the timestamp (%m), the PID of the backend process (%p), the application name (%a) of the client, the username the client has connected with (%u), the database the client has connected to (%d), and the hostname or IP address the connection is coming from (%h). This results in log lines like this:
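```
2024-03-11 05:17:24.733 UTC [121] psql postgres@bench 172.17.0.1 ERROR:  relation "pgbench_account" does not exist at character 15
```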
which are much more useful than the default. You can see that a client connected from 172.17.0.1 as user postgres to database bench, and the application was psql. Definitely an improvement over the default option, which shows only this:
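```
2024-03-11 05:17:24.733 UTC [121] ERROR:  relation "pgbench_account" does not exist at character 15
```
(The log samples in this post are representative; exact timestamps, PIDs and messages will vary with your setup and PostgreSQL version.)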
Logging Slow Queries
PostgreSQL can be configured to log queries that take more than a set amount of time to execute. These go into the same log file; there is no separate slow query log file as in MySQL.
To log statements that take more than 1 second to execute, use the log_min_duration_statement option like this:
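```
# postgresql.conf: log statements that take longer than 1 second to execute
log_min_duration_statement = 1s
```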
Note that log_min_duration_statement will consider all statements (including, for example, long-running admin statements like REINDEX TABLE) and not just queries (SELECT). Here are some log entries produced by this option:
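```
2024-03-11 05:23:10.780 UTC [133] psql postgres@bench 172.17.0.1 LOG:  duration: 1912.455 ms  statement: SELECT count(*) FROM pgbench_accounts;
2024-03-11 05:24:01.104 UTC [133] psql postgres@bench 172.17.0.1 LOG:  duration: 1002.305 ms  statement: SELECT pg_sleep(1);
```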
If this results in too many log entries for similar statements, you can tell Postgres to log only a fraction of them, using:
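```
# available in PostgreSQL 13 and later; statements that exceed
# log_min_duration_statement are always logged, so leave that disabled
# (or set higher) when sampling
log_min_duration_sample = 1s
log_statement_sample_rate = 0.25
```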
This logs only 25% of the statements that become eligible for logging (the ones that took more than 1 second to execute). The log output is the same as before. There is no way to know how many eligible statements were not logged.
To log statements irrespective of their duration, along with the time taken to execute them, use the log_statement option instead:
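```
# log_statement controls which statement texts are logged; log_duration
# adds a line with the execution time of every completed statement
log_statement = mod
log_duration = on
```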
The ‘mod’ option tells Postgres to log DDLs and data-modifying statements. This results in logs like these:
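```
2024-03-11 05:31:08.507 UTC [150] psql postgres@bench 172.17.0.1 LOG:  statement: UPDATE pgbench_accounts SET abalance = abalance + 100 WHERE aid = 1;
2024-03-11 05:31:08.509 UTC [150] psql postgres@bench 172.17.0.1 LOG:  duration: 1.268 ms
```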
Be warned that it is not possible to sample statement logging enabled this way: all matching statements will be logged, and you’ll end up with tons of log entries.
Logging Locks and Deadlocks
Queries can wait too long to acquire a lock. An upper limit on how long to wait is typically set using the option lock_timeout, usually at the client side. If a query has been waiting this long to acquire a lock, Postgres will cancel its execution and log an error:
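```
ERROR:  canceling statement due to lock timeout
STATEMENT:  UPDATE pgbench_accounts SET abalance = abalance + 100 WHERE aid = 1;
```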
Let’s say you want to set a lock timeout of 1 minute, but log queries that wait for locks for more than, say, 30 seconds. You can do this using:
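```
# postgresql.conf
log_lock_waits = on       # log whenever a query waits longer than deadlock_timeout
deadlock_timeout = 30s    # doubles as the lock wait logging threshold

# and on the client side, e.g. via SQL: SET lock_timeout = '1min';
```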
This will create logs like this:
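```
LOG:  process 163 still waiting for ShareLock on transaction 1011 after 30009.004 ms
DETAIL:  Process holding the lock: 152. Wait queue: 163.
CONTEXT:  while updating tuple (0,1) in relation "pgbench_accounts"
STATEMENT:  UPDATE pgbench_accounts SET abalance = abalance + 100 WHERE aid = 1;
```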
The use of deadlock_timeout is not a typo: it is the value that the lock wait logging mechanism uses. Ideally, there should have been something like log_min_duration_lock_wait, but unfortunately, that is not the case.
In case of actual deadlocks, Postgres will abort the deadlocked transactions after deadlock_timeout duration, and will log the offending statements. No explicit configuration is necessary.
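When a deadlock is detected, the report in the logs looks something like this:

```
ERROR:  deadlock detected
DETAIL:  Process 201 waits for ShareLock on transaction 1022; blocked by process 202.
    Process 202 waits for ShareLock on transaction 1021; blocked by process 201.
HINT:  See server log for query details.
STATEMENT:  UPDATE pgbench_branches SET bbalance = bbalance + 10 WHERE bid = 1;
```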
Logging Autovacuums
The autovacuum process kicks in when Postgres determines the data in a table has changed sufficiently to warrant a vacuum and analyze. To keep an eye on this process, enable the logging of autovacuum runs:
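```
# postgresql.conf: log every autovacuum/autoanalyze run that takes at least
# this long to complete; 0 logs every run, -1 disables this logging
log_autovacuum_min_duration = 0
```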
Here is a sample entry that was caused by excessive changes to one table:
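```
LOG:  automatic vacuum of table "bench.public.pgbench_accounts": index scans: 0
    pages: 0 removed, 1640 remain, 0 skipped due to pins, 0 skipped frozen
    tuples: 100000 removed, 100000 remain, 0 are dead but not yet removable
    buffer usage: 3327 hits, 0 misses, 8 dirtied
    avg read rate: 0.000 MB/s, avg write rate: 0.022 MB/s
    system usage: CPU: user: 0.31 s, system: 0.00 s, elapsed: 2.81 s
LOG:  automatic analyze of table "bench.public.pgbench_accounts" system usage: CPU: user: 0.05 s, system: 0.00 s, elapsed: 0.32 s
```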
Note that autovacuum will typically trigger an analyze after the vacuum, and this will also be logged.
These logs will help you figure out how best to tune the autovacuum parameters, and will help in investigating if and when autovacuum is not as effective as you expected.
Logging Checkpoints
Checkpointing is the process of pushing WAL-logged changes into the actual files that back tables. Ideally, checkpoints should occur at regular, not-too-frequent intervals, as checkpointing is a CPU- and disk-intensive process. For various reasons, checkpoints are sometimes forced to happen before the next scheduled time, and this results in reduced query performance.
To keep an eye on checkpoint frequency and efficiency, enable logging of checkpoints:
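```
# postgresql.conf
log_checkpoints = on
```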
This tells PostgreSQL to log the following whenever a checkpoint occurs:
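```
LOG:  checkpoint starting: immediate force wait
LOG:  checkpoint complete: wrote 0 buffers (0.0%); 0 WAL file(s) added, 0 removed, 0 recycled; write=0.000 s, sync=0.000 s, total=0.013 s; sync files=0, longest=0.000 s, average=0.000 s; distance=0 kB, estimate=0 kB
```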
The first line lists the flags that the backend passed to the checkpointer. You can see that the “force” flag caused a checkpoint even though there were no pending changes to write out. If “immediate” had not been specified, the checkpointer would have spread its writes over a fraction (checkpoint_completion_target) of the checkpoint interval.
Other Settings
There are a couple of other settings that you can turn on in your PostgreSQL configuration that will help with diagnosing issues:
- track_io_timing - setting this to on lets you see the time spent in disk I/O for each query (in combination with the pg_stat_statements extension described below). See the docs for a caveat about turning this on, but it should be safe on nearly any modern Linux. Seeing a query’s disk I/O cost is impossible without turning this on.
- track_commit_timestamp - setting this to on can be useful in debugging replication lags and other replication-related issues.
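Both are simple on/off switches:

```
# postgresql.conf
track_io_timing = on
track_commit_timestamp = on
```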
Query Statistics via pg_stat_statements
The extension pg_stat_statements is an essential accessory for any PostgreSQL deployment. It collects and records statistics for each query executed, and presents them as a view called “pg_stat_statements”. This is an extension, which means you have to install it explicitly in each database that you want data for, using the command:
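```
CREATE EXTENSION IF NOT EXISTS pg_stat_statements;
```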
Since the extension relies on a .so, you’ll need to load that using shared_preload_libraries:
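```
# postgresql.conf; append to any existing list of preloaded libraries
shared_preload_libraries = 'pg_stat_statements'
```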
This unfortunately requires a restart of the PostgreSQL server, so make sure you do this before going live.
If you have upgraded from a previous version of PostgreSQL, be sure to upgrade your pg_stat_statements extension also, using:
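```
ALTER EXTENSION pg_stat_statements UPDATE;
```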
The pg_stat_statements extension does not log anything; it is used by querying the view of the same name. For more details, see the official documentation.
Query Execution Plans via auto_explain
auto_explain is another extension that ships with core PostgreSQL. It can log the execution plans of slow queries. It only needs to be added to shared_preload_libraries, and does not need to be installed as an extension. It also has a couple of options that typically need to be set to non-default values:
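```
# postgresql.conf; if you also use pg_stat_statements, list both:
# shared_preload_libraries = 'pg_stat_statements,auto_explain'
shared_preload_libraries = 'auto_explain'
auto_explain.log_min_duration = 1s
```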
The above logs the execution plan for any query that takes more than 1 second to complete. Here is a sample output:
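```
LOG:  duration: 1166.273 ms  plan:
    Query Text: SELECT count(*) FROM pgbench_accounts;
    Aggregate  (cost=2890.00..2890.01 rows=1 width=8) (actual time=1166.221..1166.222 rows=1 loops=1)
      ->  Seq Scan on pgbench_accounts  (cost=0.00..2640.00 rows=100000 width=0) (actual time=0.018..1157.110 rows=100000 loops=1)
```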
To learn more about auto_explain, see the official docs.
The extensions pg_stat_statements and auto_explain are the only two widely supported options that PostgreSQL has for query performance management and query plan management. It pays to get to know these two features, and to plan ahead on using them in production.
Application Name
The application name is a client-side parameter, and can usually be set in the DSNs or libpq-style connection strings that your application uses for connection information. Many tools and utilities in the PostgreSQL ecosystem understand the application name, and it helps to set it to a meaningful value, for example:
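```
# libpq-style connection string ("weekly-report" is just an example name)
host=db.example.com dbname=sales user=reporter application_name=weekly-report

# the same thing as a URI
postgresql://reporter@db.example.com/sales?application_name=weekly-report
```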
This would be set for each client application that connects to your PostgreSQL server.
About pgDash
pgDash is a modern, in-depth monitoring solution designed specifically for PostgreSQL deployments. pgDash shows you information and metrics about every aspect of your PostgreSQL database server, collected using the open-source tool pgmetrics.
pgDash provides core reporting and visualization functionality, including collecting and displaying PostgreSQL information and providing time-series graphs, detailed reports, diagnostics, alerting, teams and more. Check out the features here or sign up today for a free trial.