© Laurenz Albe 2022
For detailed query performance analysis, you need
EXPLAIN (ANALYZE, BUFFERS) output for an SQL statement. With a parameterized statement, it can be difficult to construct a run-able statement for
EXPLAIN (ANALYZE). Sometimes, you don’t even know the parameter values. I’ll show you how you can get at least plain
EXPLAIN output for the generic plan in this case. Such an execution plan is better than nothing, and maybe even enough to guess what the problem is.
With the extended query protocol, PostgreSQL allows you to separate the SQL statement and the constants used in the statement. This enhances security, since it makes SQL injection impossible, but it is mostly a performance feature. Such parameterized statements can be named and reused with different parameter values, thereby saving the database engine the work of parsing the same statement over and over. What is more, using this feature PostgreSQL can sometimes avoid the much greater overhead of generating execution plans for each execution of the statement.
You typically encounter parameterized statements in two situations:
- prepared statements with parameters (typically used through a client API)
- static SQL statements in PL/pgSQL functions that use variables
The placeholders for the parameters are
$2 and so on. In the case of PL/pgSQL, you won’t see them; the PL/pgSQL call handler will substitute these placeholders instead of the variable names. Note that parameterized statements can only be
Generic plans for parameterized statements
Normally, PostgreSQL generates an execution plan whenever a query is executed. However, PostgreSQL can cache execution plans for named parameterized statements. Plans are only ever cached inside a single database session; there is no shared memory plan cache.
If PostgreSQL thinks it can do so without hurting performance, it will start using a generic plan for the statement. Such an execution plan does not take the parameter values into account and can be reused. The decision whether to switch to a generic plan or not is based on a heuristic and is typically made when the statement executes for the sixth time. You can tell such a plan from the placeholders that it uses instead of the parameters. Here is an example using a prepared statement:
PREPARE stmt(text) AS SELECT oid FROM pg_class WHERE relname = $1; EXPLAIN (COSTS OFF) EXECUTE stmt('pg_proc'); QUERY PLAN ═════════════════════════════════════════════════════════ Index Scan using pg_class_relname_nsp_index on pg_class Index Cond: (relname = 'pg_proc'::text) (2 rows)
The next four executions of
EXPLAIN look the same, but then we see
EXPLAIN (COSTS OFF) EXECUTE stmt('pg_attribute'); QUERY PLAN ═════════════════════════════════════════════════════════ Index Scan using pg_class_relname_nsp_index on pg_class Index Cond: (relname = $1) (2 rows)
PostgreSQL has started using a generic plan! From that point on, the query planning time will become much shorter.
Forcing a generic plan for a parameterized statement
You can use the PostgreSQL parameter
plan_cache_mode to influence the behavior described in the previous section. The default setting “
auto” chooses the heuristic described above, where PostgreSQL decides after a few executions whether a generic plan would be beneficial or not.
With the setting “
force_custom_plan” you can tell PostgreSQL to never use a generic plan. That is a good idea if the generic plan turns out to be not quite as good as PostgreSQL thought. It is also a good setting for data warehouses, where you normally run expensive analytical queries and saving on planning time is less important than getting the best possible execution plan.
Finally, the setting “
force_generic_plan” makes PostgreSQL use a generic plan immediately. We will make use of that setting later.
Where can you encounter a parameterized statement?
Parameterized statements in the PostgreSQL log
A parameterized statement looks like this in the log:
LOG: duration: 0.012 ms execute stmt: SELECT oid FROM pg_class WHERE relname = $1 DETAIL: parameters: $1 = 'pg_proc'
Usually, the parameters are logged as a detail message, but if there are many of them, it can be a considerable effort to replace all the placeholders with the parameter values. You also don’t see the parameter data type in the log, so you may have to look up the table definition to know whether you should write
'42'. If the statement caused an error and you didn’t set
log_parameter_max_length_on_error to a nonzero value, you won’t get the parameters logged at all:
ERROR: canceling statement due to statement timeout STATEMENT: SELECT oid FROM pg_class WHERE relname = $1
Parameterized statements in pg_stat_statements
pg_stat_statements is the Swiss army knife for analyzing database workloads. One of its features is that it ignores the values of constants, so that statements that only differ in constants are aggregated together. So if you query the
pg_stat_statements view, you will see placeholders even in statements that were not parameterized originally. Also, since
pg_stat_statements collects statistics over many executions of a statement, it doesn’t collect the actual parameter values for any of them.
The need for a generic plan
If you find a problem statement in the log or in
pg_stat_statements, you want to analyze its performance. In order to do that, you have to guess appropriate parameter values, so that you can get an execution plan with
EXPLAIN (ANALYZE, BUFFERS). This can be tedious and take a long time.
For a first analysis, it is helpful to see the execution plan as generated by
ANALYZE). Since “plain”
EXPLAIN does not execute the query, it should not depend on the actual parameter values, as long as we are happy with the generic plan. Unfortunately,
EXPLAIN refuses to generate a generic plan for a parameterized statement:
EXPLAIN SELECT oid FROM pg_class WHERE relname = $1; ERROR: there is no parameter $1 LINE 1: EXPLAIN SELECT oid FROM pg_class WHERE relname = $1; ^
We get that error even if we set
plan_cache_mode = force_generic_plan.
Generating a generic plan for a parameterized statement using
We want to do better than that, and we can. Using
PREPARE, we can create a prepared statement with a placeholder:
PREPARE stmt(name) AS SELECT oid FROM pg_class WHERE relname = $1;
Now we can force a generic plan and
EXPLAIN the prepared statement. We can supply NULL as parameter value, since NULL exists for every data type, and the parameter value is ignored anyway:
SET plan_cache_mode = force_generic_plan; EXPLAIN EXECUTE stmt(NULL); QUERY PLAN ═══════════════════════════════════════════════════════════════════════════════════════════ Index Scan using pg_class_relname_nsp_index on pg_class (cost=0.28..8.29 rows=1 width=4) Index Cond: (relname = $1) (2 rows) DEALLOCATE stmt;
The only remaining fly in the ointment is that we have to figure out the appropriate data types for the parameters.
Using the pseudo-type
unknown with a parameterized statement
Pseudo-types are data types that you cannot use in table definitions. One of these data types is “
unknown”: It is used during query parsing for string constants whose data type must be resolved later, based on the context. We can use
unknown as a data type for query parameters and let PostgreSQL figure out the appropriate data type itself:
PREPARE stmt(unknown) AS SELECT oid FROM pg_class WHERE relname = $1; SET plan_cache_mode = force_generic_plan; EXPLAIN EXECUTE stmt(NULL); QUERY PLAN ═══════════════════════════════════════════════════════════════════════════════════════════ Index Scan using pg_class_relname_nsp_index on pg_class (cost=0.28..8.29 rows=1 width=4) Index Cond: (relname = $1) (2 rows) DEALLOCATE stmt;
Putting it all together in an extension
Now we have a simple algorithm to get the generic plan of a parameterized statement:
- count the number of parameters
- create a prepared statement with that many “
- set “
plan_cache_mode” to “
EXPLAINthe prepared statements using NULLs as arguments
I wrapped all this into a function and wrote the extension generic_plan. It is written in PL/pgSQL and does not require superuser permissions to install. Here you can see it in action:
CREATE EXTENSION IF NOT EXISTS generic_plan; SELECT generic_plan('SELECT * FROM pg_sequences WHERE max_value < last_value + $1'); generic_plan ═════════════════════════════════════════════════════════════════════════════════════════════ Subquery Scan on pg_sequences (cost=1.09..24.10 rows=1 width=245) Filter: (pg_sequences.max_value < (pg_sequences.last_value + $1)) -> Nested Loop (cost=1.09..24.09 rows=1 width=245) Join Filter: (c.oid = s.seqrelid) -> Seq Scan on pg_sequence s (cost=0.00..1.06 rows=6 width=49) -> Materialize (cost=1.09..22.76 rows=3 width=136) -> Hash Join (cost=1.09..22.74 rows=3 width=136) Hash Cond: (c.relnamespace = n.oid) -> Seq Scan on pg_class c (cost=0.00..21.62 rows=8 width=76) Filter: (relkind = 'S'::"char") -> Hash (cost=1.05..1.05 rows=3 width=68) -> Seq Scan on pg_namespace n (cost=0.00..1.05 rows=3 width=68) Filter: (NOT pg_is_other_temp_schema(oid)) (13 rows)
It can be complicated to collect parameter values to analyze the execution of a parameterized statement, but using the generic_plan extension we can at least get a generic plan easily. The tricks used are a prepared statement with parameters of type “
plan_cache_mode and using NULLs as parameter values.
In case you’re interested in learning more about parameters, check out my blog on Query Parameter Data Types and Performance.