Enterprise Postgres 16 Operation Guide

9.7 Monitoring Delayed SQL and Periodically Backing Up Statistics

After production starts, SQL execution can be delayed by a variety of factors, including unexpected system load and data skew. By monitoring for delays that affect operations, you can detect them early. Monitoring also reduces the time required for recovery tasks, such as determining the cause and resolving the delay.

One cause of delayed SQL is a change in statistics. Delays caused by statistics can often be resolved by restoring the statistics that were in effect before the delay occurred. For this reason, back up your statistics periodically to guard against such delays.

Operational workflow

The workflow required to monitor delayed SQL and back up statistics is shown below.

Corresponding time: During environment construction

  (1) Enabling the pg_dbms_stats extension
      Enable the pg_dbms_stats extension with the CREATE EXTENSION statement.

  (2) Logging settings for delayed SQL monitoring
      Set parameters such as log_min_duration_statement in postgresql.conf.

Corresponding time: During system development

  (3) Creating and reviewing a script for backing up statistics (*1)
      Prepare a script that backs up statistics, in line with how statistics are updated and how VACUUM is operated.

Corresponding time: During operation

  (4) Periodic backup of statistics
      Execute the script created in (3) to back up the statistics.

  (5) Monitoring delayed SQL and adjusting the allowable delay time
      Monitor the occurrence of delayed SQL and adjust the allowable delay time to find the optimal settings for operation.

Corresponding time: When delayed SQL occurs

  (6) Investigating the cause of delayed SQL
      If delayed SQL occurs, investigate the cause of the delay and consider countermeasures.

  (7) Restoring statistics (*2)
      Eliminate the delayed SQL by restoring the backed-up statistics.

*1: Although this work can be performed during operation, consider the impact on operation and complete it before operation starts.

*2: If the cause is determined not to be statistics, take appropriate measures according to the cause.


The details of each work item are described below.

(1) Enabling the pg_dbms_stats extension

Backing up and restoring statistics requires the export and import functions of the pg_dbms_stats extension, so enable the extension first.

For instructions on how to enable it, refer to "Setting Up pg_dbms_stats" in the Installation and Setup Guide for Server.
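
As a minimal sketch, the extension can be enabled per database with a CREATE EXTENSION statement; the database name database1 follows the later examples, and the setup guide above describes the full procedure.

psql -d database1 -c "CREATE EXTENSION pg_dbms_stats"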

(2) Logging settings for delayed SQL monitoring

Database logs are used to monitor delayed SQL.

Make the following settings in postgresql.conf. Because the date and time are important when investigating the cause of delayed SQL, set the log format and file name to include them.

Parameter                    Specified value
---------------------------  ---------------------------------------------------
logging_collector            on
log_line_prefix              [%t]%u %d %p[%l]
                             (A format that adds information required for
                             investigation, such as the time and the executing
                             application.)
log_filename                 postgresql.%Y-%m-%d
                             (Include the date and time so that past execution
                             records can be checked.)
log_min_duration_statement   300000
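
Expressed as postgresql.conf entries, the settings above would look like the following (300000 ms is 5 minutes; see the Point below for how to adjust it).

logging_collector = on
log_line_prefix = '[%t]%u %d %p[%l]'
log_filename = 'postgresql.%Y-%m-%d'
log_min_duration_statement = 300000    # milliseconds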

Point

If the value specified for the log_min_duration_statement parameter is too small, even SQL that need not be considered delayed is output to the log; if it is too large, delayed SQL cannot be detected. Therefore, if the setting value cannot be estimated, set it small at the start of operation and adjust it upward as necessary while monitoring during operation.

(3) Creating and reviewing a script for backing up statistics

To back up statistics, use the export function of the pg_dbms_stats extension.

The backup is used not only for restoration but also for identifying the cause of delayed SQL, so the time at which the backup was taken is important. Therefore, be sure to create a script like the one below so that the backup time is recorded.

We also provide a sample script that uses the export function to export on a database basis. Users can adapt it to their needs, for example by changing it to export on a schema basis, so copy the script to the current execution directory before using it.

The sample script is stored in the following location, where "<x>" indicates the product version.

/opt/fsepv<x>server64/share/doc/extension/export_plain_stats-16.sql.sample

The following example, for Fujitsu Enterprise Postgres 16, assumes the sample script has been copied to the current execution directory as "export_plain_stats-16.sql".

#!/bin/bash

# add the Fujitsu Enterprise Postgres commands to PATH and set the backup directory
export PATH=/opt/fsepv16server64/bin:$PATH
export CURRENTDIR=/pg_dbms_stats/backup

pushd "${CURRENTDIR}"

# build a dump file name that records the backup time
EXECDATE=$(date '+%Y%m%d%H%M%S')
FILENAME=pg_dbms_stats_back.${EXECDATE}.dmp

# export the statistics for the database
psql -d database1 -f export_plain_stats-16.sql

# rename the dump file produced by the sample script
mv export_stats.dmp "${FILENAME}"
popd
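
After creating the script, check its operation with a trial run. The file name export_stats.sh below is a placeholder for whatever name you save the script under.

chmod +x export_stats.sh
./export_stats.sh
ls -l /pg_dbms_stats/backup/pg_dbms_stats_back.*.dmp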

(4) Periodic backup of statistics

Execute the script created in "(3) Creating and reviewing a script for backing up statistics" to back up the statistics information on a regular basis.

Backup execution interval

Perform backup at the following timings according to the statistics update method.

When statistics are automatically updated (autovacuum is enabled)

Statistics are updated when autovacuum runs.

Therefore, use the OS's cron command or a similar scheduler to periodically execute the script created in "(3) Creating and reviewing a script for backing up statistics" and back up the statistics to a file.

Obtain the shortest autovacuum execution interval from the log, and back up the statistics at an interval shorter than that. Autovacuum activity can be logged by setting the log_autovacuum_min_duration parameter to 0.
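
For example, a crontab entry along the following lines runs the backup every hour; the script path is an assumption, and the interval must be chosen to be shorter than the observed autovacuum interval.

# back up statistics every hour (adjust to be shorter than the autovacuum interval)
0 * * * * /pg_dbms_stats/backup/export_stats.sh >> /pg_dbms_stats/backup/export_stats.log 2>&1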

When users update statistics

If the user controls the updating of statistics with the ANALYZE command or similar, back up the statistics at the same time the statistics are updated.
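
As a sketch, a manual statistics refresh and its backup can be combined in one step; the database name and script path follow the earlier examples and are assumptions.

psql -d database1 -c "ANALYZE"
/pg_dbms_stats/backup/export_stats.sh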

Backup retention period

The backed-up statistics are used as investigation material to determine the cause of delayed SQL if a performance problem occurs. Therefore, keep them for the period of time expected to be required from the occurrence of an operational problem to its resolution.

Point

  • The capacity required for backup mainly depends on the number of objects (schemas, tables, etc.) contained in the database. Estimate the required capacity from the backup file when checking the operation of the script, and secure that space (see the example after this list).

  • To restore backed-up statistics, the names of objects such as schemas and tables must match between the backup source and the restore target. If you change the name of an object during operation, reflect this in the backup file name or storage location so that you can confirm the match with the object definitions when restoring statistics.
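
For example, the size of a trial backup can be checked as follows to estimate the required capacity; the path follows the sample script above.

du -sh /pg_dbms_stats/backup/pg_dbms_stats_back.*.dmp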

(5) Monitoring delayed SQL and adjusting the allowable delay time

Monitor SQL that is delayed beyond the time set in the log_min_duration_statement parameter.

With the settings made in "(2) Logging settings for delayed SQL monitoring", delayed SQL is output to the server log in the following format.

[2024-04-30 10:20:11 JST]user1 postgres 3414[1]LOG: duration: 301001.541 ms  statement: SELECT pg_sleep(301)
[2024-04-30 10:26:12 JST]user1 postgres 3414[1]LOG: duration: 302002.321 ms  statement: SELECT pg_sleep(302)
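
Such entries can be extracted from the dated log file mechanically; the log directory below is environment specific and shown only as an example.

grep 'duration:' /database/inst1/log/postgresql.2024-04-30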
When detected SQL is deemed to be delayed more than expected

Refer to "(6) Investigating the cause of delayed SQL" to determine the cause and take appropriate measures.

When SQL with execution time that does not affect operation is detected

The current setting only outputs redundant logs, so increase the value of the log_min_duration_statement parameter and reload the server configuration to apply the change.
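
As a sketch, raise the threshold in postgresql.conf and reload without restarting; the data directory path is an assumption, and the new value should fit your operation.

vi /database/inst1/postgresql.conf      # raise log_min_duration_statement, e.g. to 600000
pg_ctl reload -D /database/inst1        # apply the change without restarting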

(6) Investigating the cause of delayed SQL

There are many factors that can cause SQL delays, so you must first identify the cause.

Here we show how to investigate SQL delays caused by changes in statistics.

Note that in cases where a delay caused by statistics is unlikely, we recommend investigating from a different perspective (I/O, system load, etc.).

  1. Check the delayed SQL from the server log and identify the schema used by that SQL.

  2. In a database cluster separate from the production environment (such as a development environment), create an environment (hereafter referred to as the reproduction environment) with the same database name and table definitions as the environment where the problem occurred. The reproduction environment also requires the pg_dbms_stats extension, so enable it there as well.

  3. Use the pg_dump command in the production environment to obtain the target table definitions.
    Since the data is not needed at this point, specify the -s option of the pg_dump command so that only the definitions are extracted.

    The following is an example of output to the file ddl_schema.dmp when the schema to be investigated is schema_1 in database database1.

    pg_dump -d database1 -s -n schema_1 > ddl_schema.dmp

    Using ddl_schema.dmp, create the investigation schema schema_1 in the database database1 in the reproduction environment.

    psql -d database1 -f ddl_schema.dmp
  4. To identify the cause of the SQL delay, use the backed-up statistics to check whether the cause lies in the statistics. Import the statistics into the schema in the reproduction environment using the pg_dbms_stats import function (dbms_stats.import_schema_stats).
    The following is an example of restoring statistics for schema schema_1 from the backup file "pg_dbms_stats_back.20240422011223.dmp".

    psql -d database1 -c "SELECT dbms_stats.import_schema_stats('schema_1','pg_dbms_stats_back.20240422011223.dmp')"

    Then, execute the EXPLAIN command for the delayed SQL and obtain the execution result (query plan); a sketch of such an EXPLAIN call appears after this list.

  5. Next, restore the statistics by separately importing the statistics that were backed up during an earlier period when the SQL in question was running stably.
    For example, if the target SQL is executed daily as part of regular batch processing, use the data from the same time the previous day. For online processing, use the data from a previous time period when a similar business application is expected to be executed.
    The following is an example of restoring the statistics from the backup file "pg_dbms_stats_back.20240421010001.dmp" from one day ago, assuming the SQL is used in batch processing.

    psql -d database1 -c "SELECT dbms_stats.import_schema_stats('schema_1','pg_dbms_stats_back.20240421010001.dmp')"

    In the same way as in step 4, execute the EXPLAIN command for the delayed SQL and obtain the query plan.

  6. Compare the execution results (query plans) of steps 4 and 5, and act as follows based on the result.

    • If the query plans are the same, or the estimated execution time of the query plan in step 5 is longer

      This may be a problem other than statistics (system load, index imbalance, I/O, etc.), so investigate from a different perspective.

    • Other than the above

      The problem is likely that the optimal query plan was not selected due to updated statistics.

      Refer to "(7) Restoring statistics" and import the statistics from the period when the system was operating stably into the production environment to restore the system.

(7) Restoring statistics

Restore the production environment statistics to the state they were in before the delay occurred. The procedure is the same as the import performed in the reproduction environment; for example:

psql -d database1 -c "SELECT dbms_stats.import_schema_stats('schema_1','pg_dbms_stats_back.20240422000000.dmp')"