A Beginner's Guide to Postgres Performance Monitoring
When setting up a new Postgres instance, there are a few things I do to make sure that if something goes wrong, I'll be able to figure out what happened. Many people think you need to set up additional software to monitor Postgres performance, but you can get pretty far with the built-in tooling. This post covers the following tools, all of which are provided by Postgres:
- pg_stat_statements
- the slow query log
- statement_timeout
pg_stat_statements
The pg_stat_statements extension provides an aggregate view of how long your queries are taking. You can enable pg_stat_statements by:
- Running sudo apt-get install postgresql-contrib-${YOUR POSTGRES VERSION}. This installs the contrib package, which provides additional functionality that is maintained as part of the official Postgres project but isn't bundled with the default Postgres installation.
- Adding pg_stat_statements to shared_preload_libraries in your postgresql.conf file, like so: shared_preload_libraries = 'pg_stat_statements'.
- Restarting Postgres and running CREATE EXTENSION pg_stat_statements. A quick way to verify the setup is sketched below.
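To sanity-check the installation, one option (a standard catalog query, not part of the original steps) is to confirm the extension is registered and that the view it creates is readable:
{% c-block language="sql" %}
-- Confirm the extension is installed and see which version you got.
SELECT extname, extversion
FROM pg_extension
WHERE extname = 'pg_stat_statements';

-- Confirm the view is queryable (it may have very few rows right after a restart).
SELECT count(*) FROM pg_stat_statements;
{% c-block-end %}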
After enabling the extension, you'll see a view called pg_stat_statements, which has a ton of information on which queries are taking up the most time:
{% c-block language="sql" %}
> \d pg_stat_statements;
View "public.pg_stat_statements"
Column | Type | Collation | Nullable | Default
---------------------+------------------+-----------+----------+---------
userid | oid | | |
dbid | oid | | |
queryid | bigint | | |
query | text | | |
calls | bigint | | |
total_time | double precision | | |
min_time | double precision | | |
max_time | double precision | | |
mean_time | double precision | | |
stddev_time | double precision | | |
rows | bigint | | |
shared_blks_hit | bigint | | |
shared_blks_read | bigint | | |
shared_blks_dirtied | bigint | | |
shared_blks_written | bigint | | |
local_blks_hit | bigint | | |
local_blks_read | bigint | | |
local_blks_dirtied | bigint | | |
local_blks_written | bigint | | |
temp_blks_read | bigint | | |
temp_blks_written | bigint | | |
blk_read_time | double precision | | |
blk_write_time | double precision | | |
{% c-block-end %}
When optimizing a database, I start by figuring out which queries to optimize. I look for two kinds of queries: the ones that take up the most total time, and the ones that take the longest on average. You can get the queries that take up the most total time by running:
{% c-block language="sql" %}
SELECT total_time, query
FROM pg_stat_statements
ORDER BY total_time DESC
LIMIT 10;
{% c-block-end %}
The total_time column is the total amount of time spent running the query across all of its calls. You can think of it as average query time multiplied by the number of times the query was run.
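To see that relationship in the view itself, here's a small sketch (using only columns from the view above) that puts calls, mean_time, and total_time side by side; calls * mean_time should come out very close to total_time:
{% c-block language="sql" %}
SELECT calls,
       mean_time,
       calls * mean_time AS calls_times_mean,
       total_time
FROM pg_stat_statements
ORDER BY total_time DESC
LIMIT 5;
{% c-block-end %}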
You can get the queries that take the longest on average by running:
{% c-block language="sql" %}
SELECT mean_time, query
FROM pg_stat_statements
ORDER BY mean_time DESC
LIMIT 10;
{% c-block-end %}
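One optional refinement, and purely a suggestion on my part rather than something from the original workflow: a query that has only run a handful of times can have a misleading average, so you may want to require a minimum call count:
{% c-block language="sql" %}
SELECT mean_time, calls, query
FROM pg_stat_statements
WHERE calls > 50  -- arbitrary cutoff; tune it for your workload
ORDER BY mean_time DESC
LIMIT 10;
{% c-block-end %}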
Once I have the queries that are taking up the most time, I'll use the slow query log to find runnable instances of the query.
Relevant Links:
Postgres Docs on pg_stat_statements
Citus Blog Post on pg_stat_statements
The Slow Query Log
The issue with pg_stat_statements is that it only gives you normalized versions of each query. For example, if you run the query SELECT * FROM users WHERE id = 1, pg_stat_statements will show SELECT * FROM users WHERE id = ?. That's great for looking at queries in aggregate, but when you want to optimize a query, you need an actual instance of it with all the parameters filled in. That's where the slow query log comes in. When you enable the slow query log, Postgres will log the full text of any query that takes longer than a configurable threshold. I usually like to set the threshold to 100ms.
You can enable the slow query log by setting the log_min_duration_statement parameter in postgresql.conf to the desired threshold in milliseconds:
{% c-block language="sql" %}
log_min_duration_statement = 100
{% c-block-end %}
After updating the config and running SELECT pg_reload_conf() to reload it, Postgres will start logging every query that takes longer than the threshold.
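From psql, that looks something like the following (both commands are standard, though reloading the config requires sufficient privileges):
{% c-block language="sql" %}
-- Reload the configuration without restarting Postgres.
SELECT pg_reload_conf();

-- Confirm the new threshold took effect.
SHOW log_min_duration_statement;
{% c-block-end %}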
After obtaining a query from pg_stat_statements, I'll search the slow query log for a particular instance of the query I can run. I'll then use EXPLAIN ANALYZE to understand why the query is taking so long.
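For example, with the hypothetical users query from earlier, it would look like this. Keep in mind that EXPLAIN ANALYZE actually executes the query and then reports the plan Postgres chose along with how long each step took:
{% c-block language="sql" %}
EXPLAIN ANALYZE
SELECT * FROM users WHERE id = 1;
{% c-block-end %}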
Relevant Links
Postgres Docs on Logging Configuration
PGBadger - A tool for analyzing the Postgres slow query log.
statement_timeout
While not for performance monitoring per se, statement_timeout is a setting you should set regardless. When a query runs longer than statement_timeout, Postgres will abort it. Setting statement_timeout is the easiest way to prevent runaway queries from overloading your DB. Even though the Postgres docs recommend against setting it globally, I like setting a global timeout of 15s:
{% c-block language="sql" %}
statement_timeout = 15000  # 15 seconds; the value is in milliseconds
{% c-block-end %}
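If you want to follow the docs' advice and avoid a server-wide default, one alternative sketch is to attach the timeout to the role your application connects as (the role name app_user here is made up for illustration):
{% c-block language="sql" %}
-- New sessions for this role get the timeout; the server-wide default is untouched.
ALTER ROLE app_user SET statement_timeout = '15s';
{% c-block-end %}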
With the global timeout in place, if there are queries that I know will take longer, I can raise the timeout for just those queries (when you set a timeout through a Postgres client library, it's likely using statement_timeout under the hood). If I'm running a query from psql and hitting the timeout, it's easy to turn it off for the session by running:
{% c-block language="sql" %}
SET statement_timeout = 0;
{% c-block-end %}
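To raise the limit for a single query instead of disabling it for the whole session, a minimal sketch is to use SET LOCAL inside a transaction, which resets the setting automatically when the transaction ends (the 60s value and the query are placeholders):
{% c-block language="sql" %}
BEGIN;
SET LOCAL statement_timeout = '60s';
-- run the known-slow query here
COMMIT;
{% c-block-end %}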
By setting a global statement_timeout and then raising it when necessary, I ensure that no query takes longer than 15 seconds, unless I've explicitly given that query permission to take longer.
Relevant Links
Postgres Docs on Connection Settings
Crunchy Data Blog Post on statement_timeout