Skip to content

Cloud & Infrastructure · Database

PostgreSQL Performance Tuning: EXPLAIN ANALYZE, Indexes, and Query Debugging

The queries that destroy production Postgres databases share a small set of root causes. Here is how to find them, understand what EXPLAIN ANALYZE is telling you, and fix them without guessing.

Anurag Verma

Anurag Verma

8 min read

PostgreSQL Performance Tuning: EXPLAIN ANALYZE, Indexes, and Query Debugging

Sponsored

Share

Production Postgres problems usually announce themselves the same way: a dashboard that was fast yesterday is now slow, queries that took 12ms are taking 4 seconds, and the database CPU is at 90% doing something that looks like it should be simple.

The good news is that Postgres ships with enough diagnostic tools that you almost never need to guess. The problem is that reading those tools isn’t obvious. EXPLAIN ANALYZE output looks intimidating until you understand what it’s actually reporting. This post covers the patterns that show up most often, and how to read the output that leads you to the fix.

Start With the Slow Queries

Before you tune anything, find out what’s actually slow. The pg_stat_statements extension collects query statistics across all sessions:

-- Enable it (requires a restart if not already loaded)
CREATE EXTENSION IF NOT EXISTS pg_stat_statements;

-- Top 10 queries by total execution time
SELECT
  round(total_exec_time::numeric, 2) AS total_ms,
  calls,
  round(mean_exec_time::numeric, 2) AS mean_ms,
  round(stddev_exec_time::numeric, 2) AS stddev_ms,
  left(query, 120) AS query
FROM pg_stat_statements
ORDER BY total_exec_time DESC
LIMIT 10;

This gives you the actual offenders. A query called once that takes 30 seconds matters. A query called 50,000 times that averages 8ms matters more — it’s burning 400 seconds of CPU per batch. Focus on total_exec_time, not just mean_exec_time.

Reset the stats after a deploy or after making changes, so you can see the impact clearly:

SELECT pg_stat_statements_reset();

Reading EXPLAIN ANALYZE Output

EXPLAIN ANALYZE runs the actual query (be careful on writes — use a transaction you roll back), measures what happened, and returns the execution plan with real timing.

EXPLAIN (ANALYZE, BUFFERS, FORMAT TEXT)
SELECT u.email, count(o.id) AS order_count
FROM users u
JOIN orders o ON o.user_id = u.id
WHERE u.created_at > '2025-01-01'
GROUP BY u.email
ORDER BY order_count DESC
LIMIT 50;

The output looks like this:

Limit  (cost=8432.10..8432.23 rows=50 width=44) (actual time=142.301..142.310 rows=50 loops=1)
  ->  Sort  (cost=8432.10..8457.10 rows=10000 width=44) (actual time=142.299..142.303 rows=50 loops=1)
        Sort Key: (count(o.id)) DESC
        Sort Method: top-N heapsort  Memory: 28kB
        ->  HashAggregate  (cost=8032.10..8132.10 rows=10000 width=44) (actual time=138.981..141.214 rows=10823 loops=1)
              Group Key: u.email
              Batches: 1  Memory Usage: 1681kB
              ->  Hash Join  (cost=1820.00..7282.10 rows=150000 width=36) (actual time=22.891..122.441 rows=142831 loops=1)
                    Hash Cond: (o.user_id = u.id)
                    ->  Seq Scan on orders o  (cost=0.00..4210.00 rows=210000 width=16) (actual time=0.021..42.431 rows=210000 loops=1)
                    ->  Hash  (cost=1625.00..1625.00 rows=15600 width=28) (actual time=22.611..22.613 rows=15621 loops=1)
                          Buckets: 16384  Batches: 1  Memory Usage: 974kB
                          ->  Seq Scan on users u  (cost=0.00..1625.00 rows=15600 width=28) (actual time=0.026..17.621 rows=15621 loops=1)
                                Filter: (created_at > '2025-01-01 00:00:00'::timestamp)
                                Rows Removed by Filter: 49379

What to read:

Seq Scan — a sequential scan reads every row in the table. Not always bad (small tables, high-selectivity filters that still return most rows), but on large tables it’s usually the culprit. Here, Postgres scanned all 210,000 orders rows and all 65,000 user rows. The filter on created_at removed 49,379 users, but the scan still happened.

actual time=X..Y — X is the time to get the first row, Y is total time. The gap matters for joins and aggregates. A Sort node with actual time=0.001..18.200 is spending almost all its time actually sorting.

rows=N loops=Mrows is rows processed in one loop, loops is how many times the node ran. rows=50 loops=100 means 5,000 total rows were processed.

Rows Removed by Filter — this is the red flag for missing indexes. Here, 49,379 users were read and discarded because they didn’t match the created_at filter. That’s wasted work.

Adding the Right Index

The scan above tells us we need an index on users.created_at. Simple case:

CREATE INDEX CONCURRENTLY idx_users_created_at ON users (created_at);

CONCURRENTLY builds the index without locking the table — use it in production. After creating the index, re-run EXPLAIN ANALYZE:

->  Bitmap Heap Scan on users u  (actual time=1.241..4.832 rows=15621 loops=1)
      Recheck Cond: (created_at > '2025-01-01')
      ->  Bitmap Index Scan on idx_users_created_at  (actual time=0.891..0.892 rows=15621 loops=1)
            Index Cond: (created_at > '2025-01-01')

The total time for that node dropped from 17ms to 4ms. Small win here because the users table wasn’t the bottleneck, but on larger tables this difference is dramatic.

Index Types: When the Default B-tree Isn’t What You Need

Postgres supports several index types. B-tree is the default and handles equality, range queries, and sorting. But others apply in specific cases:

GIN indexes for jsonb, arrays, and full-text search:

-- Querying a jsonb column
SELECT * FROM events WHERE payload @> '{"type": "purchase"}';

-- Without an index: Seq Scan on every row
-- With GIN:
CREATE INDEX CONCURRENTLY idx_events_payload_gin ON events USING gin (payload);

GIN also applies when you’re querying tsvector columns for full-text search. If you have a to_tsvector column and you’re not using a GIN index, full-text search is a sequential scan.

Partial indexes when you only query a subset of rows:

-- If you always filter by status = 'pending' before other filters
CREATE INDEX CONCURRENTLY idx_orders_pending ON orders (created_at, user_id)
WHERE status = 'pending';

This index is smaller (only pending orders), so it’s faster to scan and cheaper on RAM.

Composite indexes — the column order matters:

-- Query: WHERE user_id = $1 AND created_at > $2
-- Good index: user_id first (equality), created_at second (range)
CREATE INDEX ON orders (user_id, created_at);

-- Bad index for this query: reversed
CREATE INDEX ON orders (created_at, user_id);
-- Postgres can use this, but it scans the date range first, then filters by user

The rule: equality conditions before range conditions in composite indexes.

The N+1 Query Problem at the Database Level

N+1 is usually framed as an ORM problem, but it shows up in the database stats as a huge number of identical fast queries:

-- pg_stat_statements shows:
calls: 48,230
mean_exec_time: 0.8ms
query: SELECT * FROM orders WHERE user_id = $1 LIMIT 100

48,230 calls at 0.8ms each is 38 seconds of database time for what should be one query. The fix isn’t in the database — it’s changing the application code to fetch all user IDs in one query:

SELECT * FROM orders WHERE user_id = ANY($1) LIMIT 100;
-- Or a proper JOIN in the original query

But pg_stat_statements is what surfaces it. The high calls count with a parametrized query is the signal.

Connection Saturation

Another common production problem isn’t slow queries — it’s too many connections. Postgres forks a process per connection, and at 500+ active connections, the overhead starts degrading performance. The typical default max_connections is 100 or 200.

Check current connections:

SELECT count(*), state, wait_event_type, wait_event
FROM pg_stat_activity
GROUP BY state, wait_event_type, wait_event
ORDER BY count DESC;

If you see hundreds of idle connections, your application is opening connections and not releasing them properly. If you see connections in Lock Wait, something has a lock that others are waiting on — find the blocking query:

SELECT
  blocking.pid AS blocking_pid,
  blocking.query AS blocking_query,
  blocked.pid AS blocked_pid,
  blocked.query AS blocked_query
FROM pg_stat_activity blocked
JOIN pg_stat_activity blocking ON blocking.pid = ANY(pg_blocking_pids(blocked.pid))
WHERE cardinality(pg_blocking_pids(blocked.pid)) > 0;

Kill the offending query if it’s stuck:

SELECT pg_terminate_backend(blocking_pid);

For connection count issues, the solution is a connection pooler. PgBouncer in transaction pooling mode sits between your app and Postgres, multiplexing many application connections into far fewer database connections. Supabase uses PgBouncer by default for this reason.

Table Bloat: The Invisible Slowdown

Postgres uses MVCC — deleted and updated rows aren’t removed immediately, they’re marked as invisible. VACUUM reclaims them. If VACUUM can’t keep up (high write volume, long-running transactions that prevent cleanup), tables accumulate dead tuples.

SELECT
  schemaname,
  relname,
  n_dead_tup,
  n_live_tup,
  round(n_dead_tup::numeric / nullif(n_live_tup, 0) * 100, 1) AS dead_ratio
FROM pg_stat_user_tables
ORDER BY n_dead_tup DESC
LIMIT 15;

A dead_ratio above 10% on a frequently-queried table is worth addressing. Run VACUUM ANALYZE tablename manually, and check if autovacuum settings are aggressive enough for your write rate:

-- Per-table autovacuum tuning for high-write tables
ALTER TABLE orders SET (
  autovacuum_vacuum_scale_factor = 0.01,  -- vacuum when 1% of rows are dead
  autovacuum_analyze_scale_factor = 0.005
);

EXPLAIN Options Worth Using

The full EXPLAIN syntax gives you more data when you need it:

EXPLAIN (ANALYZE, BUFFERS, FORMAT JSON)
SELECT ...;

BUFFERS shows cache hits vs disk reads. The shared_hit count (data served from Postgres’s shared buffer cache) vs shared_read (data read from disk or OS page cache) tells you if adding RAM would help. If you’re seeing many shared_read values on frequently-run queries, your shared_buffers setting (default: 128MB) is too small. A common starting point is 25% of available RAM.

FORMAT JSON outputs the plan as JSON, which you can paste into tools like explain.dalibo.com for a visual rendering of the plan tree. Easier to read than the text format for complex plans.

The text from this post covers the most common 80% of Postgres performance problems. For the remaining 20% — partitioning, custom statistics, parallel queries, foreign data wrappers — the official Postgres documentation chapters on performance and EXPLAIN are genuinely well-written and worth reading directly.

Sponsored

Enjoyed it? Pass it on.

Share this article.

Sponsored

The dispatch

Working notes from
the studio.

A short letter twice a month — what we shipped, what broke, and the AI tools earning their keep.

No spam, ever. Unsubscribe anytime.

Discussion

Join the conversation.

Comments are powered by GitHub Discussions. Sign in with your GitHub account to leave a comment.

Sponsored