
Summary: Postgres randomly switches indexes on a non-uniformly distributed table, causing a massive performance drop.

We implemented the outbox pattern like so:

  • A postgres database
  • A golang worker to publish messages
    • This worker selects (with FOR UPDATE) the next messages to send
    • "locks" them for itself (sets locked_at to a non-null value)
    • Commits this transaction
    • It then publishes those messages
    • Flags them as sent (sets sent_at) and resets the lock column (locked_at) using their ids
  • A cron to purge the oldest messages
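The lock and mark-as-sent steps above boil down to UPDATEs along these lines (a sketch; the parameter binding style is illustrative):

```sql
-- Claim the selected messages inside the same transaction as the SELECT.
UPDATE outbox SET locked_at = now() WHERE id = ANY(:selected_ids);

-- After a successful publish: mark as sent and release the lock.
UPDATE outbox SET sent_at = now(), locked_at = NULL WHERE id = ANY(:published_ids);
```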

The worker's WHERE clause looks like this:

WHERE sent_at IS NULL
AND available_at <= NOW() -- Exclude messages to be sent in the future
AND ( locked_at IS NULL OR locked_at <= [NOW - TIMEOUT]) -- exclude locked messages
AND retry_count < 30 -- Exclude messages we gave up sending
ORDER BY available_at ASC
LIMIT 10
FOR UPDATE SKIP LOCKED

The purge query looks like this:

WITH rows_to_delete AS (
    SELECT id
    FROM outbox
    WHERE sent_at <= [DATE_THRESHOLD]
    LIMIT [LIMIT_FOR_BATCHING]
)
DELETE FROM outbox
WHERE id IN (SELECT id FROM rows_to_delete)
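Since the purge runs from a cron, one possible refinement is to add RETURNING so the job knows when to stop looping (placeholders kept as in the question):

```sql
WITH rows_to_delete AS (
    SELECT id
    FROM outbox
    WHERE sent_at <= [DATE_THRESHOLD]
    LIMIT [LIMIT_FOR_BATCHING]
)
DELETE FROM outbox
WHERE id IN (SELECT id FROM rows_to_delete)
RETURNING id;  -- repeat until this returns zero rows
```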

Relevant Table schema:

Column        Type                                       Comment
id            bigint, primary key
available_at  timestamp(0) with time zone, not null      when we can publish the message; may be in the future
locked_at     timestamp(0) with time zone                whether this message is already being published by a worker
sent_at       timestamp(0) with time zone                when it was published
retry_count   integer, not null, default 0               retry count if publishing fails

Indexes:

Name                 Definition                                                         Comment
pkey                 PRIMARY KEY, btree (id)
messages_sent        btree (sent_at)                                                    index we created for the purge query
messages_to_recover  btree (available_at) WHERE sent_at IS NULL AND retry_count >= 30   partial index we created to recover after an outage (i.e. retry again when things are back)
messages_to_send     btree (available_at) WHERE sent_at IS NULL AND retry_count < 30    partial index we want to use for selecting the next messages to send

The issue

Most of the time we have just a couple of messages to publish (sent_at IS NULL), a lot (millions) of messages already published (sent_at IS NOT NULL), and a handful with retry_count > 0. Postgres normally uses the correct index, messages_to_send, when selecting messages, and the response time is around 4 ms.

But sometimes Postgres switches to the messages_sent index instead of messages_to_send, performance drops drastically, and it comes back to the correct index after a while (unless everything falls apart before then).

So far the best lead we have found is the statistics used by the query planner:

When everything is OK (good query stats): we use the expected index, actual rows and planned rows are comparable, and the query is fast.

When we have the issue (bad query stats): we use the wrong index, planned rows is way off from actual rows, and the query is really slow.

The questions I have are:

How can we prevent the query planner from switching indexes?

and/or

How can we make sure it has correct statistical data despite the distribution not being uniform? (99.99% of the table has sent_at NOT NULL, yet we may have anywhere from 0 to 100k rows with sent_at IS NULL, so if the planner randomly samples this table, I don't understand how it can even get close to the real distribution.)
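For reference, one way to see what the planner currently believes about the column is the standard pg_stats view (table name as in the schema above):

```sql
-- What does the planner think the sent_at distribution looks like?
SELECT null_frac, n_distinct
FROM pg_stats
WHERE tablename = 'outbox' AND attname = 'sent_at';

-- null_frac is the estimated fraction of NULLs; if it reads as ~0,
-- the planner believes the partial indexes cover almost no rows.
```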

Thanks a lot !

  • Images are very hard to read. Please share the complete and original query plans in plain text, using EXPLAIN(ANALYZE, VERBOSE, BUFFERS, SETTINGS) for your SQL statements. Commented May 27 at 14:17
  • When zooming in, I can see that "actual rows" is very different for the two queries: 150 rows vs 700000 rows. Why do you expect the same performance? Commented May 27 at 14:20

2 Answers


There are a couple of approaches:

  1. Lower autovacuum_analyze_scale_factor for that table (or set it to 0 while setting autovacuum_analyze_threshold to a reasonable value), so that statistics for the table are collected more often and the planner has good enough data to pick the correct index.
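     A per-table setting could look like this (the threshold value is an assumption; tune it to your insert rate):

     ```sql
     -- Analyze outbox after every ~10,000 row changes, regardless of
     -- table size (scale factor 0 disables the percentage-based rule).
     ALTER TABLE outbox SET (
         autovacuum_analyze_scale_factor = 0,
         autovacuum_analyze_threshold    = 10000
     );
     ```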

  2. Create a more specific index. One step in that direction is to avoid NULL values in locked_at. You could, for example, use -infinity instead of NULL. That allows you to get rid of the annoying OR, and the index could be

    CREATE INDEX ON tab (available_at, locked_at) WHERE sent_at IS NULL AND retry_count < 30;
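     Switching to -infinity could be done with a sketch like this (assuming no other code relies on locked_at being NULL):

     ```sql
     -- Backfill existing rows and make -infinity the default,
     -- so locked_at is never NULL and the OR branch disappears.
     UPDATE outbox SET locked_at = '-infinity' WHERE locked_at IS NULL;
     ALTER TABLE outbox
         ALTER COLUMN locked_at SET DEFAULT '-infinity',
         ALTER COLUMN locked_at SET NOT NULL;
     ```

     The worker's filter then simplifies to locked_at <= [NOW - TIMEOUT], which -infinity always satisfies.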
    
  3. If you have SSDs or comparable storage, lower random_page_cost from the default value of 4 to something closer to 1.
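     For example (1.1 is a common starting point for SSDs, not a measured value, and "mydb" is a placeholder for your database name):

     ```sql
     -- Make index scans look cheaper relative to sequential scans.
     ALTER DATABASE mydb SET random_page_cost = 1.1;
     ```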


3 Comments

Indeed, sorry, I fixed my message; it's actually locked_at <= [NOW - TIMEOUT] (to include messages that have been locked for too long).
Perhaps edit the question to make it clearer.
yes I did update the question

Purely from experience, I'm inclined to avoid OR constructions; they tend to make life difficult for the optimizer in most RDBMSs. Not sure that's the case here, but maybe you can treat both situations separately?

WHERE sent_at IS NULL
AND available_at <= NOW() -- Exclude messages to be sent in the future
AND locked_at IS NULL
AND retry_count < 30 

and

WHERE sent_at IS NULL
AND available_at <= NOW() -- Exclude messages to be sent in the future
AND locked_at <= [NOW - TIMEOUT] -- exclude locked messages
AND retry_count < 30 

If necessary you can always put both in CTEs and then UNION ALL them in the result.
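A sketch of the combined query (the timeout interval is an assumption; ORDER BY and FOR UPDATE SKIP LOCKED are left out for brevity and would go inside each branch):

```sql
-- Each branch can use its own index; UNION ALL just concatenates them.
WITH unlocked AS (
    SELECT id FROM outbox
    WHERE sent_at IS NULL AND available_at <= now()
      AND locked_at IS NULL AND retry_count < 30
), expired_locks AS (
    SELECT id FROM outbox
    WHERE sent_at IS NULL AND available_at <= now()
      AND locked_at <= now() - interval '5 minutes'
      AND retry_count < 30
)
SELECT id FROM unlocked
UNION ALL
SELECT id FROM expired_locks
LIMIT 10;
```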

PS: As noted already in the comments, the left example handles 150 rows in 4 ms, the right one handles 694k rows in 126 ms; those are two very different situations!

