Summary: Postgres randomly switches indexes on a non-uniformly distributed table, causing a massive performance drop.
We implemented the outbox pattern like so:
- A postgres database
- A golang worker to publish messages
- This worker:
  - `SELECT ... FOR UPDATE`s the next messages to send
  - "locks" them for itself (`locked_at` set to a non-null value)
  - commits this transaction
  - then publishes those messages
  - flags them as sent (sets `sent_at`) and resets the lock column (`locked_at`) using their `id` (a sketch of these statements follows the WHERE clause below)
- A cron to purge the oldest messages
The worker's `WHERE` clause looks like this:
```sql
WHERE sent_at IS NULL
  AND available_at <= NOW()                                  -- exclude messages to be sent in the future
  AND (locked_at IS NULL OR locked_at <= [NOW - TIMEOUT])    -- exclude locked messages
  AND retry_count < 30                                       -- exclude messages we gave up sending
ORDER BY available_at ASC
LIMIT 10
FOR UPDATE SKIP LOCKED
```
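For completeness, that clause belongs to a `SELECT id FROM outbox ... FOR UPDATE SKIP LOCKED` statement, and the follow-up statements look roughly like this. The real code is in Go; the SQL below is reconstructed from the steps listed above, and the bind parameter is illustrative, not our literal statements:

```sql
-- Inside the same transaction as the SELECT ... FOR UPDATE SKIP LOCKED:
-- claim the selected rows, then COMMIT to release the row locks.
UPDATE outbox
SET locked_at = NOW()
WHERE id = ANY($1);        -- $1 = ids returned by the SELECT

-- After publishing succeeds: mark the messages as sent and clear the lock.
UPDATE outbox
SET sent_at = NOW(),
    locked_at = NULL
WHERE id = ANY($1);
```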
The purge query looks like this:
```sql
WITH rows_to_delete AS (
    SELECT id
    FROM outbox
    WHERE sent_at <= [DATE_THRESHOLD]
    LIMIT [LIMIT_FOR_BATCHING]
)
DELETE FROM outbox
WHERE id IN (SELECT id FROM rows_to_delete)
```
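The cron runs this in batches. As an illustration only (the retention threshold, batch size, and loop mechanics below are assumptions, not our actual job), the idea is roughly:

```sql
-- Illustrative batching loop (assumed values): delete old sent messages in
-- chunks until a chunk comes back smaller than the batch size.
DO $$
DECLARE
    deleted integer;
BEGIN
    LOOP
        DELETE FROM outbox
        WHERE id IN (
            SELECT id
            FROM outbox
            WHERE sent_at <= NOW() - INTERVAL '30 days'   -- assumed retention threshold
            LIMIT 5000                                    -- assumed batch size
        );

        GET DIAGNOSTICS deleted = ROW_COUNT;
        EXIT WHEN deleted < 5000;
    END LOOP;
END $$;
```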
Relevant table schema:
| Column | Type | Comment |
|---|---|---|
| id | bigint | primary key |
| available_at | timestamp(0) with time zone not null | when we can publish the message, may be in the future |
| locked_at | timestamp(0) with time zone | is this message already being published by a worker |
| sent_at | timestamp(0) with time zone | when was it published |
| retry_count | integer not null default 0 | retry count if publishing fails |
Indexes:
| name | definition | Comment |
|---|---|---|
| pkey | PRIMARY KEY, btree (id) | |
| messages_sent | btree (sent_at) | index we created for the purge query |
| messages_to_recover | btree (available_at) WHERE sent_at IS NULL AND retry_count >= 30 | partial index we created to recover after an outage (= retry again when things are back) |
| messages_to_send | btree (available_at) WHERE sent_at IS NULL AND retry_count < 30 | partial index we want to use for selecting the next messages to send |
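For reference, those index definitions correspond to DDL along these lines (reconstructed from the table above, not copied from our migrations):

```sql
CREATE INDEX messages_sent ON outbox (sent_at);

CREATE INDEX messages_to_recover ON outbox (available_at)
    WHERE sent_at IS NULL AND retry_count >= 30;

CREATE INDEX messages_to_send ON outbox (available_at)
    WHERE sent_at IS NULL AND retry_count < 30;
```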
The issue
Most of the time we have just a couple of messages to publish (sent_at IS NULL), millions of messages already published (sent_at IS NOT NULL), and only a few with retry_count > 0.
Normally, selecting messages correctly uses the messages_to_send index, and the response time is around 4 ms.
But sometimes Postgres switches to the messages_sent index instead of messages_to_send, performance drops drastically, and it only goes back to the correct index after a while (unless everything falls apart before then).
So far, the best lead we have found is the statistics used by the query planner.
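Concretely, by "statistics" we mean what ANALYZE stores for the planner; this is the kind of generic check we are looking at (standard catalog views, nothing specific to our setup):

```sql
-- What the planner currently believes about sent_at on outbox:
-- null_frac is its estimate of the fraction of rows with sent_at IS NULL.
SELECT attname, null_frac, n_distinct
FROM pg_stats
WHERE tablename = 'outbox' AND attname = 'sent_at';

-- When the table statistics were last refreshed.
SELECT last_analyze, last_autoanalyze, n_live_tup, n_dead_tup
FROM pg_stat_user_tables
WHERE relname = 'outbox';
```

The comparison below shows what we see in the two states: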
| When everything is OK | When we have the issue |
|---|---|
| *(query plan screenshot)* | *(query plan screenshot)* |
| We use the expected index, actual rows and planned rows are comparable, and the query is fast | We use the wrong index, planned rows is way off from actual rows, and the query is really slow |
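(The screenshots are query plans for the worker query; anyone wanting to reproduce the comparison can capture them with something like the following, where the lock timeout is an assumed value:)

```sql
-- Note: ANALYZE actually executes the query, so it briefly takes the row locks.
EXPLAIN (ANALYZE, BUFFERS)
SELECT id
FROM outbox
WHERE sent_at IS NULL
  AND available_at <= NOW()
  AND (locked_at IS NULL OR locked_at <= NOW() - INTERVAL '5 minutes')  -- assumed timeout
  AND retry_count < 30
ORDER BY available_at ASC
LIMIT 10
FOR UPDATE SKIP LOCKED;
```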
The questions I have are:
How can we prevent the query planner from switching indexes?
and/or
How can we make sure it has correct statistical data despite the distribution not being uniform? (99.99% of the table has sent_at NOT NULL, yet we may have anywhere from 0 to 100k rows with sent_at IS NULL, so if the planner randomly samples this table I don't understand how it can even get close to the real distribution.)
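For context on what "correct statistical data" would mean here, these are the standard knobs we understand to control that sampling (listed for illustration; the values are examples, not settings we currently use):

```sql
-- Re-sample the table manually.
ANALYZE outbox;

-- Make autoanalyze trigger more often on this table (example values).
ALTER TABLE outbox SET (
    autovacuum_analyze_scale_factor = 0.0,
    autovacuum_analyze_threshold    = 10000
);

-- Sample more rows when building statistics for the skewed column (example value).
ALTER TABLE outbox ALTER COLUMN sent_at SET STATISTICS 1000;
```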
Thanks a lot!

