
Summary: Postgres randomly switches indexes on a non-uniformly distributed table, causing a massive performance drop.

We implemented the outbox pattern like so:

  • A postgres database
  • A golang worker to publish messages
    • This worker selects (with FOR UPDATE) the next messages to send
    • "locks" them for itself (sets locked_at to a non-null value)
    • Commits this transaction
    • It then publishes those messages
    • Flags them as sent (sets sent_at) and resets the lock column (locked_at) using their ids
  • A cron to purge the oldest messages
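The lock and mark-as-sent steps above boil down to UPDATEs along these lines (a sketch; the parameter binding style is illustrative):

```sql
-- Claim the selected messages inside the same transaction as the SELECT.
UPDATE outbox SET locked_at = now() WHERE id = ANY(:selected_ids);

-- After a successful publish: mark as sent and release the lock.
UPDATE outbox SET sent_at = now(), locked_at = NULL WHERE id = ANY(:published_ids);
```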

The worker's WHERE clause looks like this:

WHERE sent_at IS NULL
AND available_at <= NOW() -- Exclude messages to be sent in the future
AND ( locked_at IS NULL OR locked_at <= [NOW - TIMEOUT]) -- exclude locked messages
AND retry_count < 30 -- Exclude messages we gave up sending
ORDER BY available_at ASC
LIMIT 10
FOR UPDATE SKIP LOCKED

The purge query looks like this:

WITH rows_to_delete AS (
    SELECT id
    FROM outbox
    WHERE sent_at <= [DATE_THRESHOLD]
    LIMIT [LIMIT_FOR_BATCHING]
)
DELETE FROM outbox
WHERE id IN (SELECT id FROM rows_to_delete)
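Since the purge runs from a cron, one possible refinement is to add RETURNING so the job knows when to stop looping (placeholders kept as in the question):

```sql
WITH rows_to_delete AS (
    SELECT id
    FROM outbox
    WHERE sent_at <= [DATE_THRESHOLD]
    LIMIT [LIMIT_FOR_BATCHING]
)
DELETE FROM outbox
WHERE id IN (SELECT id FROM rows_to_delete)
RETURNING id;  -- repeat until this returns zero rows
```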

Relevant Table schema:

Column        Type                                       Comment
id            bigint, primary key
available_at  timestamp(0) with time zone, not null      when we can publish the message; may be in the future
locked_at     timestamp(0) with time zone                whether this message is already being published by a worker
sent_at       timestamp(0) with time zone                when it was published
retry_count   integer, not null, default 0               retry count if publishing fails

Indexes:

Name                 Definition                                                         Comment
pkey                 PRIMARY KEY, btree (id)
messages_sent        btree (sent_at)                                                    index we created for the purge query
messages_to_recover  btree (available_at) WHERE sent_at IS NULL AND retry_count >= 30   partial index we created to recover after an outage (i.e. retry again when things are back)
messages_to_send     btree (available_at) WHERE sent_at IS NULL AND retry_count < 30    partial index we want to use for selecting the next messages to send

The issue

Most of the time we have just a couple of messages to publish (sent_at IS NULL), a lot (millions) of messages already published (sent_at IS NOT NULL), and a handful with retry_count > 0. Postgres normally uses the correct index, messages_to_send, when selecting messages, and the response time is around 4 ms.

But sometimes Postgres switches to the messages_sent index instead of messages_to_send, performance drops drastically, and it comes back to the correct index after a while (unless everything falls apart before then).

So far the best lead we have found is the statistics used by the query planner:

When everything is OK (good query stats): we use the expected index, actual rows and planned rows are comparable, and the query is fast.

When we have the issue (bad query stats): we use the wrong index, planned rows is way off from actual rows, and the query is really slow.

The questions I have are:

How can we prevent the query planner from switching indexes?

and/or

How can we make sure it has correct statistical data despite the distribution not being uniform? (99.99% of the table has sent_at NOT NULL, yet we may have anywhere from 0 to 100k rows with sent_at IS NULL, so if the planner randomly samples this table, I don't understand how it can even get close to the real distribution.)
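For reference, one way to see what the planner currently believes about the column is the standard pg_stats view (table name as in the schema above):

```sql
-- What does the planner think the sent_at distribution looks like?
SELECT null_frac, n_distinct
FROM pg_stats
WHERE tablename = 'outbox' AND attname = 'sent_at';

-- null_frac is the estimated fraction of NULLs; if it reads as ~0,
-- the planner believes the partial indexes cover almost no rows.
```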

Thanks a lot !

  • Images are very hard to read. Please share the complete and original query plans in plain text, using EXPLAIN(ANALYZE, VERBOSE, BUFFERS, SETTINGS) for your SQL statements. Commented May 27 at 14:17
  • When zooming in, I can see that "actual rows" is very different for the two queries: 150 rows vs 700000 rows. Why do you expect the same performance? Commented May 27 at 14:20

2 Answers


There are a couple of approaches:

  1. Lower autovacuum_analyze_scale_factor for that table (or set it to 0 while setting autovacuum_analyze_threshold to a reasonable value), so that statistics for the table are collected more often and the planner has good enough data to pick the correct index.
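     A per-table setting could look like this (the threshold value is an assumption; tune it to your insert rate):

     ```sql
     -- Analyze outbox after every ~10,000 row changes, regardless of
     -- table size (scale factor 0 disables the percentage-based rule).
     ALTER TABLE outbox SET (
         autovacuum_analyze_scale_factor = 0,
         autovacuum_analyze_threshold    = 10000
     );
     ```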

  2. Create a more specific index. One step in that direction is to avoid NULL values in locked_at. You could, for example, use -infinity instead of NULL. That allows you to get rid of the annoying OR, and the index could be

    CREATE INDEX ON tab (available_at, locked_at) WHERE sent_at IS NULL AND retry_count < 30;
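     Switching to -infinity could be done with a sketch like this (assuming no other code relies on locked_at being NULL):

     ```sql
     -- Backfill existing rows and make -infinity the default,
     -- so locked_at is never NULL and the OR branch disappears.
     UPDATE outbox SET locked_at = '-infinity' WHERE locked_at IS NULL;
     ALTER TABLE outbox
         ALTER COLUMN locked_at SET DEFAULT '-infinity',
         ALTER COLUMN locked_at SET NOT NULL;
     ```

     The worker's filter then simplifies to locked_at <= [NOW - TIMEOUT], which -infinity always satisfies.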
    
  3. If you have SSDs or comparable storage, lower random_page_cost from the default value of 4 to something closer to 1.
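     For example (1.1 is a common starting point for SSDs, not a measured value, and "mydb" is a placeholder for your database name):

     ```sql
     -- Make index scans look cheaper relative to sequential scans.
     ALTER DATABASE mydb SET random_page_cost = 1.1;
     ```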


3 Comments

Indeed, sorry, I fixed my message; it's actually locked_at <= [NOW - TIMEOUT] (to include messages that have been locked for too long).
Perhaps edit the question to make it clearer.
yes I did update the question

Purely from experience, I'm inclined to avoid OR constructions; they tend to make life difficult for the optimizer in most RDBMSs. Not sure that's the case here, but maybe you can treat both situations separately?

WHERE sent_at IS NULL
AND available_at <= NOW() -- Exclude messages to be sent in the future
AND locked_at IS NULL
AND retry_count < 30 

and

WHERE sent_at IS NULL
AND available_at <= NOW() -- Exclude messages to be sent in the future
AND locked_at <= [NOW - TIMEOUT] -- exclude locked messages
AND retry_count < 30 

If necessary you can always put both in CTEs and then UNION ALL them in the result.
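A sketch of the combined query (the timeout interval is an assumption; ORDER BY and FOR UPDATE SKIP LOCKED are left out for brevity and would go inside each branch):

```sql
-- Each branch can use its own index; UNION ALL just concatenates them.
WITH unlocked AS (
    SELECT id FROM outbox
    WHERE sent_at IS NULL AND available_at <= now()
      AND locked_at IS NULL AND retry_count < 30
), expired_locks AS (
    SELECT id FROM outbox
    WHERE sent_at IS NULL AND available_at <= now()
      AND locked_at <= now() - interval '5 minutes'
      AND retry_count < 30
)
SELECT id FROM unlocked
UNION ALL
SELECT id FROM expired_locks
LIMIT 10;
```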

PS: As noted already in the comments, the left example handles 150 rows in 4 ms, the right one handles 694k rows in 126 ms; those are two very different situations!

