Postgres SQL - strange performance issue in select

Question

I have a query (simplified version):

WITH temp AS (SELECT id, foo(id) AS foo FROM test)
SELECT id FROM temp WHERE foo = 4;

foo(id) is a function that returns 0, 2 or 4 (only these values).

The query above with ... WHERE foo = 4 takes minutes, but surprisingly, when I change it to ... WHERE foo != 0 AND foo != 2, it runs in milliseconds.

The same happens with ... WHERE foo > 2: also very fast.

I checked the execution plan but I don't see any differences.

I'm very surprised by that. Can someone explain why?

ok, agree - but how does this impact foo = 4 vs foo > 2, in both cases it has no idea. — pankleks, Commented Sep 24, 2019 at 21:18
This question would be a lot more useful with the actual function and table definitions, your version of Postgres, and the actual query plans. — Erwin Brandstetter, Commented Sep 24, 2019 at 22:36

Erwin Brandstetter · Accepted Answer · 2019-09-24 23:01:07Z

Assuming that the function foo() cannot be inlined, Postgres has no idea what it might return, so it has to assume that any number is equally common.

The predicate foo = 4 tells Postgres to expect that next to no row will qualify.

The predicate foo != 0 AND foo != 2, OTOH, tells Postgres to expect that almost all rows qualify. With foo > 2 it's still about a third of all rows, which is the planner's default estimate for an inequality without statistics.

That typically leads to different query plans, and the first one seems to perform poorly, while the other ones seem to perform nicely.
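To see what the planner expects for each predicate, one option is to run EXPLAIN on both variants and compare the rows= estimate on the node that applies the filter. This is only a sketch against the simplified query from the question (table test, function foo()); the real query will have more nodes, and the estimates are what to look at even where the plan shape looks the same.

```sql
-- Variant reported as slow: equality on the function result.
EXPLAIN
WITH temp AS (SELECT id, foo(id) AS foo FROM test)
SELECT id FROM temp WHERE foo = 4;

-- Variant reported as fast: inequality on the function result.
EXPLAIN
WITH temp AS (SELECT id, foo(id) AS foo FROM test)
SELECT id FROM temp WHERE foo > 2;

-- Compare the "rows=" figure on the filtering node of each plan.
-- A much smaller estimate for the first variant is what can push
-- the rest of a larger query toward a different join strategy.
```

If the estimates differ widely while the actual row counts are the same, EXPLAIN (ANALYZE) on both variants would show where the misestimate hurts, at the cost of actually running the slow query.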

Details are hidden by missing information. But that's the point here.

If the function is IMMUTABLE, you might create an expression index on test (foo(id)). That index is probably useless by itself, assuming that the 3 possible values are evenly distributed. (Maybe an index-only scan on a multicolumn index on test (foo(id), id) would help, if the function is expensive and declared as such.) But it makes Postgres gather additional statistics that would tell the query planner what to expect from the function. Related:

Index that is not used, yet influences query
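A minimal sketch of the expression-index idea, assuming foo() takes only id and really is declared IMMUTABLE (Postgres rejects the index otherwise). The index name test_foo_idx is made up for illustration. The final query filters on foo(id) directly rather than through the CTE, on the assumption that the planner matches expression statistics against the expression as written on the table.

```sql
-- Hypothetical index name; requires foo(integer) to be IMMUTABLE.
CREATE INDEX test_foo_idx ON test (foo(id));

-- Statistics on the indexed expression are only collected by ANALYZE
-- (or autovacuum's analyze) after the index exists.
ANALYZE test;

-- Expression statistics are listed under the index name in pg_stats.
SELECT attname, n_distinct, most_common_vals, most_common_freqs
FROM   pg_stats
WHERE  tablename = 'test_foo_idx';

-- Check whether the row estimate for the equality predicate improved.
EXPLAIN
SELECT id FROM test WHERE foo(id) = 4;
```

As noted in the comments, this does not apply as-is to a STABLE function with additional parameters; it only illustrates the case described in the answer.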

Thanks - makes sense. I can't post exact function it's way too complex. It's STABLE and receives more parameters so I can't really use expression index. I wonder if I would return ENUM with 3 values instead INTEGER - would it help. — pankleks, Commented Sep 25, 2019 at 6:04
