
I have a query (simplified version):

WITH temp AS (SELECT id, foo(id) AS foo FROM test)
SELECT id FROM temp WHERE foo = 4;

foo(id) is a function that returns 0, 2 or 4 (only these values).

The query above with ... WHERE foo = 4 takes minutes, but, surprisingly, when I change it to ... WHERE foo != 0 AND foo != 2 it runs in milliseconds.

The same goes for ... WHERE foo > 2: also super fast.

I checked the execution plans, but I don't see any differences.

I'm very surprised by that. Can someone explain why?

  • It is a function. The planner has no idea what to expect. Commented Sep 24, 2019 at 21:11
  • OK, agreed, but how does that explain foo = 4 vs. foo > 2? In both cases it has no idea.
    – pankleks
    Commented Sep 24, 2019 at 21:18
  • This question would be a lot more useful with the actual function and table definitions, your version of Postgres, and the actual query plans. Commented Sep 24, 2019 at 22:36

1 Answer


Assuming that the function foo() cannot be inlined, Postgres has no idea what it might return, so it has to assume that any number is equally common.

The predicate foo = 4 tells Postgres to expect that next to no rows will qualify.

The predicate foo != 0 AND foo != 2, OTOH, tells Postgres to expect that almost all rows qualify. With foo > 2 it's still about a third of all rows (the planner's default guess for a range condition it has no statistics for).
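As a rough worked example, assume PostgreSQL's built-in fallbacks for an expression without statistics: 1/200 for equality (a default of 200 distinct values), its complement for inequality, and 1/3 for a range condition. Separate conditions ANDed together are multiplied as if independent. Also assume, hypothetically, that the three values are evenly distributed, so each actually matches about 1/3 of the rows:

$$
\begin{aligned}
\text{est}(foo = 4) &= \tfrac{1}{200} = 0.005 &\quad \text{actual} &\approx \tfrac{1}{3} \\
\text{est}(foo \ne 0 \land foo \ne 2) &= \left(1 - \tfrac{1}{200}\right)^2 \approx 0.990 &\quad \text{actual} &\approx \tfrac{1}{3} \\
\text{est}(foo > 2) &= \tfrac{1}{3} \approx 0.333 &\quad \text{actual} &\approx \tfrac{1}{3}
\end{aligned}
$$

Under these assumptions, foo > 2 is estimated almost exactly, and foo != 0 AND foo != 2 is overestimated. An overestimate is usually harmless, because it steers the planner toward a plan that copes with many rows. foo = 4 is underestimated by a factor of about 67 (\(\tfrac{1/3}{1/200}\)). That kind of misestimate can push the planner toward a plan built for a handful of rows, such as nested loops.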

That typically leads to different query plans, and the first one seems to perform poorly, while the other ones seem to perform nicely.
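The difference usually shows up in the row estimates, not only in the plan shape. The following sketch compares the two variants, reusing the simplified query from the question. It assumes the real query behaves like the simplified one. Note that EXPLAIN ANALYZE actually executes the query, so the slow variant will take minutes here too.

```sql
-- For each plan node, compare the planner's estimate ("rows=" in the cost part)
-- with the real count ("actual time=... rows=").
-- A large gap on the node that evaluates foo points at the misestimate.
EXPLAIN (ANALYZE, BUFFERS)
WITH temp AS (SELECT id, foo(id) AS foo FROM test)
SELECT id FROM temp WHERE foo = 4;

EXPLAIN (ANALYZE, BUFFERS)
WITH temp AS (SELECT id, foo(id) AS foo FROM test)
SELECT id FROM temp WHERE foo > 2;
```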

The details are hidden behind the missing information, but that's the gist of it.

If the function is IMMUTABLE, you might create an expression index on foo(id). That index is probably useless by itself, assuming that the 3 possible values are evenly distributed. (Maybe an index-only scan on a multicolumn index on (foo(id), id) would help, if the function is expensive and declared as such.) But it makes Postgres gather additional statistics that tell the query planner what to expect from the function.
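The following is a minimal sketch of the index approach. It assumes that foo takes a single integer argument and is already declared IMMUTABLE. The index names and the COST value are hypothetical placeholders.

```sql
-- Hypothetical index name; assumes foo(int) is declared IMMUTABLE,
-- otherwise CREATE INDEX rejects the expression.
CREATE INDEX test_foo_idx ON test (foo(id));

-- Alternative: a multicolumn index that might allow an index-only scan
-- CREATE INDEX test_foo_id_idx ON test (foo(id), id);

-- If foo() is expensive, tell the planner so
-- (COST is in units of cpu_operator_cost; 1000 is only a placeholder guess).
ALTER FUNCTION foo(int) COST 1000;

-- ANALYZE also collects statistics for the indexed expression.
ANALYZE test;

-- Expression statistics are listed in pg_stats under the index name.
SELECT most_common_vals, most_common_freqs
FROM   pg_stats
WHERE  tablename = 'test_foo_idx';
```

After ANALYZE, the planner can match WHERE foo(id) = 4 against these statistics instead of falling back to its default guess.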

  • Thanks, that makes sense. I can't post the exact function; it's way too complex. It's STABLE and takes more parameters, so I can't really use an expression index. I wonder whether returning an ENUM with 3 values instead of an INTEGER would help.
    – pankleks
    Commented Sep 25, 2019 at 6:04
