I have 20 tables with identical structure, and I need to aggregate the results.
I'm using UNION ALL to collect the results, in the following structure:
SELECT something, SUM(total) AS overall_total
FROM (
    SELECT something, SUM(something_else) AS total
    FROM table1
    GROUP BY something
    UNION ALL
    SELECT something, SUM(something_else) AS total
    FROM table2
    GROUP BY something
    UNION ALL
    ...
) AS X
GROUP BY something
ORDER BY overall_total DESC
LIMIT 10
In this case the DB uses Parallel Append with multiple workers, which is very fast.
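For reference, this is roughly how the plan can be checked. It is only a sketch: it is cut down to two of the tables, and the table and column names are the same placeholders used in the query above. Note that EXPLAIN with ANALYZE actually executes the query.

```sql
-- Sketch: inspect the plan for a two-table version of the query.
-- table1, table2, something and something_else are placeholder names.
EXPLAIN (ANALYZE, VERBOSE)
SELECT something, SUM(total) AS overall_total
FROM (
    SELECT something, SUM(something_else) AS total
    FROM table1
    GROUP BY something
    UNION ALL
    SELECT something, SUM(something_else) AS total
    FROM table2
    GROUP BY something
) AS X
GROUP BY something
ORDER BY overall_total DESC
LIMIT 10;
```

In a parallel plan, the output contains a Gather (or Gather Merge) node with a Parallel Append node beneath it, along with the number of workers planned and launched.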
However, each table returns a large number of rows, so the outer aggregation is slow.
To improve performance, I want to limit the rows returned from each table to its top 100 results only, since the overall top 10 will always come from those.
I tried using either LIMIT or ROW_NUMBER() OVER() to retrieve only the top rows from each table:
SELECT something, SUM(something_else) AS total
FROM table2
GROUP BY something
ORDER BY total DESC
LIMIT 100
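-- Variant using ROW_NUMBER(): the row number has to be computed in a
-- subquery, because a window function alias cannot be filtered in WHERE
-- at the same query level.
SELECT something, total
FROM (
    SELECT something, SUM(something_else) AS total,
           ROW_NUMBER() OVER (ORDER BY SUM(something_else) DESC) AS r
    FROM table2
    GROUP BY something
) AS ranked
WHERE r <= 100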
But now the plan never uses Parallel Append, so fetching the data is very slow and overall performance is worse. I can't get a parallel plan even with settings such as:
set force_parallel_mode = on;
set max_parallel_workers_per_gather = 10;
set parallel_setup_cost = 0;
set parallel_tuple_cost = 0;
According to the PostgreSQL documentation, a query can't be parallelized when it uses parallel-unsafe operations, but the documentation doesn't mention LIMIT or ROW_NUMBER as being unsafe.
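As a quick check of the parallel-safety labels, the functions involved can be looked up in the pg_proc system catalog. This is just a sketch of the lookup; the proparallel column holds 's' for safe, 'r' for restricted and 'u' for unsafe. It only covers functions, so it says nothing about how the planner treats the LIMIT clause itself.

```sql
-- Sketch: read the parallel-safety label of the functions used in the query.
-- proparallel: 's' = safe, 'r' = restricted, 'u' = unsafe.
SELECT proname,
       pg_get_function_identity_arguments(oid) AS args,
       proparallel
FROM pg_proc
WHERE proname IN ('sum', 'row_number')
ORDER BY proname, args;
```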