In PostgreSQL 16.9 I have a table Time (duration, resourceId, date, companyId) representing timesheet entries and a table Resource (id, name); I want to list the sum of Time durations per week and per employee name. I run:
select sum(t.duration), date_trunc('week', t.date)::date as weekStart, r.name
from Time t
join Resource r on r.id = t.resourceId
where t.companyId = 79
group by weekStart, r.name
order by weekStart desc, r.name asc
limit 50;
The Time table has 15M rows (roughly 1M of them for companyId = 79). There are three single-column indexes: on companyId, on resourceId, and on date.
The query sometimes takes up to 15s.
What can I do to speed it up? For example, build some combined index? The computed weekStart is used in grouping and ordering (and sometimes also in filtering). Should I store the weekStart value as a new column so that it can be indexed?
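For illustration, here is a minimal sketch of the stored-column idea, assuming Time.date is of type date and the table can be altered; the column name weekStart and the index name are placeholders of mine. Note that date_trunc('week', date) on a plain date resolves to the timestamptz variant, which is not immutable, so the generated column casts to timestamp first:

-- Sketch only: weekStart and the index name are hypothetical.
-- date::timestamp keeps the expression immutable, as required for a generated column.
alter table Time
  add column weekStart date
  generated always as (date_trunc('week', date::timestamp)::date) stored;

create index time_company_weekstart_idx
  on Time (companyId, weekStart);

The query would then have to group and order by the stored weekStart column (rather than repeating the date_trunc expression) for the planner to take advantage of the index.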
Another idea: the big table contains entries from the last 7 years, but rows with a date older than 2 months are almost never listed or queried; about 98% of queries touch only the last 2 months (because of the ORDER BY + LIMIT). Can this fact somehow be used to improve performance?
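One hedged possibility, assuming the frequent queries can include an explicit date filter: a partial index covering only recent rows. The predicate has to be a constant (the cutoff below is a placeholder), so the index would need to be recreated periodically; declarative range partitioning on date would be the heavier-weight alternative.

-- Sketch only: the cutoff is a placeholder constant; a query must include a
-- compatible filter such as "and t.date >= date '2025-01-01'" to use this index.
create index time_recent_company_date_idx
  on Time (companyId, date desc)
  where date >= date '2025-01-01';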
The EXPLAIN ANALYZE output for the query (edit, added based on the comment):
Limit (cost=407413.34..407420.08 rows=50 width=28) (actual time=3056.141..3197.179 rows=50 loops=1)
-> GroupAggregate (cost=407413.34..555047.26 rows=1095960 width=28) (actual time=3042.669..3183.702 rows=50 loops=1)
Group Key: ((date_trunc('week'::text, (t.date)::timestamp with time zone))::date), r.name
-> Incremental Sort (cost=407413.34..548471.50 rows=1095960 width=28) (actual time=3042.650..3183.638 rows=167 loops=1)
Sort Key: ((date_trunc('week'::text, (t.date)::timestamp with time zone))::date) DESC, r.name
Presorted Key: ((date_trunc('week'::text, (t.date)::timestamp with time zone))::date)
Full-sort Groups: 4 Sort Method: quicksort Average Memory: 26kB Peak Memory: 26kB
-> Nested Loop (cost=407377.30..535659.95 rows=1095960 width=28) (actual time=3041.946..3183.537 rows=176 loops=1)
-> Gather Merge (cost=407377.22..524966.38 rows=1095960 width=16) (actual time=3041.915..3180.499 rows=176 loops=1)
Workers Planned: 2
Workers Launched: 2
-> Sort (cost=406377.21..406605.54 rows=456650 width=16) (actual time=3001.577..3001.726 rows=932 loops=3)
Sort Key: ((date_trunc('week'::text, (t.date)::timestamp with time zone))::date) DESC
Sort Method: external merge Disk: 11368kB
Worker 0: Sort Method: external merge Disk: 11264kB
Worker 1: Sort Method: external merge Disk: 9272kB
-> Parallel Seq Scan on "time" t (cost=0.00..392216.86 rows=456650 width=16) (actual time=5.317..2791.644 rows=361877 loops=3)
Filter: (companyid = 79)
Rows Removed by Filter: 4610050
-> Memoize (cost=0.09..0.10 rows=1 width=20) (actual time=0.016..0.016 rows=1 loops=176)
Cache Key: t.resourceid
Cache Mode: logical
Hits: 153 Misses: 23 Evictions: 0 Overflows: 0 Memory Usage: 3kB
-> Index Scan using "Resource_pkey" on resource r (cost=0.08..0.09 rows=1 width=20) (actual time=0.123..0.123 rows=1 loops=23)
Index Cond: (id = t.resourceid)
Planning Time: 0.356 ms
JIT:
Functions: 29
Options: Inlining false, Optimization false, Expressions true, Deforming true
Timing: Generation 4.076 ms, Inlining 0.000 ms, Optimization 2.048 ms, Emission 26.299 ms, Total 32.423 ms
Execution Time: 3202.008 ms
From the comments: run EXPLAIN ANALYZE on the query; it will run the query and give more information, among other things how long the JIT takes. Beyond that, consider raising work_mem (the sorts currently spill to disk as external merges) and building a combined expression index, for example:

CREATE INDEX idx_time_company_resource_week ON Time (companyId, resourceId, (date_trunc('week', date::timestamp)::date));

With a plain date column the explicit ::timestamp cast is needed to keep the expression immutable (and therefore indexable), and the query has to use the same expression for the index to be considered.
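As a minimal illustration of the work_mem point (the 64MB value is only an assumption, sized against the roughly 11MB per-worker disk sorts shown in the plan):

-- Session-level setting; pick a value that fits your memory budget,
-- since every sort or hash node in every backend may use up to work_mem.
set work_mem = '64MB';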