
In PostgreSQL 16.9 I have a table Time (duration, resourceId, date, companyId) representing timesheet entries and a table Resource (id, name); I want to list the sum of Time durations per week and per employee name. I do:

select sum(t.duration), date_trunc('week', t.date)::date weekStart, r.name 
from Time t 
JOIN Resource r ON r.id=t.resourceId where t.companyId=79
group by weekStart, r.name order by weekStart desc, r.name asc limit 50;

The Time table has 15M rows (about 1M rows for companyId=79). There are 3 single-column indexes: on companyId, resourceId, and date.

The query sometimes takes up to 15s.

What can I do to speed it up? E.g. should I build some combined index? The calculated weekStart is used in grouping and ordering (and sometimes even in filtering). Should I store the weekStart value as a new column so that I can index it?
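Something like this is what I have in mind (an untested sketch; the column and index names are made up). Note that a stored generated column requires an IMMUTABLE expression, so the cast has to go through timestamp rather than the timestamptz cast that shows up in the plan below, and adding the STORED column rewrites the table:

-- Sketch: persist the week start so it can be indexed directly.
-- date_trunc('week', date::timestamp)::date is IMMUTABLE, which a generated column requires.
ALTER TABLE Time
    ADD COLUMN weekStart date
    GENERATED ALWAYS AS (date_trunc('week', date::timestamp)::date) STORED;

-- Illustrative composite index matching the WHERE filter plus the leading sort key:
CREATE INDEX time_company_weekstart_idx
    ON Time (companyId, weekStart DESC, resourceId);

The query would then have to group and order by the new weekStart column instead of recomputing the expression, otherwise the planner will not use the index for the sort.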

Another idea: the big table contains entries for the last 7 years. Rows with a date older than 2 months are almost never listed/queried. 98% of queries are about the last 2 months (because of ORDER BY + LIMIT). Can this fact be used somehow to improve the performance?
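For example, a rough sketch of range partitioning by date (the column types and partition bounds are guesses, and the table would have to be recreated):

-- Hypothetical partitioned layout; column types and bounds are assumptions.
CREATE TABLE time_partitioned (
    duration   numeric,
    resourceId bigint,
    date       date,
    companyId  bigint
) PARTITION BY RANGE (date);

CREATE TABLE time_2025 PARTITION OF time_partitioned
    FOR VALUES FROM ('2025-01-01') TO ('2026-01-01');
-- ... one partition per year (or per month, for finer pruning of old data)

Partition pruning only kicks in when the query actually filters on the partition key (date), so the "last 2 months" case would need an explicit date predicate to benefit.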

The EXPLAIN ANALYZE output (edited based on the comments) for the query:

Limit  (cost=407413.34..407420.08 rows=50 width=28) (actual time=3056.141..3197.179 rows=50 loops=1)
  ->  GroupAggregate  (cost=407413.34..555047.26 rows=1095960 width=28) (actual time=3042.669..3183.702 rows=50 loops=1)
        Group Key: ((date_trunc('week'::text, (t.date)::timestamp with time zone))::date), r.name
        ->  Incremental Sort  (cost=407413.34..548471.50 rows=1095960 width=28) (actual time=3042.650..3183.638 rows=167 loops=1)
              Sort Key: ((date_trunc('week'::text, (t.date)::timestamp with time zone))::date) DESC, r.name
              Presorted Key: ((date_trunc('week'::text, (t.date)::timestamp with time zone))::date)
              Full-sort Groups: 4  Sort Method: quicksort  Average Memory: 26kB  Peak Memory: 26kB
              ->  Nested Loop  (cost=407377.30..535659.95 rows=1095960 width=28) (actual time=3041.946..3183.537 rows=176 loops=1)
                    ->  Gather Merge  (cost=407377.22..524966.38 rows=1095960 width=16) (actual time=3041.915..3180.499 rows=176 loops=1)
                          Workers Planned: 2
                          Workers Launched: 2
                          ->  Sort  (cost=406377.21..406605.54 rows=456650 width=16) (actual time=3001.577..3001.726 rows=932 loops=3)
                                Sort Key: ((date_trunc('week'::text, (t.date)::timestamp with time zone))::date) DESC
                                Sort Method: external merge  Disk: 11368kB
                                Worker 0:  Sort Method: external merge  Disk: 11264kB
                                Worker 1:  Sort Method: external merge  Disk: 9272kB
                                ->  Parallel Seq Scan on "time" t  (cost=0.00..392216.86 rows=456650 width=16) (actual time=5.317..2791.644 rows=361877 loops=3)
                                      Filter: (companyid = 79)
                                      Rows Removed by Filter: 4610050
                    ->  Memoize  (cost=0.09..0.10 rows=1 width=20) (actual time=0.016..0.016 rows=1 loops=176)
                          Cache Key: t.resourceid
                          Cache Mode: logical
                          Hits: 153  Misses: 23  Evictions: 0  Overflows: 0  Memory Usage: 3kB
                          ->  Index Scan using "Resource_pkey" on resource r  (cost=0.08..0.09 rows=1 width=20) (actual time=0.123..0.123 rows=1 loops=23)
                                Index Cond: (id = t.resourceid)
Planning Time: 0.356 ms
JIT:
  Functions: 29
  Options: Inlining false, Optimization false, Expressions true, Deforming true
  Timing: Generation 4.076 ms, Inlining 0.000 ms, Optimization 2.048 ms, Emission 26.299 ms, Total 32.423 ms
Execution Time: 3202.008 ms
  • please run EXPLAIN ANALYZE on the query; it will run the query and give more information, among other things how long the JIT takes. Commented Aug 12 at 11:23
  • I've replaced the EXPLAIN output with EXPLAIN ANALYZE as requested by @jkavalik above. The execution here is much faster (the DB is not under production load right now). Commented Aug 12 at 11:44
  • Show us the table DDL as well as the indexes, please! Commented Aug 12 at 13:33
  • See if you can give it more work_mem. Commented Aug 12 at 14:04
  • Currently there is no useful index on "Time" (terrible name...), try something like this: CREATE INDEX idx_time_company_resource_week ON Time(companyId, resourceId, (date_trunc('week', date)::date)); Commented Aug 12 at 18:12

2 Answers

  1. If only the latest two months of data are queried most of the time, partitioning the table by date is a good idea.
  2. The execution plan shows a parallel sequential scan while processing the filter t.companyId = 79; this is the longest and costliest node, which suggests that no index on companyId is being used here. How about a composite index like this: create index concurrently index_name on time(resourceId, date desc, companyID);
  3. Why this index: resourceId is included because of the join condition, date because the ordering is on it (descending), and companyId because it has a filter on it.
  4. If you still see 'external merge Disk' even after adding this index, you could try setting a higher work_mem at the query level using set local work_mem='8MB'; (see the sketch after this list).
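Putting points 2-4 together, a rough sketch (the index name is illustrative and the work_mem value is just the example above, not tuned against this data):

-- Composite index from points 2-3; CREATE INDEX CONCURRENTLY cannot run inside a transaction block.
CREATE INDEX CONCURRENTLY time_resource_date_company_idx
    ON Time (resourceId, date DESC, companyId);

-- Point 4: SET LOCAL only takes effect inside a transaction.
BEGIN;
SET LOCAL work_mem = '8MB';  -- example value; the plan shows ~11MB spills per worker, so a larger value may be needed
SELECT sum(t.duration), date_trunc('week', t.date)::date weekStart, r.name
FROM Time t
JOIN Resource r ON r.id = t.resourceId
WHERE t.companyId = 79
GROUP BY weekStart, r.name
ORDER BY weekStart DESC, r.name ASC
LIMIT 50;
COMMIT;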

If you do have an index on "time".companyid, try running the query with enable_seqscan = off. If that is faster, the estimates are off. Most likely, random_page_cost is set too high. See if a reduced value gives you the faster plan.

If the index scan turns out to be slower, please set track_io_timing = on and run EXPLAIN (ANALYZE, BUFFERS) for a deeper analysis.
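For example, at the session level (the random_page_cost value is only a common SSD-oriented example, not a recommendation for this system):

SET track_io_timing = on;   -- needs superuser; adds I/O timings to the BUFFERS output
SET enable_seqscan = off;   -- steer the planner away from the sequential scan
EXPLAIN (ANALYZE, BUFFERS)
SELECT sum(t.duration), date_trunc('week', t.date)::date weekStart, r.name
FROM Time t JOIN Resource r ON r.id = t.resourceId
WHERE t.companyId = 79
GROUP BY weekStart, r.name ORDER BY weekStart DESC, r.name ASC LIMIT 50;
RESET enable_seqscan;

-- If the index plan is faster, lower the random I/O cost estimate:
SET random_page_cost = 1.1;  -- example value; the default is 4.0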
