Postgres Rows Between Causing Duplicates

Question

I have a simple query that calculates the sum of values for current row and 11 preceding rows. It works fine when the number of rows is greater than 12 , but when it is less than 12 the data gets duplicated to fill in the missing values.

TOTALS TABLE:

ID|Report_Month| Total
1 |2018-08-01 |5
2 |2018-09-01 |25
3 |2018-10-01  |15

Sample Code:

select distinct 
         ID,
         Report_Month,
         Total,
         sum(Total) over (partition by ID order by report_month rows between 11 preceding and current row) as Running_Total
from TOTALS_TABLE;

Expected Output:

ID|Report_Month|Total|Running_Total
1 | 2018-08-01 | 5 | 5
2 | 2018-09-01 | 25 | 30
3 | 2018-10-01 | 15 | 45

Actual Output:

1 | 2018-08-01 | 5 | 5
1 | 2018-08-01 | 5 | 10
1 | 2018-08-01 | 5 | 15
1 | 2018-08-01 | 5 | 20
2 | 2018-09-01 | 25 | 45
2 | 2018-09-01 | 25 | 70
2 | 2018-09-01 | 25 | 95
2 | 2018-09-01 | 25 | 120
3 | 2018-10-01 | 15 | 135
3 | 2018-10-01 | 15 | 150
3 | 2018-10-01 | 15 | 165
3 | 2018-10-01 | 15 | 180

Any help would be greatly appreciated, I feel like I am very close probably missing something.

Please edit your post, select anything that would benefit from being a monospace font preformatted style, and press the code. Hatton above the text box. It will add 4 spaces to the start of each line and is how we make code blocks appear properly on SO.. — Caius Jard
– Caius Jard, Commented Sep 23, 2018 at 5:39
mysql, postgresql, and tsql are all mutually exclusive, which DBMS are you using? — gomory-chvatal
– gomory-chvatal, Commented Sep 23, 2018 at 5:40
I think Postgres given that he's written in the subject, the other tags might just be extra to catch attention from related professionals (please avoid doing this) who could answer. Sql tag is a better umbrella for that purpose — Caius Jard
– Caius Jard, Commented Sep 23, 2018 at 5:42

Gordon Linoff · Accepted Answer · 2018-09-23 11:30:29Z

1

The clue is the select distinct. This should not be necessary. If the underlying table has duplicates, you should fix that. In the meantime, you can try to adjust the query.

I'm not sure what the right fix is. Here are two possibilities.

If entire rows are duplicated:

select ID, Report_Month, Total,
       sum(Total) over (partition by ID order by report_month rows between 11 preceding and current row) as Running_Total
from (select distinct tt.*
      from TOTALS_TABLE tt
     ) tt;

If the totals table has subtotals on each dy that need to be added up:

select ID, Report_Month,
       sum(Total) as month_total,
       sum(sum(Total)) over (partition by ID order by report_month rows between 11 preceding and current row) as Running_Total
from TOTALS_TABLE tt
group by id, Report_Month;

answered Sep 23, 2018 at 11:30

Gordon Linoff

1.3m62 gold badges706 silver badges857 bronze badges

Sign up to request clarification or add additional context in comments.

1 Comment

Antonio Over a year ago

I was also getting duplicate rows with the following query ``` select , min(bound_ts1) over(partition by risk_model_name order by bound_ts1 rows unbounded preceding) start_date, max(bound_ts1) over(partition by risk_model_name order by bound_ts1 rows unbounded preceding) last_date, dense_rank() over(order by risk_model_name rows unbounded preceding) rnk, (submitted_ts1-quoted_ts1) tat_quote, (quoted_ts1-bound_ts1) tat_bound from (select distinct t1. from t1) t2; ``` The first solution fixed it but I have no idea why?

Caius Jard · Accepted Answer · 2018-09-23 06:27:18Z

You seem to want a query that sums over different IDs but you've told the sum to partition on ID, this means your running total will reset every time ID changes (== there is no way the query you've posted can produce the results you posted, even if Postgres is spontaneously inventing rows to provide something to summate). Remove the partition

https://www.db-fiddle.com/#&togetherjs=fw7TIVul3H

enter image description here

I didn't experience the duplicate rows problem and I can't see why adding an analytic would cause it. I think your source table or query genuinely has duplicate rows (I think your use of distinct is attempting to remove them) and the analytic is working fine. Do a

Select * from totals_table

And check your data is ok. If it has duplicate rows you can't remove them with distinct in the manner you have, because distinct takes the result of the running total into account (and it makes each row unique). It would be better to solve the duplication issue at source, than try and distinct them out later but if you're set on doing that you'll have to do your distinct in an inner query and your running total in an outer query

Collectives™ on Stack Overflow

Postgres Rows Between Causing Duplicates

2 Answers 2

1 Comment

Comments

Hot Network Questions

Collectives™ on Stack Overflow

2 Answers 2

1 Comment

Comments

Related