0

I have a simple query that calculates the sum of values for current row and 11 preceding rows. It works fine when the number of rows is greater than 12 , but when it is less than 12 the data gets duplicated to fill in the missing values.

TOTALS TABLE:

ID|Report_Month| Total
1 |2018-08-01 |5
2 |2018-09-01 |25
3 |2018-10-01  |15

Sample Code:

select distinct 
         ID,
         Report_Month,
         Total,
         sum(Total) over (partition by ID order by report_month rows between 11 preceding and current row) as Running_Total
from TOTALS_TABLE;

Expected Output:

ID|Report_Month|Total|Running_Total
1 | 2018-08-01 | 5 | 5
2 | 2018-09-01 | 25 | 30
3 | 2018-10-01 | 15 | 45

Actual Output:

1 | 2018-08-01 | 5 | 5
1 | 2018-08-01 | 5 | 10
1 | 2018-08-01 | 5 | 15
1 | 2018-08-01 | 5 | 20
2 | 2018-09-01 | 25 | 45
2 | 2018-09-01 | 25 | 70
2 | 2018-09-01 | 25 | 95
2 | 2018-09-01 | 25 | 120
3 | 2018-10-01 | 15 | 135
3 | 2018-10-01 | 15 | 150
3 | 2018-10-01 | 15 | 165
3 | 2018-10-01 | 15 | 180

Any help would be greatly appreciated, I feel like I am very close probably missing something.

4
  • 1
    Why is DISTINCT in there? Commented Sep 23, 2018 at 5:38
  • Please edit your post, select anything that would benefit from being a monospace font preformatted style, and press the code. Hatton above the text box. It will add 4 spaces to the start of each line and is how we make code blocks appear properly on SO.. Commented Sep 23, 2018 at 5:39
  • 1
    mysql, postgresql, and tsql are all mutually exclusive, which DBMS are you using? Commented Sep 23, 2018 at 5:40
  • I think Postgres given that he's written in the subject, the other tags might just be extra to catch attention from related professionals (please avoid doing this) who could answer. Sql tag is a better umbrella for that purpose Commented Sep 23, 2018 at 5:42

2 Answers 2

1

The clue is the select distinct. This should not be necessary. If the underlying table has duplicates, you should fix that. In the meantime, you can try to adjust the query.

I'm not sure what the right fix is. Here are two possibilities.

If entire rows are duplicated:

select ID, Report_Month, Total,
       sum(Total) over (partition by ID order by report_month rows between 11 preceding and current row) as Running_Total
from (select distinct tt.*
      from TOTALS_TABLE tt
     ) tt;

If the totals table has subtotals on each dy that need to be added up:

select ID, Report_Month,
       sum(Total) as month_total,
       sum(sum(Total)) over (partition by ID order by report_month rows between 11 preceding and current row) as Running_Total
from TOTALS_TABLE tt
group by id, Report_Month;
Sign up to request clarification or add additional context in comments.

1 Comment

I was also getting duplicate rows with the following query ``` select , min(bound_ts1) over(partition by risk_model_name order by bound_ts1 rows unbounded preceding) start_date, max(bound_ts1) over(partition by risk_model_name order by bound_ts1 rows unbounded preceding) last_date, dense_rank() over(order by risk_model_name rows unbounded preceding) rnk, (submitted_ts1-quoted_ts1) tat_quote, (quoted_ts1-bound_ts1) tat_bound from (select distinct t1. from t1) t2; ``` The first solution fixed it but I have no idea why?
0

You seem to want a query that sums over different IDs but you've told the sum to partition on ID, this means your running total will reset every time ID changes (== there is no way the query you've posted can produce the results you posted, even if Postgres is spontaneously inventing rows to provide something to summate). Remove the partition

https://www.db-fiddle.com/#&togetherjs=fw7TIVul3H

enter image description here

I didn't experience the duplicate rows problem and I can't see why adding an analytic would cause it. I think your source table or query genuinely has duplicate rows (I think your use of distinct is attempting to remove them) and the analytic is working fine. Do a

Select * from totals_table 

And check your data is ok. If it has duplicate rows you can't remove them with distinct in the manner you have, because distinct takes the result of the running total into account (and it makes each row unique). It would be better to solve the duplication issue at source, than try and distinct them out later but if you're set on doing that you'll have to do your distinct in an inner query and your running total in an outer query

Comments

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.