0

I have a dataset as shown below. The requirement is to write a SQL query that would:

  1. calculate the average rec_cnt of the previous 3 dates pertaining to the latest date for a specific tbl_nm
  2. apply +10% to the average value determined in Step-1 above and populate the column "avg_plus_10%" in the output
  3. apply -10% to the average value determined in Step-1 above and populate the column "avg_minus_10%" in the output
  4. determine whether the rec_cnt for each of those 4 dates falls BETWEEN the "avg_minus_10%" and "avg_plus_10%" (i.e. rec_cnt >= "avg_minus_10%" and rec_cnt <= "avg_plus_10%"), then populate TRUE if yes, else FALSE, in the output column "is_matched"

Input:

tbl_nm | rundt     | rec_cnt
emp    | 2025/04/23| 100
emp    | 2025/04/16| 125
emp    | 2025/04/09| 110
emp    | 2025/04/02| 135
sal    | 2025/04/23| 210
sal    | 2025/04/16| 230
sal    | 2025/04/09| 200
sal    | 2025/04/02| 215

Expected Output:

tbl_nm | rundt     | rec_cnt| avg_cnt | avg_minus_10%| avg_plus_10%| is_matched
emp    | 2025/04/23| 100    | 123.33  |      111.0   |      135.66 | FALSE
emp    | 2025/04/16| 125    | 123.33  |      111.0   |      135.66 | TRUE
emp    | 2025/04/09| 110    | 123.33  |      111.0   |      135.66 | FALSE
emp    | 2025/04/02| 135    | 123.33  |      111.0   |      135.66 | TRUE
sal    | 2025/04/23| 210    | 215.00  |      193.5   |      236.50 | TRUE
sal    | 2025/04/16| 230    | 215.00  |      193.5   |      236.50 | TRUE
sal    | 2025/04/09| 200    | 215.00  |      193.5   |      236.50 | TRUE
sal    | 2025/04/02| 215    | 215.00  |      193.5   |      236.50 | TRUE

As seen above,

  • the latest rundt record for 'emp' and 'sal' is '2025/04/23'
  • hence, the avg(rec_cnt) is calculated to be 123.33 for the previous 3 rundt's (i.e. '2025/04/16', '2025/04/09', '2025/04/02') and populated for all the 4 records of 'emp' in the "avg_cnt" column.
  • "avg_minus_10%" is calculated as avg_cnt - (10% of avg_cnt) => 123.33 - (123.33 * 0.1) = 111.0
  • "avg_plus_10%" is calculated as avg_cnt + (10% of avg_cnt) => 123.33 + (123.33 * 0.1) = 135.66
  • Finally, the "is_matched" column is populated as 'TRUE' if rec_cnt falls within the range of "avg_minus_10%" to "avg_plus_10%", and as 'FALSE' otherwise. For example, for "2025/04/23" of 'emp' the rec_cnt (=100) falls outside the range of 111.0 to 135.66, hence "is_matched" is populated as 'FALSE'.
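
The arithmetic in the bullets above can be re-checked outside the database. The following is only a minimal sketch in plain Python: the `data` dictionary and the `check` helper are hypothetical names introduced for illustration, and the average is rounded to 2 decimals before the ±10% is applied, which is an assumption taken from the way the bullets compute 123.33 ± (123.33 * 0.1).

```python
# Sample rows from the question, newest run first for each table.
data = {
    "emp": [("2025/04/23", 100), ("2025/04/16", 125),
            ("2025/04/09", 110), ("2025/04/02", 135)],
    "sal": [("2025/04/23", 210), ("2025/04/16", 230),
            ("2025/04/09", 200), ("2025/04/02", 215)],
}


def check(history):
    """Return (avg_cnt, avg_minus_10, avg_plus_10, is_matched flags).

    `history` holds (rundt, rec_cnt) pairs sorted newest first, so the
    three runs before the latest one are history[1:4].
    """
    previous = [cnt for _, cnt in history[1:4]]
    # Assumption: round the average first, as the bullets above do.
    avg_cnt = round(sum(previous) / len(previous), 2)
    low = round(avg_cnt - avg_cnt * 0.1, 2)
    high = round(avg_cnt + avg_cnt * 0.1, 2)
    flags = [low <= cnt <= high for _, cnt in history]
    return avg_cnt, low, high, flags


for tbl_nm, history in data.items():
    print(tbl_nm, check(history))
```

For the sample data this should reproduce the avg_cnt, threshold and is_matched values listed in the expected output.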

EDIT:

So far, I have come up with the following solution:

WITH calc_avg AS (
SELECT tbl_nm,
       rundt,
       rec_cnt,
       AVG(rec_cnt) OVER (PARTITION BY tbl_nm ORDER BY rundt DESC
                          ROWS BETWEEN 1 FOLLOWING AND UNBOUNDED FOLLOWING) AS avg_cnt
FROM audit_tbl
)
SELECT
    tbl_nm,
    rundt,
    rec_cnt,
    (FIRST_VALUE(avg_cnt) OVER (avg_win)) AS avg_cnt,
    (FIRST_VALUE(avg_cnt) OVER (avg_win)) * 0.9 AS avg_minus_10,
    (FIRST_VALUE(avg_cnt) OVER (avg_win)) * 1.1 AS avg_plus_10,
    CASE WHEN rec_cnt BETWEEN ((FIRST_VALUE(avg_cnt) OVER (avg_win)) * 0.9) 
                              AND
                              ((FIRST_VALUE(avg_cnt) OVER (avg_win)) * 1.1)
         THEN 'TRUE'
         ELSE 'FALSE'
    END AS is_matched
FROM calc_avg
WINDOW avg_win AS (PARTITION BY tbl_nm ORDER BY rundt DESC)
ORDER BY 1, 2 DESC;
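
One detail of the frame in the query above: `ROWS BETWEEN 1 FOLLOWING AND UNBOUNDED FOLLOWING` averages every older row, which equals "the previous 3 dates" only while each tbl_nm has exactly four rows. The sketch below illustrates the difference using Python's built-in `sqlite3` module as a stand-in engine (an assumption; the question does not name a database). The `2025/03/26` row and the `latest_avg` helper are hypothetical and exist only to make the difference visible.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE audit_tbl (tbl_nm TEXT, rundt TEXT, rec_cnt INTEGER)")
conn.executemany(
    "INSERT INTO audit_tbl VALUES (?, ?, ?)",
    [
        ("emp", "2025/04/23", 100),
        ("emp", "2025/04/16", 125),
        ("emp", "2025/04/09", 110),
        ("emp", "2025/04/02", 135),
        ("emp", "2025/03/26", 500),  # hypothetical older run, not in the question's data
    ],
)


def latest_avg(frame_end):
    """Average seen by the latest row for a given frame end.

    `frame_end` is interpolated into the SQL text, so only pass fixed
    literals such as "3 FOLLOWING" here, never user input.
    """
    sql = f"""
        SELECT rundt,
               AVG(rec_cnt) OVER (PARTITION BY tbl_nm ORDER BY rundt DESC
                                  ROWS BETWEEN 1 FOLLOWING AND {frame_end}) AS avg_cnt
        FROM audit_tbl
        ORDER BY rundt DESC
        LIMIT 1
    """
    return conn.execute(sql).fetchone()[1]


unbounded = latest_avg("UNBOUNDED FOLLOWING")  # averages all four older rows
bounded = latest_avg("3 FOLLOWING")            # averages only the three preceding runs
print(round(unbounded, 2), round(bounded, 2))
```

If older history is expected to accumulate in the table, ending the frame at `3 FOLLOWING` keeps the average tied to the three runs before the latest one.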

However, I was wondering if there is a smarter way to accomplish the same.

Can someone please suggest a better SQL solution?

Thanks

2
  • Hi - you haven’t actually asked a question. What’s the specific issue you need help with?
    – NickW
    Commented yesterday
  • @NickW - I have provided a probable solution in the question above (EDIT). Can you please suggest a smarter solution for the same?
    – marie20
    Commented yesterday

2 Answers

0

You need to:

  1. Find the latest date for each tbl_nm.
  2. Calculate the average rec_cnt for the previous 3 dates.
  3. Apply ±10% to get the thresholds.
  4. Evaluate each row against the thresholds and output the result.

Try this:

WITH ranked_data AS (
  SELECT 
    tbl_nm,
    rundt,
    rec_cnt,
    ROW_NUMBER() OVER (PARTITION BY tbl_nm ORDER BY rundt DESC) AS rn
  FROM your_table
),

-- Get the latest date per tbl_nm
latest_dates AS (
  SELECT tbl_nm, rundt AS latest_rundt
  FROM ranked_data
  WHERE rn = 1
),

-- Calculate avg for previous 3 dates (excluding latest)
avg_last_3 AS (
  SELECT 
    tbl_nm,
    ROUND(AVG(rec_cnt)::numeric, 2) AS avg_cnt
  FROM ranked_data
  WHERE rn > 1 AND rn <= 4  -- rows 2, 3, 4 => last 3 before latest
  GROUP BY tbl_nm
),

-- Join everything together
final_output AS (
  SELECT 
    r.tbl_nm,
    r.rundt,
    r.rec_cnt,
    a.avg_cnt,
    ROUND(a.avg_cnt * 0.9, 2) AS avg_minus_10,
    ROUND(a.avg_cnt * 1.1, 2) AS avg_plus_10,
    CASE 
      WHEN r.rec_cnt BETWEEN ROUND(a.avg_cnt * 0.9, 2) AND ROUND(a.avg_cnt * 1.1, 2)
      THEN TRUE
      ELSE FALSE
    END AS is_matched
  FROM ranked_data r
  JOIN avg_last_3 a ON r.tbl_nm = a.tbl_nm
)

SELECT * FROM final_output
ORDER BY tbl_nm, rundt DESC;
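
The approach above can be tried without a PostgreSQL instance. The sketch below is not the answer's exact query: it runs on Python's built-in `sqlite3` module (an assumption about the environment), drops the PostgreSQL-only `::numeric` cast, and replaces the join with a conditional window average over `rn BETWEEN 2 AND 4`. The table name `audit_tbl` comes from the question; the CTE names `ranked` and `with_avg` are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE audit_tbl (tbl_nm TEXT, rundt TEXT, rec_cnt INTEGER)")
conn.executemany(
    "INSERT INTO audit_tbl VALUES (?, ?, ?)",
    [
        ("emp", "2025/04/23", 100), ("emp", "2025/04/16", 125),
        ("emp", "2025/04/09", 110), ("emp", "2025/04/02", 135),
        ("sal", "2025/04/23", 210), ("sal", "2025/04/16", 230),
        ("sal", "2025/04/09", 200), ("sal", "2025/04/02", 215),
    ],
)

QUERY = """
WITH ranked AS (
  SELECT tbl_nm, rundt, rec_cnt,
         ROW_NUMBER() OVER (PARTITION BY tbl_nm ORDER BY rundt DESC) AS rn
  FROM audit_tbl
),
with_avg AS (
  -- rows 2..4 are the three runs before the latest one; the CASE yields
  -- NULL for every other row, and AVG ignores NULLs
  SELECT tbl_nm, rundt, rec_cnt,
         ROUND(AVG(CASE WHEN rn BETWEEN 2 AND 4 THEN rec_cnt END)
               OVER (PARTITION BY tbl_nm), 2) AS avg_cnt
  FROM ranked
)
SELECT tbl_nm, rundt, rec_cnt, avg_cnt,
       ROUND(avg_cnt * 0.9, 2) AS avg_minus_10,
       ROUND(avg_cnt * 1.1, 2) AS avg_plus_10,
       CASE WHEN rec_cnt BETWEEN ROUND(avg_cnt * 0.9, 2) AND ROUND(avg_cnt * 1.1, 2)
            THEN 'TRUE' ELSE 'FALSE' END AS is_matched
FROM with_avg
ORDER BY tbl_nm, rundt DESC
"""

result = conn.execute(QUERY).fetchall()
for row in result:
    print(row)
```

Because the average is computed as a window aggregate over the whole partition, every row of a tbl_nm carries the same avg_cnt and no separate join back to the detail rows is needed.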
1
  • Thanks for your reply. I have recently added a probable solution in the question (EDIT). I am looking for a smarter approach to the same. Can you please take a look at the solution in the question and suggest a smarter solution to that?
    – marie20
    Commented yesterday
0

This answer is a fairly straightforward solution (following your directions) that uses conditional aggregation with AVG() Over() analytic functions to get the result. I'm not sure what you would consider a smarter approach, though.

WITH     -- fetch the last date per tabnm - used as the condition in the AVG() Over() analytic function
  last_dts AS
    ( Select    t.tabnm, Max(rundt) as lastdt
      From      tbl t
      Group By  t.tabnm 
    )
--      M a i n    S Q L :
Select      t.tabnm,  t.rundt, rec_cnt, 
            Round( AVG(Case When rundt < lastdt Then rec_cnt End) Over( Partition By t.tabnm), 2) as avg_cnt, 
            Round( AVG(Case When rundt < lastdt Then rec_cnt * 1.1 End) Over( Partition By t.tabnm), 2) as avg_plus_10,
            Round( AVG(Case When rundt < lastdt Then rec_cnt * 0.9 End) Over( Partition By t.tabnm), 2) as avg_minus_10, 
            --
            Case When rec_cnt Between 
                                 Round( AVG(Case When rundt < lastdt Then rec_cnt * 0.9 End) Over( Partition By t.tabnm), 2)
                          And    Round( AVG(Case When rundt < lastdt Then rec_cnt * 1.1 End) Over( Partition By t.tabnm), 2)                 
                 Then 'TRUE'
            Else 'FALSE'
            End as is_matched
From        tbl t 
Inner Join  last_dts l ON( l.tabnm = t.tabnm )     

tabnm | rundt      | rec_cnt | avg_cnt | avg_plus_10 | avg_minus_10 | is_matched
emp   | 2025-04-23 | 100     | 123.33  | 135.67      | 111.00       | FALSE
emp   | 2025-04-16 | 125     | 123.33  | 135.67      | 111.00       | TRUE
emp   | 2025-04-09 | 110     | 123.33  | 135.67      | 111.00       | FALSE
emp   | 2025-04-02 | 135     | 123.33  | 135.67      | 111.00       | TRUE
sal   | 2025-04-23 | 210     | 215.00  | 236.50      | 193.50       | TRUE
sal   | 2025-04-16 | 230     | 215.00  | 236.50      | 193.50       | TRUE
sal   | 2025-04-09 | 200     | 215.00  | 236.50      | 193.50       | TRUE
sal   | 2025-04-02 | 215     | 215.00  | 236.50      | 193.50       | TRUE
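
Two things about the numbers above may be worth spelling out. Averaging `rec_cnt * 1.1` gives the same value as multiplying the average by 1.1, so the thresholds could equally be derived from a single average. And the 135.67 shown here differs from the 135.66 in the question's expected output only because of where the rounding happens. The following is a small sketch in plain Python using the three 'emp' counts before 2025-04-23; all variable names are illustrative.

```python
# rec_cnt of the three 'emp' runs before the latest date
previous = [125, 110, 135]

avg_exact = sum(previous) / len(previous)

# AVG(rec_cnt * 1.1) versus AVG(rec_cnt) * 1.1: equal up to float error
avg_of_scaled = sum(c * 1.1 for c in previous) / len(previous)
scaled_avg = avg_exact * 1.1
print(abs(avg_of_scaled - scaled_avg) < 1e-9)

# Scale the unrounded average, then round (what the query above does)
scaled_then_rounded = round(avg_exact * 1.1, 2)

# Round the average to 123.33 first, then scale (the question's arithmetic)
rounded_then_scaled = round(round(avg_exact, 2) * 1.1, 2)

print(scaled_then_rounded, rounded_then_scaled)
```

For this sample the one-cent difference does not change any is_matched value, but a rec_cnt sitting exactly on a threshold could be classified differently depending on the rounding order chosen.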
