I have a dataset as shown below. The requirement is to write a SQL query that would:
- calculate the average rec_cnt over the 3 dates preceding the latest date for each tbl_nm
- apply +10% to the average value determined in Step-1 above and populate the column "avg_plus_10%" in the output
- apply -10% to the average value determined in Step-1 above and populate the column "avg_minus_10%" in the output
- determine whether the rec_cnt of each of those 4 dates falls BETWEEN "avg_minus_10%" and "avg_plus_10%" (i.e. rec_cnt >= "avg_minus_10%" and rec_cnt <= "avg_plus_10%"), then populate TRUE if yes, else FALSE, in the output column "is_matched"
Input:
tbl_nm | rundt      | rec_cnt
emp    | 2025/04/23 | 100
emp    | 2025/04/16 | 125
emp    | 2025/04/09 | 110
emp    | 2025/04/02 | 135
sal    | 2025/04/23 | 210
sal    | 2025/04/16 | 230
sal    | 2025/04/09 | 200
sal    | 2025/04/02 | 215
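For anyone who wants to reproduce this, the input above can be loaded with a setup script along these lines. This is only a sketch: the table name audit_tbl is the one used in my query further down, but the column types are assumptions, and the dates are written in ISO form (YYYY-MM-DD) so that they sort correctly whether the engine stores them as dates or as text.

```sql
-- Hypothetical setup: column types are assumed, not taken from the real table.
CREATE TABLE audit_tbl (
    tbl_nm  VARCHAR(30) NOT NULL,
    rundt   DATE        NOT NULL,
    rec_cnt INTEGER     NOT NULL
);

-- The eight sample rows from the Input table above.
INSERT INTO audit_tbl (tbl_nm, rundt, rec_cnt) VALUES
    ('emp', '2025-04-23', 100),
    ('emp', '2025-04-16', 125),
    ('emp', '2025-04-09', 110),
    ('emp', '2025-04-02', 135),
    ('sal', '2025-04-23', 210),
    ('sal', '2025-04-16', 230),
    ('sal', '2025-04-09', 200),
    ('sal', '2025-04-02', 215);
```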
Expected Output:
tbl_nm | rundt      | rec_cnt | avg_cnt | avg_minus_10% | avg_plus_10% | is_matched
emp    | 2025/04/23 | 100     | 123.33  | 111.0         | 135.66       | FALSE
emp    | 2025/04/16 | 125     | 123.33  | 111.0         | 135.66       | TRUE
emp    | 2025/04/09 | 110     | 123.33  | 111.0         | 135.66       | FALSE
emp    | 2025/04/02 | 135     | 123.33  | 111.0         | 135.66       | TRUE
sal    | 2025/04/23 | 210     | 215.00  | 193.5         | 236.50       | TRUE
sal    | 2025/04/16 | 230     | 215.00  | 193.5         | 236.50       | TRUE
sal    | 2025/04/09 | 200     | 215.00  | 193.5         | 236.50       | TRUE
sal    | 2025/04/02 | 215     | 215.00  | 193.5         | 236.50       | TRUE
As seen above,
- the latest rundt record for both 'emp' and 'sal' is '2025/04/23'
- hence, avg(rec_cnt) is calculated to be 123.33 over the previous 3 rundt's (i.e. '2025/04/16', '2025/04/09', '2025/04/02') and populated for all 4 records of 'emp' in the "avg_cnt" column
- "avg_minus_10%" is calculated as avg_cnt - (10% of avg_cnt) => 123.33 - (123.33 * 0.1) = 111.0
- "avg_plus_10%" is calculated as avg_cnt + (10% of avg_cnt) => 123.33 + (123.33 * 0.1) = 135.66
- finally, the "is_matched" column is populated as 'TRUE' or 'FALSE' depending on whether rec_cnt falls inside or outside the range from "avg_minus_10%" to "avg_plus_10%". For example, for '2025/04/23' of 'emp' the rec_cnt (= 100) falls outside the range of 111.0 to 135.66, hence "is_matched" is populated as 'FALSE'.
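The per-table figures above can be cross-checked with a small query like the one below. It is a rough sketch rather than the final solution: it assumes the data sits in a table named audit_tbl with the three input columns, the names ranked and rn are made up for illustration, and it returns one row per tbl_nm instead of the full expected output.

```sql
-- Assumes audit_tbl(tbl_nm, rundt, rec_cnt) holds the sample rows.
WITH ranked AS (
    SELECT tbl_nm,
           rec_cnt,
           -- rn = 1 is the latest rundt within each tbl_nm
           ROW_NUMBER() OVER (PARTITION BY tbl_nm ORDER BY rundt DESC) AS rn
    FROM audit_tbl
)
SELECT tbl_nm,
       AVG(rec_cnt)       AS avg_cnt,
       AVG(rec_cnt) * 0.9 AS avg_minus_10,
       AVG(rec_cnt) * 1.1 AS avg_plus_10
FROM ranked
WHERE rn BETWEEN 2 AND 4   -- the 3 dates before the latest one
GROUP BY tbl_nm;
```

Note that the query works on the unrounded average (370 / 3 for 'emp'), so the last decimal of the upper bound can differ from the hand calculation above, which starts from the already rounded 123.33.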
EDIT:
So far, I have come up with the following solution:
WITH calc_avg AS (
    -- For each row, average rec_cnt over the 3 rows that follow it when sorted
    -- newest first, i.e. the 3 preceding run dates. The frame ends at 3 FOLLOWING
    -- (not UNBOUNDED FOLLOWING) so that older history is ignored.
    SELECT tbl_nm,
           rundt,
           rec_cnt,
           AVG(rec_cnt) OVER (PARTITION BY tbl_nm ORDER BY rundt DESC
                              ROWS BETWEEN 1 FOLLOWING AND 3 FOLLOWING) AS avg_cnt
    FROM audit_tbl
)
SELECT
    tbl_nm,
    rundt,
    rec_cnt,
    -- FIRST_VALUE takes the avg_cnt of the latest rundt and repeats it on every row
    (FIRST_VALUE(avg_cnt) OVER (avg_win)) AS avg_cnt,
    (FIRST_VALUE(avg_cnt) OVER (avg_win)) * 0.9 AS avg_minus_10,
    (FIRST_VALUE(avg_cnt) OVER (avg_win)) * 1.1 AS avg_plus_10,
    CASE WHEN rec_cnt BETWEEN ((FIRST_VALUE(avg_cnt) OVER (avg_win)) * 0.9)
                          AND ((FIRST_VALUE(avg_cnt) OVER (avg_win)) * 1.1)
         THEN 'TRUE'
         ELSE 'FALSE'
    END AS is_matched
FROM calc_avg
WINDOW avg_win AS (PARTITION BY tbl_nm ORDER BY rundt DESC)
ORDER BY 1, 2 DESC;
However, I was wondering if there is a smarter way to accomplish the same.
Can someone please suggest a better SQL solution?
Thanks