0

Nullify/Modify values in rows of data, based on matching values in multiple columns.

This question is about manipulating row data in only certain columns, based on matching attributes in other columns. The given data set describes an item: its item ID (ID), its time period (month, ATTR2), the person associated with it (Name), and that person's role (ATTR1).

Image with Sample data for easier understanding-

When a person holds multiple roles for a particular item in a particular month, only one row should keep its valid numbers; the person's other rows for that same ID, month, and name combination should be set to zero. The row that keeps its data is chosen by the ranking of the ATTR1 field (ATTR1_RANK). Example: Ash plays the roles R1-R6 and has 6 rows of data for item 900001 in month JAN-21. Ash should have valid data only in the row with ATTR1 = R1. The rest of the rows should be updated to zeros or nulls.

Edit: R1 and R2 may not always exist. Sometimes the roles could be just R7 and R9, or R4 and R6.

MRE:

with tbl ( ID   ,ATTR2  ,Name   ,ATTR1  ,   ATTR1_RANK  ,Col1   ,Col2   ,Col3   ,Col4 ) as   (

select  '900001'    ,'JAN-21'   ,'Ash'  ,'R1',  1   ,100    ,200    ,300    ,50     from dual union
select  '900001'    ,'JAN-21'   ,'Ash'  ,'R2',  2   ,100    ,200    ,300    ,50     from dual union
select  '900001'    ,'JAN-21'   ,'Ash'  ,'R3',  3   ,100    ,200    ,300    ,50     from dual union
select  '900001'    ,'JAN-21'   ,'Ash'  ,'R4',  4   ,100    ,200    ,300    ,50     from dual union
select  '900001'    ,'JAN-21'   ,'Ash'  ,'R5',  5   ,100    ,200    ,300    ,50     from dual union
select  '900001'    ,'JAN-21'   ,'Ash'  ,'R6',  6   ,100    ,200    ,300    ,50     from dual union
select  '790001'    ,'JUN-22'   ,'Jon'  ,'R1',  1   ,9900   ,-91    ,-91    ,8181       from dual union
select  '790001'    ,'JUN-22'   ,'Jon'  ,'R2',  2   ,9900   ,-91    ,-91    ,8181       from dual union
select  '790001'    ,'AUG-22'   ,'Jon'  ,'R1',  1   ,9900   ,-91    ,-91    ,8181       from dual union
select  '790001'    ,'AUG-22'   ,'Jon'  ,'R2',  2   ,9900   ,-91    ,-91    ,8181       from dual  )

select * from tbl ;

My implementation was along these lines: select only the rows where count(*) over ( partition by ID, ATTR2, Name ) is greater than 1; among those, select the rows where count(*) over ( partition by ID, ATTR2, Name order by ATTR1_RANK ) is greater than 1, and set those rows to 0.
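The two window counts described above could be sketched roughly as follows. This assumes the tbl data from the MRE is available as a table or CTE; the aliases grp_cnt and running_cnt are illustrative names, not columns of the original data.

```sql
-- Sketch only: lists the rows that would be zeroed out.
-- Analytic functions cannot appear directly in WHERE, so they are
-- computed in an inline view and filtered in the outer query.
select ID, ATTR2, Name, ATTR1, ATTR1_RANK
from (
  select t.*,
         -- total number of rows for this ID / month / person
         count(*) over ( partition by ID, ATTR2, Name ) as grp_cnt,
         -- running count in rank order; 1 for the best-ranked row
         count(*) over ( partition by ID, ATTR2, Name order by ATTR1_RANK ) as running_cnt
  from tbl t
)
where grp_cnt > 1
  and running_cnt > 1;
```

Since running_cnt > 1 can only hold when the group has more than one row, the grp_cnt filter is redundant here; it is kept only to mirror the two steps as described.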

Identifying the rows to be modified is fine for me, but I am stuck on finding a clean way to merge/update this data. I also want this to be efficient when run over millions of rows.


1
  • Just turn the attributes to 0/null based on attr1_rank via case/decode: case attr1_rank when 1 then col1 end
    – astentx
    Commented May 29, 2023 at 14:45

2 Answers

0

Use a CASE expression to achieve this:

select ID,
    ATTR2,
    Name,
    ATTR1,
    ATTR1_RANK,
    CASE WHEN ATTR1 != 'R1' THEN 0 ELSE Col1 END AS Col1,
    CASE WHEN ATTR1 != 'R1' THEN 0 ELSE Col2 END AS Col2,
    CASE WHEN ATTR1 != 'R1' THEN 0 ELSE Col3 END AS Col3,
    CASE WHEN ATTR1 != 'R1' THEN 0 ELSE Col4 END AS Col4

from tbl ;
1
  • Sorry, perhaps I was not clear enough. R1 is just an example; R1 may not always exist. Sometimes R3, R4, R5 may be present for an ID; at other times R7, R8, R9 could be present. I think the hardcoded CASE statement above wouldn't help me. Also, I am not sure whether you are answering only part of the problem: just searching for ATTR1 != 'R1' isn't the goal. The goal is to identify rows with the same name, ID, and month. We want to affect only those records, not all of them. Commented Jun 8, 2023 at 14:28
0

You can use the row_number() analytic function to detect the top-ranked row in each partition, then use CASE expressions to keep the col1..col4 values only for the top-ranked rows:

with tbl ( ID   ,ATTR2  ,Name   ,ATTR1  ,   ATTR1_RANK  ,Col1   ,Col2   ,Col3   ,Col4 ) as   (

select  '900001'    ,'JAN-21'   ,'Ash'  ,'R1',  1   ,100    ,200    ,300    ,50     from dual union
select  '900001'    ,'JAN-21'   ,'Ash'  ,'R2',  2   ,100    ,200    ,300    ,50     from dual union
select  '900001'    ,'JAN-21'   ,'Ash'  ,'R3',  3   ,100    ,200    ,300    ,50     from dual union
select  '900001'    ,'JAN-21'   ,'Ash'  ,'R4',  4   ,100    ,200    ,300    ,50     from dual union
select  '900001'    ,'JAN-21'   ,'Ash'  ,'R5',  5   ,100    ,200    ,300    ,50     from dual union
select  '900001'    ,'JAN-21'   ,'Ash'  ,'R6',  6   ,100    ,200    ,300    ,50     from dual union
select  '790001'    ,'JUN-22'   ,'Jon'  ,'R1',  1   ,9900   ,-91    ,-91    ,8181       from dual union
select  '790001'    ,'JUN-22'   ,'Jon'  ,'R2',  2   ,9900   ,-91    ,-91    ,8181       from dual union
select  '790001'    ,'AUG-22'   ,'Jon'  ,'R1',  1   ,9900   ,-91    ,-91    ,8181       from dual union
select  '790001'    ,'AUG-22'   ,'Jon'  ,'R2',  2   ,9900   ,-91    ,-91    ,8181       from dual  
)
select 
  ID, ATTR2, NAME, ATTR1, ATTR1_RANK, 
  case when rn = 1 then Col1 else 0 end as Col1,
  case when rn = 1 then Col2 else 0 end as Col2,
  case when rn = 1 then Col3 else 0 end as Col3,
  case when rn = 1 then Col4 else 0 end as Col4
from (
  select 
    t.*, 
    row_number() over( partition by ID, ATTR2, Name order by ATTR1_RANK) as rn 
  from tbl t
)

Result:

ID ATTR2 NAME ATTR1 ATTR1_RANK COL1 COL2 COL3 COL4
790001 AUG-22 Jon R1 1 9900 -91 -91 8181
790001 AUG-22 Jon R2 2 0 0 0 0
790001 JUN-22 Jon R1 1 9900 -91 -91 8181
790001 JUN-22 Jon R2 2 0 0 0 0
900001 JAN-21 Ash R1 1 100 200 300 50
900001 JAN-21 Ash R2 2 0 0 0 0
900001 JAN-21 Ash R3 3 0 0 0 0
900001 JAN-21 Ash R4 4 0 0 0 0
900001 JAN-21 Ash R5 5 0 0 0 0
900001 JAN-21 Ash R6 6 0 0 0 0
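
The query above only selects the adjusted values. If the zeros need to be written back to a stored table, a MERGE keyed on rowid is one possible approach. This is a minimal sketch, assuming tbl is a real heap table with the same columns as the sample data (not the inline WITH clause); the aliases rid and rn are illustrative, and it has not been measured on millions of rows.

```sql
-- Sketch only: assumes tbl is a stored table, not the sample CTE.
merge into tbl t
using (
  select rowid as rid,
         -- rn = 1 marks the best-ranked row per ID / month / person
         row_number() over ( partition by ID, ATTR2, Name order by ATTR1_RANK ) as rn
  from tbl
) s
on ( t.rowid = s.rid )
when matched then update
  set t.Col1 = 0,
      t.Col2 = 0,
      t.Col3 = 0,
      t.Col4 = 0
  -- leave the top-ranked row of each group untouched
  where s.rn > 1;
```

Only rows with rn > 1 are updated, so a group that contains a single row keeps its values.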
