2

I'd like to remove rows that repeat the same 'value', but only when the value did not change between consecutive updates. I read tutorials about lag and lead but couldn't find an example that removes duplicates.

Original:

+----+-------+-------+------------------------+
| ID | subID | value |       updated_at       |
+----+-------+-------+------------------------+
|  1 |     2 | 2.20  | 2020-02-16 07:36:25+01 |
|  1 |     2 | 2.20  | 2020-02-16 07:31:25+01 |
|  1 |     2 | 2.20  | 2020-02-16 07:26:25+01 |
|  1 |     2 | 2.30  | 2020-02-16 07:21:25+01 |
|  1 |     2 | 2.20  | 2020-02-16 07:16:25+01 |
|  1 |     2 | 2.20  | 2020-02-16 07:11:25+01 |
+----+-------+-------+------------------------+

Desired output:

+----+-------+-------+------------------------+
| ID | subID | value |       updated_at       |
+----+-------+-------+------------------------+
|  1 |     2 | 2.20  | 2020-02-16 07:36:25+01 |
|  1 |     2 | 2.30  | 2020-02-16 07:21:25+01 |
|  1 |     2 | 2.20  | 2020-02-16 07:16:25+01 | 
+----+-------+-------+------------------------+
1
  • Your ID/subid don't look like what I'd expect an ID to look like! What's with all the duplicates? Commented Jun 21, 2020 at 20:07

2 Answers

2

I'd use lag or lead and remove by ctid:

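DELETE FROM yourtable WHERE ctid IN
(
  SELECT
    ctid
  FROM 
  (
    SELECT 
      ctid,
      value,
      -- Descending order: LAG returns the value of the next-newer row,
      -- so the newest row of each run is kept, as in the desired output.
      LAG(value) OVER(PARTITION BY id, subid ORDER BY updated_at DESC) pre
    FROM 
      yourtable t
  ) t
  WHERE value = pre 
)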

As with any delete query from the internet, run it against a copy of the table...
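
If making a copy is inconvenient, a trial run inside a transaction is another option. The following is only a sketch, assuming PostgreSQL and the table and column names used above; the alias newer_value is my own. It orders by updated_at DESC so that the newest row of each run survives, which is what the desired output in the question shows.

```sql
-- Sketch only: assumes PostgreSQL and the names yourtable, id, subid,
-- value, updated_at from the answer above.
BEGIN;

DELETE FROM yourtable
WHERE ctid IN (
  SELECT ctid
  FROM (
    SELECT
      ctid,
      value,
      -- With descending order, LAG yields the value of the next-newer row.
      LAG(value) OVER (PARTITION BY id, subid ORDER BY updated_at DESC) AS newer_value
    FROM yourtable
  ) s
  WHERE value = newer_value
)
RETURNING *;  -- lists the rows that were removed

-- Inspect what is left before deciding.
SELECT * FROM yourtable ORDER BY id, subid, updated_at DESC;

ROLLBACK;     -- switch to COMMIT once the result looks right
```

With the six sample rows from the question, the rows reported by RETURNING should be the ones at 07:31:25, 07:26:25 and 07:11:25, leaving the three rows of the desired output.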


1

This is a gaps-and-islands problem. If you want the last row before each value change, you can use lead():

select *
from (
    select 
        t.*, 
        lead(value) over(partition by id, subid order by updated_at) next_value
    from mytable t
) t
where value <> next_value or next_value is null

On the other hand, if you want the first row after each value change, you can use lag() instead of lead() (the rest of the query should remain the same).
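
For reference, the lag() variant might look like the sketch below. It assumes the same table and column names as the query above; the alias prev_value is only an illustrative rename of next_value.

```sql
-- Sketch: keep the first row of each run of equal values.
select *
from (
    select
        t.*,
        -- value of the previous (older) row within the same id/subid
        lag(value) over(partition by id, subid order by updated_at) as prev_value
    from mytable t
) t
where value <> prev_value or prev_value is null
```

On the sample data from the question this keeps the rows at 07:11:25, 07:21:25 and 07:26:25, i.e. the oldest row of each run, whereas the lead() query keeps the newest row of each run and so matches the desired output.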
