1

This is a follow-up to:

Based on these sample tables:

data_providers:
id | field_map
--------------
1  | {"segments": "SEGMENT IDS", "full_name": "FULL NAME"}

leads:
id | data_provider_id | email | data
------------------------------------
1  | 201              | hi@hi | {"SEGMENT IDS": "id1,id1,id1,id2,id3", "FULL NAME": "John Doe"}
2  | 201              | xx@xx | {"FULL NAME": "Billy Bob"}

desired output:

data_provider_id | email | full_name | segment
----------------------------------------------
201              | hi@hi | John Doe  | id1
201              | hi@hi | John Doe  | id2
201              | hi@hi | John Doe  | id3
201              | xx@xx | Billy Bob | NULL

I have the following query:

SELECT
  leads.data_provider_id,
  leads.email,
  leads.data->>(p.field_map->>'full_name') AS full_name,
  segment
FROM leads
LEFT OUTER JOIN data_providers p ON p.id = leads.data_provider_id
LEFT JOIN LATERAL unnest(string_to_array(leads.data->>(p.field_map->>'segments'), ',')) AS segment ON true

This query is doing 2 particular things:

  1. its joining on data_providers table to get the field_map column which contains a JSONB mapping if CSV column headers. So something like {"segments": "SEGMENT ID", "full_name": "FULL NAME"}

  2. Within the data JSONB column of leads, there is a key (which I discover through the field map above) that contains a comma separated string of segment_ids (it comes in a CSV and they chose to put 2 values within 1 row). I want to split it so each segment_id gets its own row (and obviously all other columns remain the same on both rows).

I have 2 goals:

  1. If there is an empty string or the key doesn't exist within the map, I want to return the row but just with NULL for the segment_id. I already got this working by changing CROSS JOIN to LEFT JOIN.

  2. I'm trying to remove duplicates in segment ids, so if someone enters 'id1,id1' it should only produce 1 row. I do this because there is a unique index on that column for the materialized view.

I'm currently stuck on #2.

1
  • 2
    I think you will have a better chance of getting answer(s) if you provide a dbfiddle.uk/?rdbms=postgres_14. Most people will just skip to the next question as soon as they realize they will have to create table- and insert- statement's Commented Mar 14, 2022 at 19:45

1 Answer 1

2

Make it a subquery and throw in DISTINCT:

SELECT l.data_provider_id
     , l.email
     , l.data->>(p.field_map->>'full_name') AS full_name
     , s.segment
FROM   leads l
LEFT   JOIN data_providers p ON p.id = l.data_provider_id
LEFT   JOIN LATERAL (
   SELECT DISTINCT segment
   FROM   unnest(string_to_array(l.data->>(p.field_map->>'segment'), ',')) AS segment
   ) s ON true

Your field_map holds the key 'segment', not 'segments', btw.

You could even use this short syntax:

...
LEFT   JOIN LATERAL (
   SELECT DISTINCT unnest(string_to_array(l.data->>(p.field_map->>'segment'), ','))
   ) s(segment) ON true

(But the last one might make unsuspecting SQL purists cringe.)

Original order of array elements is not preserved. If you need that, see:

And use GROUP BY rather than DISTINCT and also aggregate the minimum ordinal position for each group of duplicates.

2
  • 1
    @Tallboy: I would have tested my solution if you had provided a fiddle ... Commented Mar 15, 2022 at 2:39
  • Awesome thank you! I will also provide fiddle if I can't get this working. You are the Postgres demigod Commented Mar 15, 2022 at 18:44

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.