I have two tables (demo at db<>fiddle):

  1. Table keywords has two columns, id and v (which holds the keyword's value)
    create table keywords(
      id int generated always as identity primary key
     ,v text unique);
    
  2. Table main has, among others, two columns that are each a foreign key into the keywords table's id, named key1_id and key2_id
    create table main(
      id int generated always as identity primary key
     ,key1_id int references keywords(id)
     ,key2_id int references keywords(id));
    

Now, I want to insert pairs of keywords (key1 and key2 as $1 and $2) into the main table, like this:

WITH key1 AS (INSERT INTO keywords (v) 
              VALUES ($1) 
              ON CONFLICT (v) DO UPDATE
              SET v = EXCLUDED.v
              RETURNING id),
     key2 AS (INSERT INTO keywords (v) 
              VALUES ($2) 
              ON CONFLICT (v) DO UPDATE
              SET v = EXCLUDED.v
              RETURNING id)
INSERT INTO main (key1_id, key2_id)
SELECT key1.id, key2.id
FROM key1, key2
RETURNING id

Basically, the two keys often have recurring values, so I use the keywords table to keep a unique set of them (mostly for storage optimization, as I have millions of rows with them).

But if $1 and $2 have identical values, I get this error, whereas there's no issue if they're different:

pg_query_params(): Query failed: ERROR: ON CONFLICT DO UPDATE command cannot affect row a second time
HINT: Ensure that no rows proposed for insertion within the same command have duplicate constrained values.

The goal is that both key1_id and key2_id point to the correct row in keywords based on the values passed as $1 and $2, even if they're both the same.

How do I modify the SQL statement (ideally, it should remain a single one) so that I can insert these keys without getting the error?

I am using PostgreSQL 16.9.

2 Answers

You can do exactly what the hint suggests:

Ensure that no rows proposed for insertion within the same command have duplicate constrained values.
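
As a minimal illustration of what the hint describes (a sketch against the keywords table from the question, with the hypothetical literal 'a' standing in for identical $1 and $2), proposing the same constrained value twice in one command is enough to trigger the error:

```sql
-- Hypothetical literal 'a' plays the role of $1 = $2.
-- Both proposed rows target the same unique value of v,
-- so the second one would have to touch the row a second time.
INSERT INTO keywords (v)
VALUES ('a'), ('a')
ON CONFLICT (v) DO UPDATE
SET v = EXCLUDED.v;
-- ERROR:  ON CONFLICT DO UPDATE command cannot affect row a second time
```

As the error in the question shows, two data-modifying CTEs inside one statement run into the same check, so the duplicate has to be removed before the rows reach the INSERT.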

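Reduced to a single insert, with a union making sure a duplicate insert never happens, and min()/max() both returning the same single id in that case. Note that min(id) always goes to key1_id, so when the two values differ, this does not guarantee that key1_id is the one matching $1: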
demo at db<>fiddle

WITH keys AS (INSERT INTO keywords (v) 
              SELECT $1
              UNION--deduplicates
              SELECT $2
              ON CONFLICT (v) DO UPDATE
              SET v = EXCLUDED.v
              RETURNING id)
INSERT INTO main ( key1_id, key2_id )
SELECT min(id), max(id)
FROM keys
RETURNING id;
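
If it matters which keyword ends up in which column, one possible variant (a sketch, not part of the original answer) also returns v and picks each id by value instead of by min()/max(). It assumes v is stored exactly as passed in, that is, no trigger rewrites it on insert:

```sql
WITH keys AS (
    INSERT INTO keywords (v)
    SELECT $1
    UNION                -- deduplicates when $1 = $2
    SELECT $2
    ON CONFLICT (v) DO UPDATE
    SET v = EXCLUDED.v
    RETURNING id, v      -- v lets the outer query tell the rows apart
)
INSERT INTO main (key1_id, key2_id)
SELECT max(id) FILTER (WHERE v = $1),   -- id of the row holding $1
       max(id) FILTER (WHERE v = $2)    -- id of the row holding $2 (same row if $1 = $2)
FROM keys
RETURNING id;
```

When $1 and $2 are equal, keys holds a single row and both filters match it, so both columns receive the same id.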

8 Comments

SELECT $2 EXCEPT SELECT $1 seems more efficient as no need to requery key1
Good point, thanks. I thought about that but 1) in some cases (bad ORM/db driver config or implementation) this would mean you're transferring $1 twice. I think most modern libraries pass/transfer that once, just bind twice, but if that's just being sanitised then interpolated into the query body, you need to consider that effect. 2) Ideally, this should be ((select $2 except select $1) except select v from key1), so both the incoming value as-is, as well as whatever comes out of key1. If there are before insert triggers, they can deviate.
3) If these really are just single-record, single-value inserts, I don't think there's much performance gain between resolving another instance of the param, compared to passing the output of key. But even if that's negligible, your select $2 except select $1 still wins by brevity. If, however, there's more stuff going on, more rows are being handled, I'd also get rid of except in favour of a not exists or some other type of an anti-join. That would also mean the comma/cross join and left join..on true need a revisit.
@Zegarek I appreciate your answer and your editing of my question, but Charlie's answer was so much more elegant, so I gave the checkmark to him. I hope you don't mind. Both of your answers taught me a lot about what Pgsql can do (I'm only used to sqlite).
I do. I now asked my post to put on a top hat and tails, just for the occasion. I don't think it's an improvement but you're the master of ceremony this evening, so it's your call.
I see now that your new version is even shorter (and more comprehensible to me) than Charlie's, and since I was still able to change my checkmark, I did.

As mentioned by others, you need to ensure you aren't modifying the same row twice.

But you could simplify this a little, by combining the two INSERT INTO keywords into one with a DISTINCT (preventing the error), then aggregating for the INSERT INTO main.

WITH keys AS (
    INSERT INTO keywords (v) 
    SELECT DISTINCT v
    FROM (VALUES ($1), ($2)) AS keys(v)
    ON CONFLICT (v) DO UPDATE
    SET v = EXCLUDED.v
    RETURNING id, CASE v WHEN $1 THEN 1 ELSE 2 END AS which_one
)
INSERT INTO main (key1_id, key2_id)
SELECT
  any_value(keys.id) FILTER (WHERE keys.which_one = 1),
  COALESCE(
    any_value(keys.id) FILTER (WHERE keys.which_one = 2),
    any_value(keys.id)
  )
FROM keys
RETURNING id;

db<>fiddle
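
To check the outcome of either approach, a join back to keywords shows which values each row of main points at. This is only a sketch using the two tables from the question; the column aliases key1 and key2 are chosen here for readability:

```sql
-- Resolve both foreign keys of every main row back to the keyword text.
SELECT m.id,
       k1.v AS key1,
       k2.v AS key2
FROM main AS m
JOIN keywords AS k1 ON k1.id = m.key1_id
JOIN keywords AS k2 ON k2.id = m.key2_id
ORDER BY m.id;
```

For a row inserted with identical $1 and $2, key1 and key2 should show the same value, and keywords should contain that value only once.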

2 Comments

+1. I'd give another one just for any_value() that just doesn't get used enough and happens to be exactly within OP's reach of Pg16+. A funny way to shorten this would be to do returning id, v=$1 is_this_me and then filter(where is_this_me) and filter(where not is_this_me).
Well, I did understand that I must not modify the same row twice, but the HOW was what I had trouble with. I've been programming for over 45 years, starting with 8-bit CPU machine language, then BASIC, Pascal, C etc. But I never got used to functional languages like LISP etc. And SQL feels similarly mind-wrecking to me. I usually only use SQLite, which isn't as rich in its capabilities. Your solution looks quite elegant, and I'm learning quite a bit about what PG can do. But I'll probably come back asking for more next time I want to modify my script :)
