6

I'm working on Postgres 9.2.

There are 2 UPDATEs, each in their own transactions. One looks like:

UPDATE foo SET a=1 WHERE b IN (1,2,3,4);

The other is similar:

UPDATE foo SET a=2 WHERE b IN (1,2,3,4);

These could possibly run at the same time and in reality have 500+ in the 'IN' expression. I'm sometimes seeing deadlocks. Is is true that that order of items in the 'IN' expression may not actually influence the true lock ordering?

2 Answers 2

13

There is no ORDER BY in the UPDATE command.
But there is for SELECT. Use row-level locking with the FOR UPDATE clause in a subquery:

UPDATE foo f
SET    a = 1
FROM  (
   SELECT b FROM foo
   WHERE  b IN (1,2,3,4)
   ORDER  BY b
   FOR    UPDATE
   ) upd
WHERE f.b = upd.b;

More basics in the chapter "Deadlocks" in the manual.

Of course, b has to be UNIQUE, or you need to add more expressions to the ORDER BY clause to make the sort order deterministic.

And you need to enforce the same order for all statements that take blocking row-level locks: UPDATE, DELETE, SELECT .. FOR UPDATE, and MERGE (Postgres 15+).

Related, with more details:

Sign up to request clarification or add additional context in comments.

2 Comments

Is a basic UPDATE ... WHERE b IN (...) always safe from deadlocks if b is a primary key or not even then? I'm surprised I haven't been able to find more explicit documentation of when it's safe to do a basic bulk update.
@Andy, in the given example, b is the PK (or some other unique key). No, that does not get you out of trouble. There is explicit documentation, I added a link.
6

Yes. I think the main issue here is that IN checks for membership in the set specified, but does not confer any sort of ordering on the UPDATE, which in turn, means that no concrete ordering is conferred upon the lock ordering.

The WHERE clause in an UPDATE statement essentially behaves the same way it does in a SELECT. For example, I will often simulate an UPDATE using a SELECT to check what will be updated to see that it's what I expected.

With that in mind, the following example using SELECT demonstrates that IN does not in itself confer ordering:

Given this schema/data:

create table foo
(
  id serial,
  val text
);

insert into foo (val)
values ('one'), ('two'), ('three'), ('four');

The following queries:

select *
from foo
where id in (1,2,3,4);


select *
from foo
where id in (4,3,2,1);

yield the exact same results -- the rows in order from id 1-4.

Even that isn't guaranteed, since I did not use an ORDER BY in the select. Rather, without it, Postgres uses whatever order the server decides is fastest (see point 8 about ORDER BY in the Postgres SELECT doc). Given a fairly static table, it's often the same order in which it was inserted (as was the case here). However, there's nothing guaranteeing that, and if there's a lot of churn on the table (lots of dead tuples, rows removed, etc.), it's less likely to be the case.

I suspect that's what's happening here with your UPDATE. Sometimes -- if not even most of the time -- it may end up in numerical order if that's the same way the rows were inserted, but there's nothing to guarantee that, and the cases where you see the deadlocks are likely scenarios where the data has changed such that one update is ordered different than the other.

sqlfiddle with the above code.

Possible fixes/workarounds:

In terms of what you could do about it, there are various options, depending on your requirements. You could explicitly take out a table lock on the table, although that would of course have the effect of serializing the updates there, which may prove to be too large a bottleneck.

Another option, which would still allow for concurrency -- is to explicitly iterate over the items using dynamic SQL in, say, Python. That way, you'd have a set of one-row updates that occurred always in the same order, and since you could ensure that consistent order, the normal Postgres locking should be able to handle the concurrency without deadlocking.

That won't perform as well as batch-updating in pure SQL, but it should solve the lock issue. One suggestion to bump up performance is to only COMMIT every so often, and not after every single row -- that saves a lot of overhead.

Another option would be to do the loop in a Postgres function written in PL/pgSQL. That function could then be called externally, in, say, Python, but the looping would be done (also explicitly) server-side, which may save on some overhead, since the looping and UPDATEs are done entirely server-side without having to go over the wire each loop iteration.

4 Comments

Unfortunately, PostgreSQL doesn't (yet) provide an UPDATE ... ORDER BY, which is what we'd need to guarantee this.
@Craig: The misfortune is of limited reach, since SELECT .. ORDER BY .. FOR UPDATE in a subquery achieves the same.
@ErwinBrandstetter Yep, as the LockRows node occurs outside the Sort node, that should be fine :-) . Nonetheless, it's a pain that users have to deal with this.
@Craig: Not saying that an ORDER BY clause for UPDATE (and DELETE) wouldn't be welcome ...

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.