I have seen similar questions on this topic, but none of them answered mine because they all concern long-running queries. My query, by contrast, executes in under one second, yet every few hours this error occurs:

SQLSTATE[40001]: Serialization failure: 7 ERROR: canceling statement due to conflict with recovery LINE 27: FROM sometable st ^ DETAIL: User query might have needed to see row versions that must be removed.

After reading up on this, I don't want to set hot_standby_feedback = on, because it can bloat the primary, and we have very frequent deletes, inserts, and updates. In general, though, I don't think this should be causing a problem.

I tried raising max_standby_archive_delay and max_standby_streaming_delay from 30s to 60s, but it seemed to me that the error started appearing even more often.

I then changed them to 15s, and the result is unclear: the error rate seems to have stayed at about the same level, and I don't see a cause-and-effect relationship.

I suspect the cause is pgbouncer, which I run in front of the replica for load balancing, but I can't find anything similar online, so I decided to ask here in case someone has ideas.

What is also strange: my code catches this error, waits one second, and retries the query, yet the retry fails with the same error.
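For reference, a minimal sketch of the catch, wait, and retry pattern described above, written in Python purely for illustration (it is not the application's real code). `SerializationFailure` and `run_with_retry` are hypothetical names: the exception stands in for whatever error the database driver raises for SQLSTATE 40001. Unlike a fixed one-second wait, this version backs off exponentially, on the assumption that a longer pause gives the retry a better chance of falling outside the same conflict window.

```python
import time


class SerializationFailure(Exception):
    """Hypothetical stand-in for a driver error carrying SQLSTATE 40001."""


def run_with_retry(run_query, attempts=5, base_delay=1.0, sleep=time.sleep):
    """Call run_query(); on SerializationFailure, wait and try again.

    The wait doubles after each failed attempt (base_delay, 2x, 4x, ...).
    The last failure is re-raised so the caller still sees the error.
    """
    for attempt in range(attempts):
        try:
            return run_query()
        except SerializationFailure:
            if attempt == attempts - 1:
                raise
            sleep(base_delay * 2 ** attempt)
```

With the defaults, a query that keeps failing is attempted five times, with waits of 1, 2, 4, and 8 seconds between attempts.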

Below is the configuration of one of the pgbouncers:

[pgbouncer]
logfile = /var/log/pgbouncer/pgbouncer.log
pidfile = /run/pgbouncer/pgbouncer.pid

;; ip address or * which means all ip-s
;listen_addr = 127.0.0.1
listen_port = 5435

unix_socket_dir = /srv/sockets/pgbouncer
unix_socket_mode = 0777

; any, trust, plain, crypt, md5
auth_type = any
auth_file = /etc/pgbouncer/userlist.txt

; comma-separated list of users, who are allowed to change settings
admin_users = postgres

; comma-separated list of users who are just allowed to use SHOW command
stats_users = stats, postgres

; total number of clients that can connect
max_client_conn = 10000

; default pool size.  20 is good number when transaction pooling
; is in use, in session pooling it needs to be the number of
; max clients you want to handle at any moment
;default_pool_size = 500
default_pool_size = 30

query_wait_timeout = 60
server_idle_timeout = 15

[databases]
...
  • Change max_standby_streaming_delay to something bigger or -1. Commented Oct 7 at 12:02
  • @LaurenzAlbe but why if my request finishes in 1 second? Please explain! Commented Oct 7 at 12:10
  • max_standby_streaming_delay cancels all conflicting queries once the replica lags beyond this time, regardless of how long each query has been running. Suppose the first query starts at 0 seconds and causes a replication conflict, the second starts at 0.5 seconds, and the third at 1 second. By then the first query has completed, but replication cannot move forward because the second query uses the same snapshot (the most recent known) and now conflicts with replication. And so on, until the lag reaches max_standby_streaming_delay and all queries that conflict with replication are canceled. Commented Oct 7 at 12:34
  • @Melkij "all queries that conflict with replication are canceled": do you mean the queries that were issued during the wait time? So it doesn't mean that the conflicting query itself has to run for as long as the value I set in max_standby_streaming_delay? Commented Oct 7 at 12:54
  • @Melkij Thanks, I tried setting these values to 900s and the problem went away. Commented Oct 10 at 8:59
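
The mechanism described in the comments can be illustrated with a toy model. The sketch below is only an approximation under stated assumptions, not how PostgreSQL is implemented: a conflicting WAL record arrives at t=0, identical queries start at a fixed interval, every query started while replay is paused conflicts with it, and replay resumes as soon as no such query is running. The function name `simulate_conflict` and its parameters are hypothetical.

```python
def simulate_conflict(max_delay_ms, interval_ms, duration_ms):
    """Toy model of a replay conflict on a standby.

    Queries start at 0, interval_ms, 2*interval_ms, ... and each runs for
    duration_ms. Returns (time_replay_resumes_ms, canceled_start_times_ms).
    """
    if interval_ms <= 0:
        raise ValueError("interval_ms must be positive")
    starts = []      # start times of queries that block replay
    start = 0
    busy_until = 0   # end time of the most recently started query
    while start <= max_delay_ms:
        if starts and start >= busy_until:
            # A gap with no running query: replay proceeds, nothing canceled.
            return busy_until, []
        starts.append(start)
        busy_until = start + duration_ms
        start += interval_ms
    if busy_until <= max_delay_ms:
        # The last query finished before the delay limit was reached.
        return busy_until, []
    # Delay limit reached: cancel every query still running at that moment.
    canceled = [s for s in starts if s + duration_ms > max_delay_ms]
    return max_delay_ms, canceled
```

In this model, `simulate_conflict(30000, 500, 1000)` (one-second queries starting every half second, 30s limit) cancels the queries that started at 29500 ms and 30000 ms, even though neither ran for more than a second. With `simulate_conflict(30000, 2000, 1000)` the queries never overlap, so replay resumes at 1000 ms and nothing is canceled.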
