
We noticed a data inconsistency in a 3-node PostgreSQL setup with asynchronous streaming replication. A single JDBC transaction inserted/updated rows in two tables, but one table is missing a few of the rows it should have inserted. A few hours after the transaction, the primary node went down and a standby was promoted to primary.

Environment

  • PostgreSQL 13.x (3 nodes, asynchronous streaming replication)
  • JDBC (PostgreSQL driver)
  • BoneCP connection pool
  • connection.setAutoCommit(false)
  • Isolation: REPEATABLE READ
  • Explicit connection.commit() after multiple executeBatch() calls
  • queryTimeout = 600s (long transaction)
  • Application logs show no rollback, no exceptions
  • All rows written in a given transaction carry the same value in the version column
  • Entries are written to Postgres under an application-level global lock
  • No entries have been deleted from the table
  • data_checksums was not enabled (see the settings check sketched below)
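
For reference, these settings can be confirmed with standard catalog queries on the primary (nothing here is specific to our setup):

-- Confirm that checksums are off and inspect streaming replication state.
SHOW data_checksums;
SELECT application_name, state, sync_state, replay_lsn
FROM pg_stat_replication;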

Code for inserting:

try (Connection connection = dataSource.getConnection()) {
    connection.setAutoCommit(false);
    connection.setTransactionIsolation(Connection.TRANSACTION_REPEATABLE_READ);
    try {
        // 1. Insert into main table (table_a).
        // updated holds roughly 400,000 entries.
        if (!updated.isEmpty()) {
            try (PreparedStatement ps = connection.prepareStatement(
                    "INSERT INTO table_a (id, version, content) VALUES (?, ?, ?)")) {
                for (Storable s : updated.values()) {
                    ps.setString(1, s.getId());
                    ps.setLong(2, s.getVersion());
                    ps.setBytes(3, s.toBytes());
                    ps.addBatch();
                }
                // Verify that every batched insert reported exactly one affected row.
                int[] updates = ps.executeBatch();
                int updateCount = IntStream.of(updates).sum();
                if (updateCount != updated.size()) {
                    throw new SQLException("Unexpected updateCount=" + updateCount);
                }
            }
        }

        // 2. Update index table (table_b)
        if (!updated.isEmpty()) {
            try (PreparedStatement ps = connection.prepareStatement(
                    "UPDATE table_b SET id=?, name=?, version=? WHERE id=?")) {
                for (Storable s : updated.values()) {
                    ps.setString(1, s.getId());
                    ps.setString(2, s.getName());
                    ps.setLong(3, s.getVersion());
                    ps.setString(4, s.getId());
                    ps.addBatch();
                }
                int[] updates = ps.executeBatch();
                int updateCount = IntStream.of(updates).sum();
                if (updateCount != updated.size()) {
                    throw new SQLException("Unexpected updateCount=" + updateCount);
                }
            }
        }

        connection.commit(); // <- commit succeeds without errors
    } catch (SQLException e) {
        // Rollback while the connection is still in scope.
        connection.rollback();
        // ...
    }
}
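
Not part of the code above, but a diagnostic we could add for future runs: log the transaction ID and WAL position from inside the same transaction (both functions exist in PostgreSQL 13), so that a specific commit can later be located in the WAL:

-- Hypothetical diagnostic: execute inside the JDBC transaction and log the result.
-- txid_current() assigns and returns this transaction's ID;
-- pg_current_wal_lsn() reports the current WAL write position.
SELECT txid_current(), pg_current_wal_lsn();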

Observed Behavior

  • The transaction completed successfully (no rollback or SQL exception).

  • After a failover (a few hours later):

    • The new primary had the updated entries in table_b

    • But for a few IDs (which are present in table_b), table_a had no new rows for that version. None of the Postgres nodes have these rows in table_a (see the consistency check sketched after this list).

  • For most IDs, both tables were updated correctly.

  • The tables were first queried after the failover, so it is unclear whether the inconsistency existed before the failover or appeared afterwards.

  • PostgreSQL logs showed no errors during the transaction or failover.
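
The missing rows were identified with a check along these lines (column names as in the code above; the exact query we ran differed slightly):

-- Find ids whose version in table_b has no matching row in table_a.
SELECT b.id, b.version
FROM table_b b
LEFT JOIN table_a a ON a.id = b.id AND a.version = b.version
WHERE a.id IS NULL;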

Questions:

  • Could this be caused by disk or memory corruption (given that data_checksums was disabled)?
  • Could it be caused by an issue with asynchronous replication? (Logs showing successful WAL shipping are present.)
  • Which specific PostgreSQL or system logs should be checked to confirm whether the WAL for this transaction was fully written and applied? (A first check we have in mind is sketched below.)
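
Besides the server logs (and possibly pg_waldump output for this transaction's WAL records), the first check we have in mind is comparing received vs. replayed WAL on each standby, using the standard monitoring functions in PostgreSQL 13:

-- Run on each standby: a persistent gap between the two values would
-- indicate WAL that was received but never replayed.
SELECT pg_last_wal_receive_lsn(), pg_last_wal_replay_lsn();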

Comments:

  • I cannot imagine a way in which only part of a transaction can be replicated, so I'd assume that something was wrong with the original transaction. Commented Nov 9 at 5:56
  • Are you saying that, out of the three nodes you have, only this one has rows missing? Commented Nov 9 at 7:27
  • @mustaccio no, none of the nodes have those rows. Will edit the post to remove the ambiguity. Commented Nov 12 at 5:32
  • @LaurenzAlbe are there any scenarios where a transaction does not get applied completely, or where some parts of it get corrupted? I forgot to mention in the post, but in this particular transaction the size of the updated list was 400k. The writes to table_a alone took 8 minutes (this could primarily be due to an inefficient way of writing rows to Postgres). Commented Nov 12 at 5:40
  • No, that is not possible, except if there were a surprising software bug in PostgreSQL. Data can of course be corrupted after the transaction, but that corruption wouldn't restore the data from before the transaction. I bet your transaction boundaries were not what you thought they were. Commented Nov 12 at 8:22
