Skip to content

Keep prepared transactions rollbackable after commit write failure#14778

Closed
xingbowang wants to merge 2 commits into
facebook:mainfrom
xingbowang:2026_05_21_oncall_T272425212
Closed

Keep prepared transactions rollbackable after commit write failure#14778
xingbowang wants to merge 2 commits into
facebook:mainfrom
xingbowang:2026_05_21_oncall_T272425212

Conversation

@xingbowang

@xingbowang xingbowang commented May 23, 2026

Copy link
Copy Markdown
Contributor

Summary

  • Restore prepared transactions to PREPARED state when writing the commit marker fails, so callers can still roll them back.
  • Preserve PREPARED state when rollback of a prepared transaction hits a retryable write error, allowing rollback to be retried after DB::Resume().
  • Update db_stress to clean up prepared transactions after failed commits and report detailed rollback cleanup failure diagnostics.
  • Add a WritePrepared regression test covering retryable commit write failure, retryable rollback write failure, successful rollback retry, and DB reopen.

Test Plan

CI

@meta-cla meta-cla Bot added the CLA Signed label May 23, 2026
@github-actions

github-actions Bot commented May 23, 2026

Copy link
Copy Markdown

✅ clang-tidy: No findings on changed lines

Completed in 413.9s.

Stress tests with WritePrepared transactions and retryable write fault injection can hit an injected write error while writing the commit marker for a prepared transaction. In that case the transaction has already been added to PreparedHeap, but PessimisticTransaction::Commit() left the transaction state as AWAITING_COMMIT after CommitInternal() returned the error. A later Rollback() would then reject the transaction as not rollbackable, leaving the prepared entry unresolved until WritePreparedTxnDB teardown asserted that PreparedHeap was empty.

Restore the transaction state to PREPARED when a prepared commit write fails so callers can roll the transaction back. Do the same when rollback of a prepared transaction fails, since rollback itself can also hit retryable injected write errors and must remain retryable after DB::Resume() recovers the background error.

Update db_stress to clean up prepared transactions whose commit returned non-OK. It first attempts Rollback(), then for retryable injected rollback errors it repeatedly calls DB::Resume() and retries rollback with a fixed 10ms sleep, bounded to 100 attempts. If cleanup still fails, log the transaction name, commit status, rollback status, retry count, Resume() busy count, and last Resume() status so the stress failure points at the unresolved prepared transaction instead of only failing later during close or reopen.

Add a WritePrepared regression test that injects a retryable write error on Commit(), verifies rollback also remains retryable across an injected rollback write error, then retries rollback and reopens the DB. This covers the PreparedHeap teardown failure mode from T272425212.

Test Plan:

make -j128 db_stress write_prepared_transaction_test

timeout 60s ./write_prepared_transaction_test --gtest_filter='*RollbackPreparedAfterCommitWriteFailure*' --gtest_repeat=5

python3 "/home/xbw/workspace/gtest-parallel/gtest_parallel.py" -w 32 --timeout=60 ./write_prepared_transaction_test

make check-sources

git diff --check -- db_stress_tool/db_stress_test_base.cc utilities/transactions/pessimistic_transaction.cc utilities/transactions/write_prepared_transaction_test.cc
@xingbowang xingbowang force-pushed the 2026_05_21_oncall_T272425212 branch from 9efb7bd to ac31737 Compare May 23, 2026 22:15
@meta-codesync

meta-codesync Bot commented May 23, 2026

Copy link
Copy Markdown

@xingbowang has imported this pull request. If you are a Meta employee, you can view this in D106202437.

@github-actions

github-actions Bot commented May 23, 2026

Copy link
Copy Markdown

Codex Code Review - OBSOLETE

Superseded by a newer AI review. Expand to see the original review.

🟡 Codex Code Review

Auto-triggered after CI passed — reviewing commit ac31737


Codex review failed before producing findings.

WARNING: proceeding, even though we could not update PATH: Refusing to create helper binaries under temporary dir "/tmp" (codex_home: AbsolutePathBuf("/tmp/codex-home"))
error: the argument '--base <BRANCH>' cannot be used with '[PROMPT]'

Usage: codex exec review --commit <SHA> --base <BRANCH> --title <TITLE> --model <MODEL> --config <key=value> --dangerously-bypass-approvals-and-sandbox --output-last-message <FILE> [PROMPT]

For more information, try '--help'.

ℹ️ About this response

Generated by Codex CLI.
Review methodology: claude_md/code_review.md

Limitations:

  • Codex may miss context from files not in the diff
  • Large PRs may be truncated
  • Always apply human judgment to AI suggestions

Commands:

  • /codex-review [context] — Request a code review
  • /codex-query <question> — Ask about the PR or codebase
@github-actions

github-actions Bot commented May 23, 2026

Copy link
Copy Markdown

Claude Code Review - OBSOLETE

Superseded by a newer AI review. Expand to see the original review.

✅ Claude Code Review

Auto-triggered after CI passed — reviewing commit ac31737


Summary

Well-designed fix that correctly restores transaction state to PREPARED after failed commit/rollback, enabling retry. The core state machine changes are sound. One pre-existing issue in WriteUnprepared becomes more relevant with this change.

High-severity findings (0):

No high-severity findings.

Full review (click to expand)

Findings

🟡 MEDIUM

M1. Pre-existing unprep_seqs_.clear() in WriteUnprepared rollback becomes actionable — write_unprepared_txn.cc:822
  • Issue: WriteUnpreparedTxn::RollbackInternal() clears unprep_seqs_ at line 822 regardless of whether the second WriteImpl (two_write_queues path) succeeded. Before this PR, the state was stuck at AWAITING_ROLLBACK so retry was impossible. With this PR restoring state to PREPARED, retry becomes possible but unprep_seqs_ is empty, making the retry attempt incorrect.
  • Root cause: Pre-existing unconditional cleanup in the two-write-queue error path of WriteUnprepared rollback.
  • Suggested fix: Move unprep_seqs_.clear() and related cleanup inside the if (s.ok()) block at line 816, or file a follow-up issue. This is outside the scope of this PR but worth noting.
M2. Missing GetState() assertions in test — write_prepared_transaction_test.cc:2363
  • Issue: The new test RollbackPreparedAfterCommitWriteFailure does not verify transaction state after each failure. Adding ASSERT_EQ(Transaction::PREPARED, txn->GetState()) after commit failure and after rollback failure would directly validate the core invariant this PR establishes.
  • Root cause: Test validates the operations succeed/fail but doesn't assert the state machine transitions.
  • Suggested fix: Add state assertions:
    ASSERT_NOK(commit_s);
    ASSERT_EQ(Transaction::PREPARED, txn->GetState());  // NEW
    ...
    ASSERT_NOK(rollback_s);
    ASSERT_EQ(Transaction::PREPARED, txn->GetState());  // NEW
M3. No data visibility verification after rollback — write_prepared_transaction_test.cc:2363
  • Issue: The test does not verify that "key" is not readable after successful rollback and before ReOpen. A read check would confirm the rollback actually took effect at the data level, not just the state level.
  • Root cause: Test focuses on operational success/failure but not data correctness.
  • Suggested fix: Add after successful rollback:
    ASSERT_OK(txn->Rollback());
    {
      ReadOptions ro;
      PinnableSlice val;
      ASSERT_TRUE(db->Get(ro, db->DefaultColumnFamily(), "key", &val).IsNotFound());
    }

🟢 LOW / NIT

L1. API consistency: CommitBatch() does not restore state on failure — pessimistic_transaction.cc:573-578
  • Issue: CommitBatch() sets txn_state_.store(AWAITING_COMMIT) at line 574 but does not restore to a usable state on failure. This is inconsistent with the new Commit() behavior. However, CommitBatch() is for non-prepared transactions (STARTED state) where rollback semantics differ.
  • Suggested fix: Consider aligning in a follow-up PR, or document the difference.
L2. commit_without_prepare path doesn't restore state — pessimistic_transaction.cc:712-731
  • Issue: The commit_without_prepare path calls Clear() before checking status (line 727), then only sets COMMITTED if ok (line 729). On failure, state remains AWAITING_COMMIT and transaction is cleared. This is pre-existing and different from the prepared path, but inconsistent.
  • Suggested fix: Consider aligning in a follow-up PR.
L3. Test only covers WritePrepared, not WriteCommitted — write_prepared_transaction_test.cc
  • Issue: The new test is parameterized under WritePreparedTransactionTest only. A similar test for WriteCommitted transactions would improve coverage of the state machine fix in PessimisticTransaction::Commit() and Rollback().
  • Suggested fix: Add a corresponding test in TransactionTest (which covers WriteCommitted).
L4. Retry loop uses clock_->SleepForMicroseconds()db_stress_test_base.cc:998
  • Issue: CLAUDE.md guidelines discourage sleep in tests due to flakiness. However, this is in db_stress (not unit tests) and the sleep is for error recovery (waiting for Resume to take effect), which is acceptable.
  • Suggested fix: No change needed; the context is appropriate.

Cross-Component Analysis

Context Does code execute? Assumptions hold? Action needed?
WritePreparedTxnDB YES YES — CommitInternal single-write path; no partial commit concern Safe
WriteUnpreparedTxnDB YES MOSTLY — CommitInternal single-write; RollbackInternal two-write path has pre-existing unprep_seqs_ issue (M1) Follow-up for M1
WriteCommittedTxnDB YES YES — CommitInternal/RollbackInternal are simple single writes Safe
ReadOnly DB NO N/A N/A
Concurrent observers (GetState) Possible YES — AWAITING_COMMIT/AWAITING_ROLLBACK were already transient; no code branches on them Safe
Transaction expiration NO — expiration_time_ cleared during Prepare N/A N/A

Assumption stress-test results:

The claim "CommitInternal two-write path could partially commit" was investigated and found to be unreachable for valid inputs. In both WritePrepared and WriteUnprepared CommitInternal, the two-write path requires includes_data=true, which requires a non-empty commit-time batch without for_recovery=true. But if the batch is non-empty and for_recovery is false, the code returns InvalidArgument before reaching the write path. Therefore do_one_write is always true for valid CommitInternal calls, and partial commit is not a concern.

For RollbackInternal, the two-write path IS reachable (controlled only by two_write_queues option). However, if the first write fails, RollbackInternal returns early before any cleanup, and retry is safe. The M1 finding applies only when the first write succeeds and the second fails.

Positive Observations

  • The core fix is minimal and well-targeted: two txn_state_.store(PREPARED) calls in error paths.
  • The db_stress retry logic includes good diagnostics (txn name, status strings, retry counts).
  • The new test covers the full cycle: commit failure -> resume -> rollback failure -> resume -> rollback success -> reopen.
  • The fix correctly does NOT call Clear() on commit failure, preserving locks and write batch for subsequent rollback.
  • The serialization reviewer confirmed WAL consistency is maintained in all failure scenarios.

ℹ️ About this response

Generated by Claude Code.
Review methodology: claude_md/code_review.md

Limitations:

  • Claude may miss context from files not in the diff
  • Large PRs may be truncated
  • Always apply human judgment to AI suggestions

Commands:

  • /claude-review [context] — Request a code review
  • /claude-query <question> — Ask about the PR or codebase
Clarify the public Transaction API around prepared TransactionDB cleanup. A non-OK Commit() after Prepare() can still leave a 2PC transaction needing resolution, so callers should Rollback() or otherwise resolve it before destroying the transaction or closing the DB.

Replace the old Rollback() FIXME with concrete guidance: transaction destruction releases in-memory resources but does not resolve prepared state, and Rollback() can itself return retryable I/O errors when it has to write rollback state. In that case callers should recover or resume the DB and retry Rollback() before close.

Add an unreleased bug-fix note for the WritePrepared cleanup fix.

Test Plan:

make check-sources

git diff --check -- include/rocksdb/utilities/transaction.h unreleased_history/bug_fixes/prepared_txn_rollback_after_write_failure.md
@meta-codesync

meta-codesync Bot commented May 27, 2026

Copy link
Copy Markdown

@xingbowang has imported this pull request. If you are a Meta employee, you can view this in D106202437.

@github-actions

Copy link
Copy Markdown

🟡 Codex Code Review

Auto-triggered after CI passed — reviewing commit 53176de


Codex review failed before producing findings.

WARNING: proceeding, even though we could not update PATH: Refusing to create helper binaries under temporary dir "/tmp" (codex_home: AbsolutePathBuf("/tmp/codex-home"))
error: the argument '--base <BRANCH>' cannot be used with '[PROMPT]'

Usage: codex exec review --commit <SHA> --base <BRANCH> --title <TITLE> --model <MODEL> --config <key=value> --dangerously-bypass-approvals-and-sandbox --output-last-message <FILE> [PROMPT]

For more information, try '--help'.

ℹ️ About this response

Generated by Codex CLI.
Review methodology: claude_md/code_review.md

Limitations:

  • Codex may miss context from files not in the diff
  • Large PRs may be truncated
  • Always apply human judgment to AI suggestions

Commands:

  • /codex-review [context] — Request a code review
  • /codex-query <question> — Ask about the PR or codebase
@github-actions

Copy link
Copy Markdown

✅ Claude Code Review

Auto-triggered after CI passed — reviewing commit 53176de


Summary

Well-targeted fix for a real bug: prepared transactions stuck in AWAITING_COMMIT/AWAITING_ROLLBACK become invisible to GetAllPreparedTransactions() and unresolvable. Restoring PREPARED state on failure is the correct minimal fix. The implementation is safe -- locks are retained, PreparedHeap remains consistent, and Rollback() uses a fresh batch unaffected by prior commit attempts.

High-severity findings (1):

  • [pessimistic_transaction.cc:740] Retrying Commit() after failure corrupts commit_time_batch_ (double MarkCommit/Append). The API docs say to call Rollback(), but the state machine does not enforce this -- a defensive guard is recommended.
Full review (click to expand)

Findings

🔴 HIGH

H1. Commit retry corrupts commit_time_batch_ -- pessimistic_transaction.cc:740
  • Issue: After CommitInternal() fails and state is restored to PREPARED, calling Commit() again is permitted by the state machine. However, CommitInternal() (both WritePreparedTxn at write_prepared_txn.cc:202 and WriteCommittedTxn at pessimistic_transaction.cc:859) calls MarkCommit() on GetCommitTimeWriteBatch() which permanently appends a commit marker to commit_time_batch_. WriteCommittedTxn::CommitInternal() also calls Append(working_batch, wb) at line 942, appending the full write batch. A second Commit() call would double these operations, producing a malformed batch.
  • Root cause: commit_time_batch_ is a mutable member modified in-place by CommitInternal(). The state restoration to PREPARED re-enables the commit path without resetting this batch.
  • Mitigating factors: The PR's API documentation explicitly directs users to call Rollback() after failed commit, not retry. Rollback() uses its own fresh WriteBatch, so it is unaffected. The new test only tests Rollback after failed commit.
  • Suggested fix: Either (a) add a boolean flag (e.g., commit_attempted_) that prevents re-entry to Commit() after failure, returning Status::InvalidArgument("Commit already attempted; call Rollback()"), or (b) clear/reset commit_time_batch_ before the MarkCommit call in CommitInternal. Option (a) is simpler and makes the API contract enforceable.

🟡 MEDIUM

M1. db_stress uses clock_->SleepForMicroseconds in retry loop -- db_stress_test_base.cc:996
  • Issue: CLAUDE.md guidance says "Don't use sleep to wait for certain events to happen. This will cause test to be flaky." The retry loop sleeps 10ms between Resume+Rollback attempts.
  • Mitigating factors: This is not waiting for an asynchronous event -- it's a deliberate backoff between retries of a synchronous operation. The loop is bounded by kMaxRollbackAfterRecoveryRetries = 100, so it terminates in at most 1 second. This pattern is reasonable for error recovery.
  • Suggested fix: Consider adding a comment explaining why sleep is acceptable here (backoff, not event-waiting). Alternatively, consider using SyncPoint for test determinism if fault injection timing matters.
M2. No test coverage for WriteCommittedTxn path -- write_prepared_transaction_test.cc
  • Issue: The regression test RollbackPreparedAfterCommitWriteFailure only tests WritePreparedTransactionTest. The state restoration in PessimisticTransaction::Commit() and Rollback() also affects WriteCommittedTxn, which has a different CommitInternal() and RollbackInternal(). No test covers the WriteCommitted path.
  • Suggested fix: Add a parallel test in the WriteCommitted transaction test suite, or parameterize the test to cover both policies.

🟢 LOW / NIT

L1. Missing last_resume_status initialization for non-retry path -- db_stress_test_base.cc:972
  • Issue: last_resume_status is initialized to empty string and only set inside the retry loop. If the initial txn.Rollback() fails but IsErrorInjectedAndRetryable returns false (or db_ is null), the fprintf at line 997 prints an empty last_resume_status. This is technically correct but the diagnostic message could be more informative.
  • Suggested fix: Initialize last_resume_status to "N/A" or "not attempted".
L2. Comment style nit -- pessimistic_transaction.cc:1007
  • Issue: The new comment says "Rollback writes can fail under retryable IO errors." This is slightly imprecise -- rollback writes can fail under any write error, not just retryable IO errors. The state restoration is correct regardless of error type.
  • Suggested fix: "Rollback writes can fail (e.g., retryable IO errors). Preserve the state so callers can retry after error recovery."
L3. The old FIXME in transaction.h is replaced but the answer is only partially addressed
  • Issue: The original FIXME asked "what happens if this isn't called before destruction?" The new documentation says "Destroying a Transaction object releases its in-memory resources, but it is not a substitute for resolving a prepared transaction." This is helpful but doesn't fully answer what happens to a prepared transaction's PreparedHeap entry on destruction without Rollback. The entry leaks.
  • Suggested fix: Consider adding: "Destroying a prepared transaction without Commit() or Rollback() will leave its prepared state in the PreparedHeap, which is cleaned up on DB recovery."

Cross-Component Analysis

Context Safe? Notes
WritePreparedTxnDB YES Primary target. PreparedHeap entries retained correctly. CommitCache not modified on failure.
WriteUnpreparedTxnDB YES Same base class Rollback(). WriteUnpreparedTxn::RollbackInternal has same two-write pattern but state fix applies equally.
WriteCommittedTxnDB YES Simpler CommitInternal/RollbackInternal. State restoration is safe. commit_time_batch corruption on retry is a concern (H1).
Two write queues YES First-write-success + second-write-failure is handled: cleanup code at write_prepared_txn.cc:281-284 removes the orphaned prepared entry.
Concurrent access YES txn_state_ is atomic. No CAS needed for PREPARED path (no expiration/lock-stealing race).
DB Recovery/Reopen YES WAL-based recovery reconstructs prepared transactions. In-memory state restoration doesn't affect durability.

Positive Observations

  • Correct root cause fix: The bug was that AWAITING_COMMIT/AWAITING_ROLLBACK states made transactions invisible to GetAllPreparedTransactions(). Restoring PREPARED is the minimal correct fix.
  • Good API documentation: The additions to transaction.h clearly guide users on proper 2PC error handling.
  • Well-structured test: The regression test covers commit failure, rollback failure, Resume, retry, and DB reopen. The test is parameterized over two_write_queues via WritePreparedTransactionTest.
  • Defensive db_stress code: The bounded retry loop with diagnostic output is practical and will help debug future stress test failures.
  • Lock safety verified: Clear() (which releases locks) is only called on the success path, never before the error return. Locks are fully retained after failed commit/rollback.

ℹ️ About this response

Generated by Claude Code.
Review methodology: claude_md/code_review.md

Limitations:

  • Claude may miss context from files not in the diff
  • Large PRs may be truncated
  • Always apply human judgment to AI suggestions

Commands:

  • /claude-review [context] — Request a code review
  • /claude-query <question> — Ask about the PR or codebase
@meta-codesync

meta-codesync Bot commented May 27, 2026

Copy link
Copy Markdown

@xingbowang merged this pull request in 364eb88.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

2 participants