Skip to content

Notify listeners before DB shutdown begins#14769

Closed
xingbowang wants to merge 3 commits into
facebook:mainfrom
xingbowang:2026_05_21_oncall_T272334111
Closed

Notify listeners before DB shutdown begins#14769
xingbowang wants to merge 3 commits into
facebook:mainfrom
xingbowang:2026_05_21_oncall_T272334111

Conversation

@xingbowang

@xingbowang xingbowang commented May 21, 2026

Copy link
Copy Markdown
Contributor

Summary

Notify listeners before DB shutdown begins

db_stress listener bookkeeping correlates compaction callbacks with file deletion callbacks. DBImpl intentionally skips some compaction listener callbacks once shutdown starts, but file deletion callbacks can still be delivered. That mismatch is normally tolerable for production listeners, but db_stress uses listener-local tracking to detect callback consistency and can report a false positive when shutdown interrupts a compaction callback sequence.

The failed-open case is the important gap. DB::Open() can create a DBImpl, recover or flush files, schedule background compaction, and then fail during late open work such as persisting OPTIONS or waiting for open-time compaction under fault injection. At that point DB::Open() tears down the internal DBImpl before returning an error to StressTest::Open(). If OnCompactionBegin already recorded an input file or job in DbStressListener, DBImpl shutdown can suppress the later OnCompactionPreCommit/OnCompactionCompleted callbacks that would normally clear that state. Since control has not returned to the stress harness yet, StressTest::CleanUp()/Reopen() cannot notify the listener in time. A later shutdown-time callback, or the next open retry reusing the same listener, can then observe stale tracking and abort even though RocksDB did not compact the same SST concurrently.

Add EventListener::OnDBShutdownBegin and fire it once from DBImpl::CancelAllBackgroundWork() before publishing shutting_down_. The callback also covers cleanup of a failed DB::Open() attempt, where the DB pointer refers to the internal DBImpl that was never returned to the caller. Track shutdown_notification_sent_ separately from shutting_down_ because listeners are invoked with mutex_ released, and a concurrent or reentrant cancellation must not deliver the callback twice.

Update DbStressListener to consume the DBImpl-driven shutdown notification instead of relying on StressTest::CleanUp()/Reopen() to manually notify it. This lets db_stress mark itself as shutting down before DBImpl starts skipping shutdown-sensitive compaction notifications, including during failed-open cleanup.

Also keep listener state scoped to one db_stress open attempt. Initialize listeners at the top of each non-transactional open retry before enabling open fault injection, so retry attempts get fresh listener state without widening open fault injection to listener construction. Factor the open fault setup into a small helper to keep the retry loop readable. The transaction open path still initializes listeners once because it does not use this open-fault retry loop.

Also fix multi-ops transaction listener checks during shutdown

A TSAN race was reported where MultiOpsTxnsStressListener::OnCompactionCompleted called VerifyPkSkFast on a background compaction thread while the main thread was destroying the transaction DB wrapper in StressTest::CleanUp(). The reported read was a virtual call through the DB object and the write was the WritePreparedTxnDB/WriteUnpreparedTxnDB destructor updating the vptr.

The vulnerable ordering is that DBImpl can already be inside NotifyOnCompactionCompleted before shutdown is requested. It unlocks db mutex while iterating listeners; another listener such as DbStressListener can spend time in its callback, giving the main thread time to enter CleanUp()/Close(). The DB object is still shutting down, but MultiOpsTxnsStressListener may be invoked later in the same callback iteration and call VerifyPkSkFast through stress_test_->db_aptr_, racing with DB wrapper destruction.

With DBImpl-owned EventListener::OnDBShutdownBegin callback, we have MultiOpsTxnsStressListener consume that callback directly. Once DBImpl begins shutdown, the listener skips both flush-completed and compaction-completed verification callbacks, avoiding DB access during teardown. This also covers failed-open cleanup without name-based downcasts in StressTest.

Test Plan

CI

db_stress listener bookkeeping correlates compaction callbacks with file deletion callbacks. DBImpl intentionally skips some compaction listener callbacks once shutdown starts, but file deletion callbacks can still be delivered. That mismatch is normally tolerable for production listeners, but db_stress uses listener-local tracking to detect callback consistency and can report a false positive when shutdown interrupts a compaction callback sequence.

The failed-open case is the important gap. DB::Open() can create a DBImpl, recover or flush files, schedule background compaction, and then fail during late open work such as persisting OPTIONS or waiting for open-time compaction under fault injection. At that point DB::Open() tears down the internal DBImpl before returning an error to StressTest::Open(). If OnCompactionBegin already recorded an input file or job in DbStressListener, DBImpl shutdown can suppress the later OnCompactionPreCommit/OnCompactionCompleted callbacks that would normally clear that state. Since control has not returned to the stress harness yet, StressTest::CleanUp()/Reopen() cannot notify the listener in time. A later shutdown-time callback, or the next open retry reusing the same listener, can then observe stale tracking and abort even though RocksDB did not compact the same SST concurrently.

Add EventListener::OnDBShutdownBegin and fire it once from DBImpl::CancelAllBackgroundWork() before publishing shutting_down_. The callback also covers cleanup of a failed DB::Open() attempt, where the DB pointer refers to the internal DBImpl that was never returned to the caller. Track shutdown_notification_sent_ separately from shutting_down_ because listeners are invoked with mutex_ released, and a concurrent or reentrant cancellation must not deliver the callback twice.

Update DbStressListener to consume the DBImpl-driven shutdown notification instead of relying on StressTest::CleanUp()/Reopen() to manually notify it. This lets db_stress mark itself as shutting down before DBImpl starts skipping shutdown-sensitive compaction notifications, including during failed-open cleanup.

Also keep listener state scoped to one db_stress open attempt. Initialize listeners at the top of each non-transactional open retry before enabling open fault injection, so retry attempts get fresh listener state without widening open fault injection to listener construction. Factor the open fault setup into a small helper to keep the retry loop readable. The transaction open path still initializes listeners once because it does not use this open-fault retry loop.

Tests:

- make db_stress -j192

- make check-sources

- make listener_test -j192

- ./listener_test --gtest_filter='EventListenerTest.OnDBShutdownBegin*'

- git diff --check
@meta-cla meta-cla Bot added the CLA Signed label May 21, 2026
@github-actions

github-actions Bot commented May 21, 2026

Copy link
Copy Markdown

✅ clang-tidy: No findings on changed lines

Completed in 271.8s.

@github-actions

github-actions Bot commented May 21, 2026

Copy link
Copy Markdown

Codex Code Review - OBSOLETE

Superseded by a newer AI review. Expand to see the original review.

🟡 Codex Code Review

Auto-triggered after CI passed — reviewing commit 3b97bf4


Codex review failed before producing findings.

WARNING: proceeding, even though we could not update PATH: Refusing to create helper binaries under temporary dir "/tmp" (codex_home: AbsolutePathBuf("/tmp/codex-home"))
error: the argument '--base <BRANCH>' cannot be used with '[PROMPT]'

Usage: codex exec review --commit <SHA> --base <BRANCH> --title <TITLE> --model <MODEL> --config <key=value> --dangerously-bypass-approvals-and-sandbox --output-last-message <FILE> [PROMPT]

For more information, try '--help'.

ℹ️ About this response

Generated by Codex CLI.
Review methodology: claude_md/code_review.md

Limitations:

  • Codex may miss context from files not in the diff
  • Large PRs may be truncated
  • Always apply human judgment to AI suggestions

Commands:

  • /codex-review [context] — Request a code review
  • /codex-query <question> — Ask about the PR or codebase
@github-actions

github-actions Bot commented May 21, 2026

Copy link
Copy Markdown

Claude Code Review - OBSOLETE

Superseded by a newer AI review. Expand to see the original review.

✅ Claude Code Review

Auto-triggered after CI passed — reviewing commit 3b97bf4


Summary

Well-designed PR that adds EventListener::OnDBShutdownBegin, a callback that fires once before DBImpl publishes its shutdown state. The implementation follows existing RocksDB listener notification patterns correctly. The db_stress refactoring is a clean improvement.

High-severity findings (0):

No high-severity findings.

Full review (click to expand)

Findings

🔴 HIGH

None.

🟡 MEDIUM

M1. Concurrent CancelAllBackgroundWork can publish shutting_down_ while listener callbacks are still executing — db/db_impl/db_impl.cc:556
  • Issue: When Thread A releases mutex_ inside NotifyOnDBShutdownBegin() to call listeners, Thread B can enter CancelAllBackgroundWork(), acquire mutex_, see shutdown_notification_sent_=true (skip notification), then proceed to shutting_down_.store(true) — all before Thread A's listener callbacks finish.
  • Root cause: The fire-once guard correctly prevents double notification, but does not prevent the shutdown state from being published while callbacks are in-flight.
  • Impact: Listeners receive OnDBShutdownBegin with a promise that shutdown hasn't begun yet, but by the time they inspect DB state, shutting_down_ may already be true. Other Notify* callbacks (e.g., NotifyOnFlushBegin) check shutting_down_ and will start returning early.
  • Assessment: In practice this is benign — the listener was called before shutdown, as promised. The documentation already warns "Background work may still be running" and "should not call blocking DB APIs." The concurrent caller is an uncommon edge case. However, the documentation could be more precise.
  • Suggested fix: Add a note to the OnDBShutdownBegin doc: "If multiple threads initiate shutdown concurrently, the callback fires on the first thread, but the DB's shutdown state may be published by another thread before the callback returns."
M2. Listener calling CancelAllBackgroundWork during callback causes redundant shutdown logic — db/db_impl/db_impl.cc:544
  • Issue: If a listener calls CancelAllBackgroundWork(db, true) from within OnDBShutdownBegin, it will succeed (mutex is not held), skip notification (fire-once guard), but execute the rest of the shutdown sequence including shutting_down_.store(true), bg_cv_.SignalAll(), and WaitForBackgroundWork(). When control returns to the original CancelAllBackgroundWork, it stores shutting_down_ again (harmless) and signals bg_cv_ again (harmless).
  • Root cause: No reentrancy guard on the broader CancelAllBackgroundWork logic, only on the notification.
  • Impact: Functionally harmless (idempotent operations), but violates the user expectation documented as "should not call blocking DB APIs." WaitForBackgroundWork() is a blocking call that could delay callback return.
  • Suggested fix: Documentation is sufficient — the warning against blocking DB APIs already covers this. No code change needed.
M3. Consider adding test for concurrent CancelAllBackgroundWorkdb/listener_test.cc
  • Issue: The two new tests cover (a) cancel-then-close fires once, and (b) failed-open fires notification. Neither tests concurrent calls to CancelAllBackgroundWork from multiple threads.
  • Suggested fix: Add a test using sync points that has two threads call CancelAllBackgroundWork and verifies ShutdownCount() == 1.

🟢 LOW / NIT

L1. API documentation could mention the failed-open DB pointer has limited functionality — include/rocksdb/listener.h:975
  • Issue: The doc says "the DB pointer refers to the DB instance that failed to open and was never returned to the application" but doesn't explicitly state that this DB may have limited functionality.
  • Suggested fix: Consider adding: "The DB instance may be partially initialized and should not be used for database operations."
L2. Consider whether OnDBShutdownComplete should be a future addition — include/rocksdb/listener.h
  • Assessment: Other lifecycle events have begin/complete pairs. Shutdown is terminal, so this may not be needed, but worth tracking.
L3. OpenFaultInjectionConfig is a good refactoring — db_stress_tool/db_stress_test_base.cc:65
L4. MaybeEnableOpenFaultInjection returns bool even when no fault was enabled due to missing CURRENT — db_stress_tool/db_stress_test_base.cc:90
  • Assessment: Correct behavior, adequately documented in comments.

Cross-Component Analysis

Context OnDBShutdownBegin fires? Notes
Normal Close/destructor Yes Via CloseHelper → CancelAllBackgroundWork(false)
WritePreparedTxnDB Yes, once ~WritePreparedTxnDB calls first; base sees already_shutting_down
DBWithTTL Yes Via ~DBTtlImpl → CancelAllBackgroundWork(true)
ReadOnly/Secondary DB Yes Via base ~DBImpl
WaitForCompact(close_db) Yes Via Close() → normal path
Failed DB::Open() Yes impl unique_ptr destruction at db_impl_open.cc:2956

Exception safety: Not a concern — RocksDB builds with -fno-rtti in release; all existing Notify* methods use the same unlock-call-lock pattern without try/catch.

Positive Observations

  • Eliminates the manual NotifyShuttingDown() hack in db_stress with a proper lifecycle callback
  • shutdown_notification_sent_ correctly handles all concurrent/reentrant scenarios
  • Moving InitializeListenersForOpen inside the retry loop is the right fix for listener lifecycle management
  • Tests cover the two primary paths (normal shutdown, failed open)

ℹ️ About this response

Generated by Claude Code.
Review methodology: claude_md/code_review.md

Limitations:

  • Claude may miss context from files not in the diff
  • Large PRs may be truncated
  • Always apply human judgment to AI suggestions

Commands:

  • /claude-review [context] — Request a code review
  • /claude-query <question> — Ask about the PR or codebase
@meta-codesync

meta-codesync Bot commented May 21, 2026

Copy link
Copy Markdown

@xingbowang has imported this pull request. If you are a Meta employee, you can view this in D106011452.

1 similar comment
@meta-codesync

meta-codesync Bot commented May 21, 2026

Copy link
Copy Markdown

@xingbowang has imported this pull request. If you are a Meta employee, you can view this in D106011452.

T272131642 reported a TSAN race where MultiOpsTxnsStressListener::OnCompactionCompleted called VerifyPkSkFast on a background compaction thread while the main thread was destroying the transaction DB wrapper in StressTest::CleanUp(). The reported read was a virtual call through the DB object and the write was the WritePreparedTxnDB/WriteUnpreparedTxnDB destructor updating the vptr.

The vulnerable ordering is that DBImpl can already be inside NotifyOnCompactionCompleted before shutdown is requested. It unlocks db mutex while iterating listeners; another listener such as DbStressListener can spend time in its callback, giving the main thread time to enter CleanUp()/Close(). The DB object is still shutting down, but MultiOpsTxnsStressListener may be invoked later in the same callback iteration and call VerifyPkSkFast through stress_test_->db_aptr_, racing with DB wrapper destruction.

D105719244 proposed adding a harness-side NotifyShuttingDown path for MultiOpsTxnsStressListener. With the previous commit's DBImpl-owned EventListener::OnDBShutdownBegin callback, the clearer fix is to have MultiOpsTxnsStressListener consume that callback directly. Once DBImpl begins shutdown, the listener skips both flush-completed and compaction-completed verification callbacks, avoiding DB access during teardown. This also covers failed-open cleanup without name-based downcasts in StressTest.

Tests:

- make db_stress -j192

- make check-sources

- git diff --check
@github-actions

github-actions Bot commented May 21, 2026

Copy link
Copy Markdown

Codex Code Review - OBSOLETE

Superseded by a newer AI review. Expand to see the original review.

🟡 Codex Code Review

Auto-triggered after CI passed — reviewing commit d81eb6c


Codex review failed before producing findings.

WARNING: proceeding, even though we could not update PATH: Refusing to create helper binaries under temporary dir "/tmp" (codex_home: AbsolutePathBuf("/tmp/codex-home"))
error: the argument '--base <BRANCH>' cannot be used with '[PROMPT]'

Usage: codex exec review --commit <SHA> --base <BRANCH> --title <TITLE> --model <MODEL> --config <key=value> --dangerously-bypass-approvals-and-sandbox --output-last-message <FILE> [PROMPT]

For more information, try '--help'.

ℹ️ About this response

Generated by Codex CLI.
Review methodology: claude_md/code_review.md

Limitations:

  • Codex may miss context from files not in the diff
  • Large PRs may be truncated
  • Always apply human judgment to AI suggestions

Commands:

  • /codex-review [context] — Request a code review
  • /codex-query <question> — Ask about the PR or codebase
@github-actions

github-actions Bot commented May 21, 2026

Copy link
Copy Markdown

Claude Code Review - OBSOLETE

Superseded by a newer AI review. Expand to see the original review.

✅ Claude Code Review

Auto-triggered after CI passed — reviewing commit d81eb6c


Summary

This PR adds a well-designed EventListener::OnDBShutdownBegin(DB*) callback that formalizes an existing manual shutdown notification pattern in db_stress. The implementation correctly follows RocksDB's established Notify* patterns, properly handles concurrency with shutdown_notification_sent_ under mutex, and fires before shutting_down_ is published. The db_stress refactoring (open fault injection helpers, per-retry listener initialization) is clean and correct. The MultiOpsTxnsStressListener TSAN fix properly uses acquire-release atomics to prevent verification callbacks from racing with DB destruction.

High-severity findings (0):

No high-severity findings.

Full review (click to expand)

Findings

🔴 HIGH

No high-severity findings.

🟡 MEDIUM

M1. Consider adding ShutdownInfo struct for API extensibility -- include/rocksdb/listener.h:975
  • Issue: Every other EventListener method that takes a DB* parameter also takes a second info struct (FlushJobInfo, CompactionJobInfo, BackgroundJobPressure, etc.). OnDBShutdownBegin(DB*) is the only exception. Adding a ShutdownInfo struct now (even if empty) would allow future extensions (shutdown reason, timing metadata) without breaking the API.
  • Root cause: The current use case (db_stress) doesn't need additional info, but the public API serves all users.
  • Suggested fix:
    struct ShutdownBeginInfo {
      // Reserved for future use (shutdown reason, etc.)
    };
    virtual void OnDBShutdownBegin(DB* /*db*/, const ShutdownBeginInfo& /*info*/) {}
    Alternatively, if the team prefers simplicity now, document why the struct was omitted and that a future version may add one.
M2. Missing OnDBShutdownCompleted counterpart -- include/rocksdb/listener.h:975
  • Issue: Other lifecycle callbacks come in Begin/Completed pairs (OnFlushBegin/OnFlushCompleted, OnCompactionBegin/OnCompactionCompleted). OnDBShutdownBegin has no OnDBShutdownCompleted. While there may be good reasons (shutdown completion is observable via DB destruction, and firing callbacks from destructors is risky), this asymmetry should be documented.
  • Suggested fix: Add a comment to OnDBShutdownBegin explaining why there is no Completed counterpart, e.g., "No OnDBShutdownCompleted is provided because shutdown completion coincides with DB destruction."

🟢 LOW / NIT

L1. Documentation could mention thread safety more explicitly -- include/rocksdb/listener.h:967-974
  • Issue: The doc comment correctly warns about concurrent background work and blocking APIs, but doesn't explicitly state that the callback runs without db_mutex_ held (consistent with all other listener callbacks). Users implementing this callback might benefit from this clarification.
L2. Additional test coverage opportunities -- db/listener_test.cc
  • Issue: The two tests cover the key scenarios (normal shutdown, failed open). Additional tests for concurrent CancelAllBackgroundWork calls, multiple listeners, and interaction with ongoing compaction callbacks would strengthen coverage, but are not blocking given that db_stress provides integration coverage.
L3. DB* pointer is always DBImpl*, not StackableDB* -- include/rocksdb/listener.h:975
  • Issue: The DB* passed to OnDBShutdownBegin is always the internal DBImpl (since NotifyOnDBShutdownBegin passes this), not any StackableDB wrapper. This is consistent with other Notify* methods but could be documented. In the failed-open case, the doc already notes the pointer refers to the internal instance.

Cross-Component Analysis

Context Does notification fire? Correct? Notes
Normal DB close Yes (via CloseHelper -> CancelAllBackgroundWork) Yes Standard path
WritePreparedTxnDB Yes (first CancelAllBackgroundWork call wins) Yes Conditional check in destructor correctly avoids double-firing
TTL DB Yes (explicit CancelAllBackgroundWork before Close) Yes Standard path
BlobDB Yes (via underlying DBImpl destruction) Yes BlobDB destructor relies on DBImpl cleanup
DB::Open failure Yes (via ~DBImpl -> CloseHelper) Yes Key use case; listener gets internal DBImpl pointer
ReadOnly DB Yes (inherits DBImpl shutdown path) Yes No background work, but notification still fires
Concurrent CancelAllBackgroundWork Fires exactly once Yes shutdown_notification_sent_ under mutex prevents double-firing; already_shutting_down provides fast path

Assumption stress-test results:

  • "Fires exactly once": Verified. shutdown_notification_sent_ under mutex_ + already_shutting_down guard are correct. Direct shutting_down_.store() calls only exist in test code, not production paths.
  • "Fires before shutting_down_ published": Verified. Sequential execution in CancelAllBackgroundWork guarantees ordering.
  • "Mutex unlock window is safe": Verified. Same pattern used by all other Notify* methods. A concurrent CancelAllBackgroundWork during the unlock window will see shutdown_notification_sent_=true and skip.
  • "TSAN fix is correct": Verified. Atomic<bool> uses acquire-release semantics. The listener's shutting_down_ flag is set during OnDBShutdownBegin, which fires before CancelAllBackgroundWork returns, which happens before DB wrapper destruction. Background compaction callbacks that are already in-flight will see the flag.

Positive Observations

  • The design correctly identifies CancelAllBackgroundWork() as the universal funnel for all shutdown paths, making it the ideal notification point.
  • Using a separate shutdown_notification_sent_ bool rather than overloading shutting_down_ is the right call -- it cleanly separates "listeners have been notified" from "background work should stop."
  • The failed-open coverage is particularly valuable -- it closes a real gap where db_stress listeners could observe inconsistent state from partially-completed open attempts.
  • The OpenFaultInjectionConfig struct and helper functions improve readability of the retry loop.
  • Moving InitializeListenersForOpen to the top of the retry loop (before fault injection setup) ensures listeners are always fresh per attempt and are never constructed under fault injection, which is correct.

ℹ️ About this response

Generated by Claude Code.
Review methodology: claude_md/code_review.md

Limitations:

  • Claude may miss context from files not in the diff
  • Large PRs may be truncated
  • Always apply human judgment to AI suggestions

Commands:

  • /claude-review [context] — Request a code review
  • /claude-query <question> — Ask about the PR or codebase
@meta-codesync

meta-codesync Bot commented May 21, 2026

Copy link
Copy Markdown

@xingbowang has imported this pull request. If you are a Meta employee, you can view this in D106011452.

Resolve the db_stress open-fault conflict by keeping upstream's per-StressTest FaultInjectionTestFS/env cleanup while preserving fresh listener initialization for each non-transactional open retry.
@github-actions

Copy link
Copy Markdown

🟡 Codex Code Review

Auto-triggered after CI passed — reviewing commit 8cec536


Codex review failed before producing findings.

WARNING: proceeding, even though we could not update PATH: Refusing to create helper binaries under temporary dir "/tmp" (codex_home: AbsolutePathBuf("/tmp/codex-home"))
error: the argument '--base <BRANCH>' cannot be used with '[PROMPT]'

Usage: codex exec review --commit <SHA> --base <BRANCH> --title <TITLE> --model <MODEL> --config <key=value> --dangerously-bypass-approvals-and-sandbox --output-last-message <FILE> [PROMPT]

For more information, try '--help'.

ℹ️ About this response

Generated by Codex CLI.
Review methodology: claude_md/code_review.md

Limitations:

  • Codex may miss context from files not in the diff
  • Large PRs may be truncated
  • Always apply human judgment to AI suggestions

Commands:

  • /codex-review [context] — Request a code review
  • /codex-query <question> — Ask about the PR or codebase
@meta-codesync

meta-codesync Bot commented May 22, 2026

Copy link
Copy Markdown

@xingbowang has imported this pull request. If you are a Meta employee, you can view this in D106011452.

@github-actions

Copy link
Copy Markdown

✅ Claude Code Review

Auto-triggered after CI passed — reviewing commit 8cec536


Summary

Well-designed PR that adds EventListener::OnDBShutdownBegin to fix a legitimate race condition in db_stress listener bookkeeping during shutdown. The core implementation follows established RocksDB notification patterns correctly. The open-fault-injection refactoring is clean and the stress test integration is thorough.

High-severity findings (0):

No high-severity findings.

Full review (click to expand)

Findings

🔴 HIGH

None.

🟡 MEDIUM

M1. Listener notification timing is later than the manual approach it replaces -- db_stress_tool/db_stress_test_base.cc
  • Issue: The old code called NotifyListenerShuttingDown() in StressTest::CleanUp() before db_->Close(). The new approach fires OnDBShutdownBegin from within CancelAllBackgroundWork(), which is called by CloseHelper() inside Close(). This creates a small window between when CleanUp() starts and when the listener is actually notified.
  • Root cause: The notification point moved from "caller decides when to notify" to "DBImpl notifies at the start of its shutdown sequence." The DBImpl approach is architecturally cleaner but slightly later.
  • Assessment: In practice this is unlikely to cause issues because: (a) the notification still fires before shutting_down_ is set, which is the point at which compaction callbacks start getting skipped; (b) during the window between CleanUp() start and CancelAllBackgroundWork(), the DB is still operating normally, so any compaction callbacks that fire would be complete (Begin+PreCommit+Completed). The race this PR fixes (failed-open cleanup) is far more impactful than this theoretical window.
  • Suggested fix: No code change needed. If desired, add a comment in CleanUp() noting that listener shutdown notification is now handled automatically by DBImpl.
M2. Concurrent CancelAllBackgroundWork can set shutting_down_ while listeners are still executing -- db/db_impl/db_impl.cc
  • Issue: Thread A calls NotifyOnDBShutdownBegin(), sets shutdown_notification_sent_ = true, unlocks mutex, starts iterating listeners. Thread B acquires mutex, sees shutdown_notification_sent_ = true (skips notification), proceeds to shutting_down_.store(true). Now shutting_down_ is true while Thread A's listener callbacks are still running.
  • Root cause: NotifyOnDBShutdownBegin releases mutex during listener iteration (standard pattern), creating a window for concurrent callers to advance the shutdown sequence.
  • Assessment: This is acceptable by design. The callbacks fired once (correct), they started before shutting_down_ was published (correct), and the listener's own shutting_down_ flag was set during the callback. Concurrent CancelAllBackgroundWork calls are rare (requires multiple threads racing to shut down). The guard flag shutdown_notification_sent_ is set before the mutex unlock, ensuring exactly-once semantics.
  • Suggested fix: Add a brief comment in NotifyOnDBShutdownBegin noting this acceptable race.
M3. MaybeEnableOpenFaultInjection returns true when CURRENT doesn't exist but faults are configured -- db_stress_tool/db_stress_test_base.cc:99
  • Issue: When open_fault_injection.Enabled() is true but CURRENT file doesn't exist, the function returns true without enabling any fault injection. The caller then runs the post-open cleanup/retry path. The original code would not enter the fault-injection block at all in this case.
  • Root cause: The refactoring intentionally returns true so the caller runs the "safe open with retry" path even for new databases, which is more conservative.
  • Assessment: This is a deliberate behavioral change documented in the function's comment: "Returns whether open-fault handling is active for this attempt. The caller uses this to run post-open cleanup/retry handling even if CURRENT did not exist." The post-open path with DisableAllThreadLocalErrorInjection() is safe even with nothing enabled. The s.ok() check means no unnecessary retry occurs for successful opens.
  • Suggested fix: None needed -- the comment explains the intent. The change is safe because the cleanup path is idempotent.
M4. API documentation could be more explicit about thread context and DB pointer lifetime -- include/rocksdb/listener.h
  • Issue: The OnDBShutdownBegin documentation doesn't specify: (a) which thread calls it, (b) whether DB mutex is held, (c) that the DB pointer must not be stored for later use (especially in failed-open case).
  • Suggested fix: Consider adding: "Called on the thread that initiates shutdown (not a background thread). The DB mutex is NOT held during this callback. The DB pointer is valid only for the duration of this callback; implementations must not store it for later use."

🟢 LOW / NIT

L1. Transaction path calls InitializeListenersForOpen once outside retry loop -- db_stress_tool/db_stress_test_base.cc
  • Issue: The transaction path doesn't have a retry loop, so InitializeListenersForOpen is called once. If the transaction DB open fails (which currently asserts), listeners would have stale state.
  • Assessment: The transaction path uses assert(s.ok()) on open failure, so this is dead code for now. The OnDBShutdownBegin callback from DBImpl destruction would still fire on the listeners, cleaning up any stale state. No action needed.
L2. No test for concurrent CancelAllBackgroundWork calls -- db/listener_test.cc
  • Issue: The two new tests cover normal shutdown and failed-open scenarios but don't test concurrent shutdown requests.
  • Assessment: The guard mechanism (shutdown_notification_sent_) is straightforward and protected by mutex. A concurrent test would be nice-to-have but isn't blocking. The stress test itself exercises concurrent shutdown paths.
L3. MultiOpsTxnsStressListener skips flush callbacks during shutdown -- db_stress_tool/multi_ops_txns_stress.h
  • Issue: After OnDBShutdownBegin sets shutting_down_, both OnFlushCompleted and OnCompactionCompleted return early, skipping VerifyPkSkFast() verification. This is correct for shutdown but means no verification runs for any flush/compaction that completes after shutdown begins.
  • Assessment: Correct behavior. During shutdown, the DB is in a transitional state and verification could race with teardown. The TSAN race described in the PR description confirms this is the right approach.

Cross-Component Analysis

Context Executes? Assumptions hold? Action needed?
WritePreparedTxnDB YES - destructor calls CancelAllBackgroundWork YES Safe
WriteUnpreparedTxnDB YES - same path YES Safe
ReadOnly DB YES - CloseHelper calls CancelAllBackgroundWork YES - no BG work to skip Safe
Secondary Instance YES - same shutdown path YES Safe
StackableDB (BlobDB) YES - via GetRootDB() YES - listeners get inner DBImpl ptr Safe
CompactionService YES - CancelAwaitingJobs runs before notification YES - ordering is correct (cancel remote first, then notify listeners) Safe
Concurrent writers N/A - shutdown path N/A Safe
Failed DB::Open YES - destructor fires notification YES - key use case Safe

Positive Observations

  • Exactly-once guarantee: shutdown_notification_sent_ is set before mutex unlock, providing a clean guard against double-firing even under concurrent shutdown.
  • Failed-open coverage: The automatic callback from CancelAllBackgroundWork elegantly handles the failed-open case without requiring manual coordination between DB::Open and the caller.
  • Clean refactoring: The OpenFaultInjectionConfig struct and helper functions simplify the retry loop significantly, reducing code duplication and improving readability.
  • Listener lifecycle fix: Moving InitializeListenersForOpen inside the retry loop ensures fresh listeners on each attempt, preventing stale state accumulation across failed opens.
  • Backward compatible: New virtual method with empty default, no changes to existing listener contracts.

ℹ️ About this response

Generated by Claude Code.
Review methodology: claude_md/code_review.md

Limitations:

  • Claude may miss context from files not in the diff
  • Large PRs may be truncated
  • Always apply human judgment to AI suggestions

Commands:

  • /claude-review [context] — Request a code review
  • /claude-query <question> — Ask about the PR or codebase
@meta-codesync

meta-codesync Bot commented May 22, 2026

Copy link
Copy Markdown

@xingbowang merged this pull request in 8bf167a.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

2 participants