Notify listeners before DB shutdown begins#14769
Conversation
db_stress listener bookkeeping correlates compaction callbacks with file deletion callbacks. DBImpl intentionally skips some compaction listener callbacks once shutdown starts, but file deletion callbacks can still be delivered. That mismatch is normally tolerable for production listeners, but db_stress uses listener-local tracking to detect callback consistency and can report a false positive when shutdown interrupts a compaction callback sequence. The failed-open case is the important gap. DB::Open() can create a DBImpl, recover or flush files, schedule background compaction, and then fail during late open work such as persisting OPTIONS or waiting for open-time compaction under fault injection. At that point DB::Open() tears down the internal DBImpl before returning an error to StressTest::Open(). If OnCompactionBegin already recorded an input file or job in DbStressListener, DBImpl shutdown can suppress the later OnCompactionPreCommit/OnCompactionCompleted callbacks that would normally clear that state. Since control has not returned to the stress harness yet, StressTest::CleanUp()/Reopen() cannot notify the listener in time. A later shutdown-time callback, or the next open retry reusing the same listener, can then observe stale tracking and abort even though RocksDB did not compact the same SST concurrently. Add EventListener::OnDBShutdownBegin and fire it once from DBImpl::CancelAllBackgroundWork() before publishing shutting_down_. The callback also covers cleanup of a failed DB::Open() attempt, where the DB pointer refers to the internal DBImpl that was never returned to the caller. Track shutdown_notification_sent_ separately from shutting_down_ because listeners are invoked with mutex_ released, and a concurrent or reentrant cancellation must not deliver the callback twice. Update DbStressListener to consume the DBImpl-driven shutdown notification instead of relying on StressTest::CleanUp()/Reopen() to manually notify it. This lets db_stress mark itself as shutting down before DBImpl starts skipping shutdown-sensitive compaction notifications, including during failed-open cleanup. Also keep listener state scoped to one db_stress open attempt. Initialize listeners at the top of each non-transactional open retry before enabling open fault injection, so retry attempts get fresh listener state without widening open fault injection to listener construction. Factor the open fault setup into a small helper to keep the retry loop readable. The transaction open path still initializes listeners once because it does not use this open-fault retry loop. Tests: - make db_stress -j192 - make check-sources - make listener_test -j192 - ./listener_test --gtest_filter='EventListenerTest.OnDBShutdownBegin*' - git diff --check
✅ clang-tidy: No findings on changed linesCompleted in 271.8s. |
Codex Code Review - OBSOLETESuperseded by a newer AI review. Expand to see the original review.🟡 Codex Code ReviewAuto-triggered after CI passed — reviewing commit 3b97bf4 ❌ Codex review failed before producing findings. ℹ️ About this responseGenerated by Codex CLI. Limitations:
Commands:
|
Claude Code Review - OBSOLETESuperseded by a newer AI review. Expand to see the original review.✅ Claude Code ReviewAuto-triggered after CI passed — reviewing commit 3b97bf4 SummaryWell-designed PR that adds High-severity findings (0): No high-severity findings. Full review (click to expand)Findings🔴 HIGHNone. 🟡 MEDIUMM1. Concurrent
|
| Context | OnDBShutdownBegin fires? | Notes |
|---|---|---|
| Normal Close/destructor | Yes | Via CloseHelper → CancelAllBackgroundWork(false) |
| WritePreparedTxnDB | Yes, once | ~WritePreparedTxnDB calls first; base sees already_shutting_down |
| DBWithTTL | Yes | Via ~DBTtlImpl → CancelAllBackgroundWork(true) |
| ReadOnly/Secondary DB | Yes | Via base ~DBImpl |
| WaitForCompact(close_db) | Yes | Via Close() → normal path |
| Failed DB::Open() | Yes | impl unique_ptr destruction at db_impl_open.cc:2956 |
Exception safety: Not a concern — RocksDB builds with -fno-rtti in release; all existing Notify* methods use the same unlock-call-lock pattern without try/catch.
Positive Observations
- Eliminates the manual
NotifyShuttingDown()hack in db_stress with a proper lifecycle callback shutdown_notification_sent_correctly handles all concurrent/reentrant scenarios- Moving
InitializeListenersForOpeninside the retry loop is the right fix for listener lifecycle management - Tests cover the two primary paths (normal shutdown, failed open)
ℹ️ About this response
Generated by Claude Code.
Review methodology: claude_md/code_review.md
Limitations:
- Claude may miss context from files not in the diff
- Large PRs may be truncated
- Always apply human judgment to AI suggestions
Commands:
/claude-review [context]— Request a code review/claude-query <question>— Ask about the PR or codebase
|
@xingbowang has imported this pull request. If you are a Meta employee, you can view this in D106011452. |
1 similar comment
|
@xingbowang has imported this pull request. If you are a Meta employee, you can view this in D106011452. |
T272131642 reported a TSAN race where MultiOpsTxnsStressListener::OnCompactionCompleted called VerifyPkSkFast on a background compaction thread while the main thread was destroying the transaction DB wrapper in StressTest::CleanUp(). The reported read was a virtual call through the DB object and the write was the WritePreparedTxnDB/WriteUnpreparedTxnDB destructor updating the vptr. The vulnerable ordering is that DBImpl can already be inside NotifyOnCompactionCompleted before shutdown is requested. It unlocks db mutex while iterating listeners; another listener such as DbStressListener can spend time in its callback, giving the main thread time to enter CleanUp()/Close(). The DB object is still shutting down, but MultiOpsTxnsStressListener may be invoked later in the same callback iteration and call VerifyPkSkFast through stress_test_->db_aptr_, racing with DB wrapper destruction. D105719244 proposed adding a harness-side NotifyShuttingDown path for MultiOpsTxnsStressListener. With the previous commit's DBImpl-owned EventListener::OnDBShutdownBegin callback, the clearer fix is to have MultiOpsTxnsStressListener consume that callback directly. Once DBImpl begins shutdown, the listener skips both flush-completed and compaction-completed verification callbacks, avoiding DB access during teardown. This also covers failed-open cleanup without name-based downcasts in StressTest. Tests: - make db_stress -j192 - make check-sources - git diff --check
Codex Code Review - OBSOLETESuperseded by a newer AI review. Expand to see the original review.🟡 Codex Code ReviewAuto-triggered after CI passed — reviewing commit d81eb6c ❌ Codex review failed before producing findings. ℹ️ About this responseGenerated by Codex CLI. Limitations:
Commands:
|
Claude Code Review - OBSOLETESuperseded by a newer AI review. Expand to see the original review.✅ Claude Code ReviewAuto-triggered after CI passed — reviewing commit d81eb6c SummaryThis PR adds a well-designed High-severity findings (0): No high-severity findings. Full review (click to expand)Findings🔴 HIGHNo high-severity findings. 🟡 MEDIUMM1. Consider adding
|
| Context | Does notification fire? | Correct? | Notes |
|---|---|---|---|
| Normal DB close | Yes (via CloseHelper -> CancelAllBackgroundWork) | Yes | Standard path |
| WritePreparedTxnDB | Yes (first CancelAllBackgroundWork call wins) | Yes | Conditional check in destructor correctly avoids double-firing |
| TTL DB | Yes (explicit CancelAllBackgroundWork before Close) | Yes | Standard path |
| BlobDB | Yes (via underlying DBImpl destruction) | Yes | BlobDB destructor relies on DBImpl cleanup |
| DB::Open failure | Yes (via ~DBImpl -> CloseHelper) | Yes | Key use case; listener gets internal DBImpl pointer |
| ReadOnly DB | Yes (inherits DBImpl shutdown path) | Yes | No background work, but notification still fires |
| Concurrent CancelAllBackgroundWork | Fires exactly once | Yes | shutdown_notification_sent_ under mutex prevents double-firing; already_shutting_down provides fast path |
Assumption stress-test results:
- "Fires exactly once": Verified.
shutdown_notification_sent_undermutex_+already_shutting_downguard are correct. Directshutting_down_.store()calls only exist in test code, not production paths. - "Fires before shutting_down_ published": Verified. Sequential execution in CancelAllBackgroundWork guarantees ordering.
- "Mutex unlock window is safe": Verified. Same pattern used by all other Notify* methods. A concurrent CancelAllBackgroundWork during the unlock window will see
shutdown_notification_sent_=trueand skip. - "TSAN fix is correct": Verified.
Atomic<bool>uses acquire-release semantics. The listener'sshutting_down_flag is set duringOnDBShutdownBegin, which fires beforeCancelAllBackgroundWorkreturns, which happens before DB wrapper destruction. Background compaction callbacks that are already in-flight will see the flag.
Positive Observations
- The design correctly identifies
CancelAllBackgroundWork()as the universal funnel for all shutdown paths, making it the ideal notification point. - Using a separate
shutdown_notification_sent_bool rather than overloadingshutting_down_is the right call -- it cleanly separates "listeners have been notified" from "background work should stop." - The failed-open coverage is particularly valuable -- it closes a real gap where db_stress listeners could observe inconsistent state from partially-completed open attempts.
- The
OpenFaultInjectionConfigstruct and helper functions improve readability of the retry loop. - Moving
InitializeListenersForOpento the top of the retry loop (before fault injection setup) ensures listeners are always fresh per attempt and are never constructed under fault injection, which is correct.
ℹ️ About this response
Generated by Claude Code.
Review methodology: claude_md/code_review.md
Limitations:
- Claude may miss context from files not in the diff
- Large PRs may be truncated
- Always apply human judgment to AI suggestions
Commands:
/claude-review [context]— Request a code review/claude-query <question>— Ask about the PR or codebase
|
@xingbowang has imported this pull request. If you are a Meta employee, you can view this in D106011452. |
Resolve the db_stress open-fault conflict by keeping upstream's per-StressTest FaultInjectionTestFS/env cleanup while preserving fresh listener initialization for each non-transactional open retry.
🟡 Codex Code ReviewAuto-triggered after CI passed — reviewing commit 8cec536 ❌ Codex review failed before producing findings. ℹ️ About this responseGenerated by Codex CLI. Limitations:
Commands:
|
|
@xingbowang has imported this pull request. If you are a Meta employee, you can view this in D106011452. |
✅ Claude Code ReviewAuto-triggered after CI passed — reviewing commit 8cec536 SummaryWell-designed PR that adds High-severity findings (0): No high-severity findings. Full review (click to expand)Findings🔴 HIGHNone. 🟡 MEDIUMM1. Listener notification timing is later than the manual approach it replaces --
|
| Context | Executes? | Assumptions hold? | Action needed? |
|---|---|---|---|
| WritePreparedTxnDB | YES - destructor calls CancelAllBackgroundWork | YES | Safe |
| WriteUnpreparedTxnDB | YES - same path | YES | Safe |
| ReadOnly DB | YES - CloseHelper calls CancelAllBackgroundWork | YES - no BG work to skip | Safe |
| Secondary Instance | YES - same shutdown path | YES | Safe |
| StackableDB (BlobDB) | YES - via GetRootDB() | YES - listeners get inner DBImpl ptr | Safe |
| CompactionService | YES - CancelAwaitingJobs runs before notification | YES - ordering is correct (cancel remote first, then notify listeners) | Safe |
| Concurrent writers | N/A - shutdown path | N/A | Safe |
| Failed DB::Open | YES - destructor fires notification | YES - key use case | Safe |
Positive Observations
- Exactly-once guarantee:
shutdown_notification_sent_is set before mutex unlock, providing a clean guard against double-firing even under concurrent shutdown. - Failed-open coverage: The automatic callback from
CancelAllBackgroundWorkelegantly handles the failed-open case without requiring manual coordination betweenDB::Openand the caller. - Clean refactoring: The
OpenFaultInjectionConfigstruct and helper functions simplify the retry loop significantly, reducing code duplication and improving readability. - Listener lifecycle fix: Moving
InitializeListenersForOpeninside the retry loop ensures fresh listeners on each attempt, preventing stale state accumulation across failed opens. - Backward compatible: New virtual method with empty default, no changes to existing listener contracts.
ℹ️ About this response
Generated by Claude Code.
Review methodology: claude_md/code_review.md
Limitations:
- Claude may miss context from files not in the diff
- Large PRs may be truncated
- Always apply human judgment to AI suggestions
Commands:
/claude-review [context]— Request a code review/claude-query <question>— Ask about the PR or codebase
|
@xingbowang merged this pull request in 8bf167a. |
Summary
Notify listeners before DB shutdown begins
db_stress listener bookkeeping correlates compaction callbacks with file deletion callbacks. DBImpl intentionally skips some compaction listener callbacks once shutdown starts, but file deletion callbacks can still be delivered. That mismatch is normally tolerable for production listeners, but db_stress uses listener-local tracking to detect callback consistency and can report a false positive when shutdown interrupts a compaction callback sequence.
The failed-open case is the important gap. DB::Open() can create a DBImpl, recover or flush files, schedule background compaction, and then fail during late open work such as persisting OPTIONS or waiting for open-time compaction under fault injection. At that point DB::Open() tears down the internal DBImpl before returning an error to StressTest::Open(). If OnCompactionBegin already recorded an input file or job in DbStressListener, DBImpl shutdown can suppress the later OnCompactionPreCommit/OnCompactionCompleted callbacks that would normally clear that state. Since control has not returned to the stress harness yet, StressTest::CleanUp()/Reopen() cannot notify the listener in time. A later shutdown-time callback, or the next open retry reusing the same listener, can then observe stale tracking and abort even though RocksDB did not compact the same SST concurrently.
Add EventListener::OnDBShutdownBegin and fire it once from DBImpl::CancelAllBackgroundWork() before publishing shutting_down_. The callback also covers cleanup of a failed DB::Open() attempt, where the DB pointer refers to the internal DBImpl that was never returned to the caller. Track shutdown_notification_sent_ separately from shutting_down_ because listeners are invoked with mutex_ released, and a concurrent or reentrant cancellation must not deliver the callback twice.
Update DbStressListener to consume the DBImpl-driven shutdown notification instead of relying on StressTest::CleanUp()/Reopen() to manually notify it. This lets db_stress mark itself as shutting down before DBImpl starts skipping shutdown-sensitive compaction notifications, including during failed-open cleanup.
Also keep listener state scoped to one db_stress open attempt. Initialize listeners at the top of each non-transactional open retry before enabling open fault injection, so retry attempts get fresh listener state without widening open fault injection to listener construction. Factor the open fault setup into a small helper to keep the retry loop readable. The transaction open path still initializes listeners once because it does not use this open-fault retry loop.
Also fix multi-ops transaction listener checks during shutdown
A TSAN race was reported where MultiOpsTxnsStressListener::OnCompactionCompleted called VerifyPkSkFast on a background compaction thread while the main thread was destroying the transaction DB wrapper in StressTest::CleanUp(). The reported read was a virtual call through the DB object and the write was the WritePreparedTxnDB/WriteUnpreparedTxnDB destructor updating the vptr.
The vulnerable ordering is that DBImpl can already be inside NotifyOnCompactionCompleted before shutdown is requested. It unlocks db mutex while iterating listeners; another listener such as DbStressListener can spend time in its callback, giving the main thread time to enter CleanUp()/Close(). The DB object is still shutting down, but MultiOpsTxnsStressListener may be invoked later in the same callback iteration and call VerifyPkSkFast through stress_test_->db_aptr_, racing with DB wrapper destruction.
With DBImpl-owned EventListener::OnDBShutdownBegin callback, we have MultiOpsTxnsStressListener consume that callback directly. Once DBImpl begins shutdown, the listener skips both flush-completed and compaction-completed verification callbacks, avoiding DB access during teardown. This also covers failed-open cleanup without name-based downcasts in StressTest.
Test Plan
CI