Guard db_bench DB lifetime with RWMutex against shutdown race (#14754)#14754
Guard db_bench DB lifetime with RWMutex against shutdown race (#14754)#14754mszeszko-meta wants to merge 1 commit into
Conversation
|
@mszeszko-meta has exported this pull request. If you are a Meta employee, you can view the originating Diff in D104974784. |
✅ clang-tidy: No findings on changed linesCompleted in 142.8s. |
Codex Code Review - OBSOLETESuperseded by a newer AI review. Expand to see the original review.🟡 Codex Code ReviewAuto-triggered after CI passed — reviewing commit 39398af ❌ Codex review failed before producing findings. ℹ️ About this responseGenerated by Codex CLI. Limitations:
Commands:
|
Claude Code Review - OBSOLETESuperseded by a newer AI review. Expand to see the original review.✅ Claude Code ReviewAuto-triggered after CI passed — reviewing commit 39398af SummaryClean, well-scoped defensive fix that prevents SIGFPE cascades in db_bench during shutdown races. The approach ( High-severity findings (0): No high-severity findings. Full review (click to expand)Findings🟡 MEDIUMM1. Similar vulnerable code paths remain unguarded —
|
| Context | Relevant? | Assessment |
|---|---|---|
| Normal single-DB operation | Yes (hot path) | Safe — returns on first branch (db_.db != nullptr) |
| Normal multi-DB operation | Yes | Safe — multi_dbs_ is non-empty |
| Shutdown race (single-DB) | Yes (the bug) | Fixed — new guard catches multi_dbs_.empty() |
| Shutdown race (multi-DB) | Possible | Partially vulnerable — multi_dbs_ could be cleared while iterating |
| SeekRandom shutdown race | Yes | Not fixed — line 7714 bypasses SelectDBWithCfh |
Positive Observations
- Correct use of
_exit(1)instead ofexit()/db_bench_exit()to avoid recursive exit UB and atexit double-free — shows solid understanding of the problem. - The restructuring from
if/elseto two sequentialifstatements preserves identical hot-path behavior with zero overhead. - Clear, well-written comment explaining both the race condition and the
_exit()vsexit()rationale. - PR description is thorough: includes root cause analysis, crash-fault test data, and explicit scoping of what this does and doesn't fix.
ℹ️ About this response
Generated by Claude Code.
Review methodology: claude_md/code_review.md
Limitations:
- Claude may miss context from files not in the diff
- Large PRs may be truncated
- Always apply human judgment to AI suggestions
Commands:
/claude-review [context]— Request a code review/claude-query <question>— Ask about the PR or codebase
39398af to
a8fe5e0
Compare
…ok#14754) Summary: **Problem** When db_bench shuts down (fatal IO error, unknown benchmark name, etc.), ErrorExit() or ~Benchmark() destroys db_/multi_dbs_ while worker threads may still be inside benchmark methods holding raw DBWithColumnFamilies pointers returned by SelectDBWithCfh(). In single-DB mode, db_.db becomes nullptr and multi_dbs_ is empty, so workers evaluate rand_int % 0 and SIGFPE. The cascade across all workers masks whatever originally triggered the shutdown. Observed in the reliable_volumes crash-fault test at ~5 SIGFPE events per hour. **Approach** Guard DB lifetime at the worker-thread boundary with a reader-writer lock (port::RWMutex): workers acquire a read lock for the duration of their benchmark method; mutators acquire the write lock (which waits for active readers to drain) before destroying DBs. This ensures raw pointers returned by SelectDBWithCfh() remain valid for their entire usage scope. **Fix** - Workers hold a DbUseGuard (RAII wrapper over a read lock on db_lifecycle_rwlock_) for the entire benchmark method in ThreadBody. Cost: one rdlock/unlock pair per worker lifetime, zero hot-path overhead. - Every DB mutation path (ErrorExit, ~Benchmark, fresh-DB reopen) takes DbStateMutationGuard, which acquires the write lock and then stops the secondary update thread before mutation may proceed. - ErrorExit() routes to std::_Exit(1) when called from any thread that cannot safely run the mutation cleanup path: a DbUseGuard holder (would self-wait on the write lock) or the secondary update thread (would self-join via StopSecondaryUpdateThread). Tracked via two thread-locals: holds_db_use_guard_ and is_secondary_update_thread_. Main thread takes DbStateMutationGuard, cleans up, dispatches through db_bench_exit() / ToolHooks::Exit. - StopSecondaryUpdateThread() resets secondary_update_stopped_ to 0 after joining, so a replacement thread created by a subsequent Open() is not immediately killed. - Protocol is mechanically enforced in debug builds: * SelectDBWithCfh asserts holds_db_use_guard_ (read-side ownership) * DeleteDBs asserts holds_db_state_mutation_guard_ (write-side ownership) * DbUseGuard and DbStateMutationGuard ctor/dtor assert correct imbalance (no nested or stray release). * DbStateMutationGuard ctor additionally asserts !holds_db_use_guard_ and !is_secondary_update_thread_, catching the new deadlock modes the single-lock design makes reachable (WriteLock while holding ReadLock; secondary thread self-joining via StopSecondaryUpdateThread). These convert "trust me" invariants into "the assert fires if you break it." Direct field accesses (e.g. db_.db->NewIterator inside a benchmark method, multi_dbs_.clear() in the reopen branch) are protected by the enclosing guard scope but are not per-site asserted. **Caveat** port::RWMutex is pthread_rwlock_t with default attrs -> reader-preferred on Linux/glibc. The current call graph has no concurrent worker spawn during mutator wait (mutator paths run only from the main thread, not concurrently with RunBenchmark spawning workers), so writer starvation is not reachable. Documented as a constraint; revisit if that invariant changes. The port layer doesn't expose pthread_rwlockattr_setkind_np cross-platform. **Alternatives considered** - std::_Exit(1) on every shutdown path, skip cleanup entirely. Loses ToolHooks::Exit dispatch, flushed traces, and end-of-run stats -- those matter for the RV crash-fault test image which consumes db_bench output. Rejected. - shared_ptr<DBWithColumnFamilies> from SelectDBWithCfh, let DB lifetime extend naturally to the last reader. Adds a per-call atomic refcount bump in the hot path; the RWMutex approach is per-method-call instead of per-op, making the hot-path cost zero. Rejected for hot-path neutrality. - Hand-rolled mutex+cv+counter+flag protocol. Used in v1-v2 of this diff. Replaced with port::RWMutex per code review (xbw): same semantics, ~50 fewer lines, stdlib primitive, two extra ctor asserts. Chosen. Reviewed By: xingbowang Differential Revision: D104974784
a8fe5e0 to
c0154ba
Compare
🟡 Codex Code ReviewAuto-triggered after CI passed — reviewing commit c0154ba ❌ Codex review failed before producing findings. ℹ️ About this responseGenerated by Codex CLI. Limitations:
Commands:
|
|
This pull request has been merged in d1ab4bf. |
✅ Claude Code ReviewAuto-triggered after CI passed — reviewing commit c0154ba SummaryWell-designed fix for a real shutdown race in db_bench. The RWMutex + RAII guard approach is sound, correctly prevents the SIGFPE crash, and has zero hot-path overhead. The code is thoroughly documented with clear invariants and debug assertions. High-severity findings (0): No high-severity findings. Full review (click to expand)Findings🔴 HIGHNone. 🟡 MEDIUMM1.
|
| Context | Affected? | Analysis |
|---|---|---|
| Single DB mode | YES | Protected by DbUseGuard on worker threads |
Multi-DB mode (--num_multi_db) |
YES | Same lock protects both db_ and multi_dbs_ |
Secondary DB mode (--use_secondary_db) |
YES | Secondary thread flagged; stopped before mutation |
Read-only mode (--readonly) |
YES | Workers still need lifetime protection for reads |
| Non-DB benchmarks (Crc32c, xxHash) | YES | Guard overhead negligible (one rdlock/unlock per lifetime) |
| ToolHooks integration | PARTIAL | _Exit(1) bypasses hooks for worker/secondary thread errors |
Positive Observations
- Zero hot-path overhead: Read lock acquired once per worker lifetime, not per-operation. Debug asserts compiled out in release.
- Thorough documentation: Protocol comment block clearly explains the two-guard system, thread-local semantics, and ErrorExit routing.
- Defense in depth: Debug assertions at
SelectDBWithCfh,DeleteDBs, and guard ctors/dtors catch protocol violations. - Correct deadlock avoidance:
ErrorExit()checksholds_db_use_guard_before any lock acquisition.DbStateMutationGuardctor asserts!holds_db_use_guard_and!is_secondary_update_thread_. - RAII guard design: Non-copyable, non-movable guards prevent accidental misuse.
ℹ️ About this response
Generated by Claude Code.
Review methodology: claude_md/code_review.md
Limitations:
- Claude may miss context from files not in the diff
- Large PRs may be truncated
- Always apply human judgment to AI suggestions
Commands:
/claude-review [context]— Request a code review/claude-query <question>— Ask about the PR or codebase
Problem
When db_bench shuts down (fatal IO error, unknown benchmark name, etc.), ErrorExit() or ~Benchmark() destroys db_/multi_dbs_ while worker threads may still be inside benchmark methods holding raw DBWithColumnFamilies pointers returned by SelectDBWithCfh(). In single-DB mode, db_.db becomes nullptr and multi_dbs_ is empty, so workers evaluate rand_int % 0 and SIGFPE. The cascade across all workers masks whatever originally triggered the shutdown. Observed in the reliable_volumes crash-fault test at ~5 SIGFPE events per hour.
Approach
Guard DB lifetime at the worker-thread boundary with a reader-writer lock (port::RWMutex): workers acquire a read lock for the duration of their benchmark method; mutators acquire the write lock (which waits for active readers to drain) before destroying DBs. This ensures raw pointers returned by SelectDBWithCfh() remain valid for their entire usage scope.
Fix
Workers hold a DbUseGuard (RAII wrapper over a read lock on db_lifecycle_rwlock_) for the entire benchmark method in ThreadBody. Cost: one rdlock/unlock pair per worker lifetime, zero hot-path overhead.
Every DB mutation path (ErrorExit, ~Benchmark, fresh-DB reopen) takes DbStateMutationGuard, which acquires the write lock and then stops the secondary update thread before mutation may proceed.
ErrorExit() routes to std::Exit(1) when called from any thread that cannot safely run the mutation cleanup path: a DbUseGuard holder (would self-wait on the write lock) or the secondary update thread (would self-join via StopSecondaryUpdateThread). Tracked via two thread-locals: holds_db_use_guard and is_secondary_update_thread_. Main thread takes DbStateMutationGuard, cleans up, dispatches through db_bench_exit() / ToolHooks::Exit.
StopSecondaryUpdateThread() resets secondary_update_stopped_ to 0 after joining, so a replacement thread created by a subsequent Open() is not immediately killed.
Protocol is mechanically enforced in debug builds:
These convert "trust me" invariants into "the assert fires if you break it." Direct field accesses (e.g. db_.db->NewIterator inside a benchmark method, multi_dbs_.clear() in the reopen branch) are protected by the enclosing guard scope but are not per-site asserted.
Caveat
port::RWMutex is pthread_rwlock_t with default attrs -> reader-preferred on Linux/glibc. The current call graph has no concurrent worker spawn during mutator wait (mutator paths run only from the main thread, not concurrently with RunBenchmark spawning workers), so writer starvation is not reachable. Documented as a constraint; revisit if that invariant changes. The port layer doesn't expose pthread_rwlockattr_setkind_np cross-platform.
Alternatives considered
std::_Exit(1) on every shutdown path, skip cleanup entirely. Loses ToolHooks::Exit dispatch, flushed traces, and end-of-run stats -- those matter for the RV crash-fault test image which consumes db_bench output. Rejected.
shared_ptr from SelectDBWithCfh, let DB lifetime extend naturally to the last reader. Adds a per-call atomic refcount bump in the hot path; the RWMutex approach is per-method-call instead of per-op, making the hot-path cost zero. Rejected for hot-path neutrality.
Reviewed By: xingbowang
Differential Revision: D104974784