Skip to content

Guard db_bench DB lifetime with RWMutex against shutdown race (#14754)#14754

Closed
mszeszko-meta wants to merge 1 commit into
facebook:mainfrom
mszeszko-meta:export-D104974784
Closed

Guard db_bench DB lifetime with RWMutex against shutdown race (#14754)#14754
mszeszko-meta wants to merge 1 commit into
facebook:mainfrom
mszeszko-meta:export-D104974784

Conversation

@mszeszko-meta

@mszeszko-meta mszeszko-meta commented May 18, 2026

Copy link
Copy Markdown
Contributor

Problem

When db_bench shuts down (fatal IO error, unknown benchmark name, etc.), ErrorExit() or ~Benchmark() destroys db_/multi_dbs_ while worker threads may still be inside benchmark methods holding raw DBWithColumnFamilies pointers returned by SelectDBWithCfh(). In single-DB mode, db_.db becomes nullptr and multi_dbs_ is empty, so workers evaluate rand_int % 0 and SIGFPE. The cascade across all workers masks whatever originally triggered the shutdown. Observed in the reliable_volumes crash-fault test at ~5 SIGFPE events per hour.

Approach

Guard DB lifetime at the worker-thread boundary with a reader-writer lock (port::RWMutex): workers acquire a read lock for the duration of their benchmark method; mutators acquire the write lock (which waits for active readers to drain) before destroying DBs. This ensures raw pointers returned by SelectDBWithCfh() remain valid for their entire usage scope.

Fix

  • Workers hold a DbUseGuard (RAII wrapper over a read lock on db_lifecycle_rwlock_) for the entire benchmark method in ThreadBody. Cost: one rdlock/unlock pair per worker lifetime, zero hot-path overhead.

  • Every DB mutation path (ErrorExit, ~Benchmark, fresh-DB reopen) takes DbStateMutationGuard, which acquires the write lock and then stops the secondary update thread before mutation may proceed.

  • ErrorExit() routes to std::Exit(1) when called from any thread that cannot safely run the mutation cleanup path: a DbUseGuard holder (would self-wait on the write lock) or the secondary update thread (would self-join via StopSecondaryUpdateThread). Tracked via two thread-locals: holds_db_use_guard and is_secondary_update_thread_. Main thread takes DbStateMutationGuard, cleans up, dispatches through db_bench_exit() / ToolHooks::Exit.

  • StopSecondaryUpdateThread() resets secondary_update_stopped_ to 0 after joining, so a replacement thread created by a subsequent Open() is not immediately killed.

  • Protocol is mechanically enforced in debug builds:

    • SelectDBWithCfh asserts holds_db_use_guard_ (read-side ownership)
    • DeleteDBs asserts holds_db_state_mutation_guard_ (write-side ownership)
    • DbUseGuard and DbStateMutationGuard ctor/dtor assert correct imbalance (no nested or stray release).
    • DbStateMutationGuard ctor additionally asserts !holds_db_use_guard_ and !is_secondary_update_thread_, catching the new deadlock modes the single-lock design makes reachable (WriteLock while holding ReadLock; secondary thread self-joining via StopSecondaryUpdateThread).
      These convert "trust me" invariants into "the assert fires if you break it." Direct field accesses (e.g. db_.db->NewIterator inside a benchmark method, multi_dbs_.clear() in the reopen branch) are protected by the enclosing guard scope but are not per-site asserted.

Caveat

port::RWMutex is pthread_rwlock_t with default attrs -> reader-preferred on Linux/glibc. The current call graph has no concurrent worker spawn during mutator wait (mutator paths run only from the main thread, not concurrently with RunBenchmark spawning workers), so writer starvation is not reachable. Documented as a constraint; revisit if that invariant changes. The port layer doesn't expose pthread_rwlockattr_setkind_np cross-platform.

Alternatives considered

  • std::_Exit(1) on every shutdown path, skip cleanup entirely. Loses ToolHooks::Exit dispatch, flushed traces, and end-of-run stats -- those matter for the RV crash-fault test image which consumes db_bench output. Rejected.

  • shared_ptr from SelectDBWithCfh, let DB lifetime extend naturally to the last reader. Adds a per-call atomic refcount bump in the hot path; the RWMutex approach is per-method-call instead of per-op, making the hot-path cost zero. Rejected for hot-path neutrality.

Reviewed By: xingbowang

Differential Revision: D104974784

@meta-cla meta-cla Bot added the CLA Signed label May 18, 2026
@meta-codesync

meta-codesync Bot commented May 18, 2026

Copy link
Copy Markdown

@mszeszko-meta has exported this pull request. If you are a Meta employee, you can view the originating Diff in D104974784.

@github-actions

github-actions Bot commented May 18, 2026

Copy link
Copy Markdown

✅ clang-tidy: No findings on changed lines

Completed in 142.8s.

@mszeszko-meta mszeszko-meta requested a review from xingbowang May 18, 2026 21:42
@github-actions

github-actions Bot commented May 18, 2026

Copy link
Copy Markdown

Codex Code Review - OBSOLETE

Superseded by a newer AI review. Expand to see the original review.

🟡 Codex Code Review

Auto-triggered after CI passed — reviewing commit 39398af


Codex review failed before producing findings.

WARNING: proceeding, even though we could not update PATH: Refusing to create helper binaries under temporary dir "/tmp" (codex_home: AbsolutePathBuf("/tmp/codex-home"))
error: the argument '--base <BRANCH>' cannot be used with '[PROMPT]'

Usage: codex exec review --commit <SHA> --base <BRANCH> --title <TITLE> --model <MODEL> --config <key=value> --dangerously-bypass-approvals-and-sandbox --output-last-message <FILE> [PROMPT]

For more information, try '--help'.

ℹ️ About this response

Generated by Codex CLI.
Review methodology: claude_md/code_review.md

Limitations:

  • Codex may miss context from files not in the diff
  • Large PRs may be truncated
  • Always apply human judgment to AI suggestions

Commands:

  • /codex-review [context] — Request a code review
  • /codex-query <question> — Ask about the PR or codebase
@github-actions

github-actions Bot commented May 18, 2026

Copy link
Copy Markdown

Claude Code Review - OBSOLETE

Superseded by a newer AI review. Expand to see the original review.

✅ Claude Code Review

Auto-triggered after CI passed — reviewing commit 39398af


Summary

Clean, well-scoped defensive fix that prevents SIGFPE cascades in db_bench during shutdown races. The approach (_exit(1) to avoid recursive exit() UB) is correct. Zero hot-path overhead. No high-severity issues found.

High-severity findings (0):

No high-severity findings.

Full review (click to expand)

Findings

🟡 MEDIUM

M1. Similar vulnerable code paths remain unguarded — db_bench_tool.cc:7714
  • Issue: SeekRandom() at line 7714 accesses multi_dbs_.size() directly without going through SelectDBWithCfh:
    uint64_t db_idx_to_use =
        (db_.db == nullptr)
            ? (uint64_t{thread->rand.Next()} % multi_dbs_.size())  // SIGFPE if empty
            : 0;
    This has the identical modulo-by-zero vulnerability during shutdown races. Line 7754 (multi_dbs_[db_idx_to_use].db) and line 5714 (num_key_gens = multi_dbs_.size()) in DoWrite() also access multi_dbs_ directly.
  • Root cause: The fix correctly guards the centralized SelectDBWithCfh() path, but several benchmark methods bypass it and access multi_dbs_ directly.
  • Suggested fix: Consider a follow-up PR adding similar empty-checks at lines 7714, 7754, and 5714. The PR description explicitly scopes this out ("out of scope here"), which is reasonable for an incremental fix, but these should be tracked.

🟢 LOW / NIT

L1. Theoretical TOCTOU between empty() and size()db_bench_tool.cc:5708-5719
  • Issue: A thread could theoretically clear multi_dbs_ between the multi_dbs_.empty() check (line 5708) and the multi_dbs_.size() call in the return statement (line 5719). This window is nanoseconds and multi_dbs_ is only cleared once during shutdown, so practical impact is negligible.
  • Suggested fix: None needed. Even if hit, this is the same crash the original code always had. Fixing this properly requires the atomic shutdown flag mentioned in the PR description.
L2. fprintf(stderr) during concurrent exit — db_bench_tool.cc:5713
  • Issue: fprintf is not guaranteed to be safe when another thread is running exit() (which flushes/closes stdio streams). The message could be truncated or lost.
  • Suggested fix: None needed. This is best-effort diagnostic output before _exit(). Losing the message is harmless.
L3. Exit code indistinguishable from normal errors — db_bench_tool.cc:5716
  • Issue: _exit(1) uses the same exit code as db_bench_exit(1). A distinct code (e.g., 2) would make it easier to distinguish shutdown races from other failures in automation.
  • Suggested fix: Consider using a distinct exit code, though this is a nit for a benchmark tool.

Cross-Component Analysis

Context Relevant? Assessment
Normal single-DB operation Yes (hot path) Safe — returns on first branch (db_.db != nullptr)
Normal multi-DB operation Yes Safe — multi_dbs_ is non-empty
Shutdown race (single-DB) Yes (the bug) Fixed — new guard catches multi_dbs_.empty()
Shutdown race (multi-DB) Possible Partially vulnerable — multi_dbs_ could be cleared while iterating
SeekRandom shutdown race Yes Not fixed — line 7714 bypasses SelectDBWithCfh

Positive Observations

  • Correct use of _exit(1) instead of exit() / db_bench_exit() to avoid recursive exit UB and atexit double-free — shows solid understanding of the problem.
  • The restructuring from if/else to two sequential if statements preserves identical hot-path behavior with zero overhead.
  • Clear, well-written comment explaining both the race condition and the _exit() vs exit() rationale.
  • PR description is thorough: includes root cause analysis, crash-fault test data, and explicit scoping of what this does and doesn't fix.

ℹ️ About this response

Generated by Claude Code.
Review methodology: claude_md/code_review.md

Limitations:

  • Claude may miss context from files not in the diff
  • Large PRs may be truncated
  • Always apply human judgment to AI suggestions

Commands:

  • /claude-review [context] — Request a code review
  • /claude-query <question> — Ask about the PR or codebase
@meta-codesync meta-codesync Bot changed the title Guard db_bench SelectDBWithCfh against shutdown race May 21, 2026
…ok#14754)

Summary:

**Problem**

When db_bench shuts down (fatal IO error, unknown benchmark name, etc.), ErrorExit() or ~Benchmark() destroys db_/multi_dbs_ while worker threads may still be inside benchmark methods holding raw DBWithColumnFamilies pointers returned by SelectDBWithCfh(). In single-DB mode, db_.db becomes nullptr and multi_dbs_ is empty, so workers evaluate rand_int % 0 and SIGFPE. The cascade across all workers masks whatever originally triggered the shutdown. Observed in the reliable_volumes crash-fault test at ~5 SIGFPE events per hour.

**Approach**

Guard DB lifetime at the worker-thread boundary with a reader-writer lock (port::RWMutex): workers acquire a read lock for the duration of their benchmark method; mutators acquire the write lock (which waits for active readers to drain) before destroying DBs. This ensures raw pointers returned by SelectDBWithCfh() remain valid for their entire usage scope.

**Fix**

- Workers hold a DbUseGuard (RAII wrapper over a read lock on db_lifecycle_rwlock_) for the entire benchmark method in ThreadBody. Cost: one rdlock/unlock pair per worker lifetime, zero hot-path overhead.

- Every DB mutation path (ErrorExit, ~Benchmark, fresh-DB reopen) takes DbStateMutationGuard, which acquires the write lock and then stops the secondary update thread before mutation may proceed.

- ErrorExit() routes to std::_Exit(1) when called from any thread that cannot safely run the mutation cleanup path: a DbUseGuard holder (would self-wait on the write lock) or the secondary update thread (would self-join via StopSecondaryUpdateThread). Tracked via two thread-locals: holds_db_use_guard_ and is_secondary_update_thread_. Main thread takes DbStateMutationGuard, cleans up, dispatches through db_bench_exit() / ToolHooks::Exit.

- StopSecondaryUpdateThread() resets secondary_update_stopped_ to 0 after joining, so a replacement thread created by a subsequent Open() is not immediately killed.

- Protocol is mechanically enforced in debug builds:
    * SelectDBWithCfh asserts holds_db_use_guard_  (read-side ownership)
    * DeleteDBs        asserts holds_db_state_mutation_guard_ (write-side ownership)
    * DbUseGuard and DbStateMutationGuard ctor/dtor assert correct imbalance (no nested or stray release).
    * DbStateMutationGuard ctor additionally asserts !holds_db_use_guard_ and !is_secondary_update_thread_, catching the new deadlock modes the single-lock design makes reachable (WriteLock while holding ReadLock; secondary thread self-joining via StopSecondaryUpdateThread).
  These convert "trust me" invariants into "the assert fires if you break it." Direct field accesses (e.g. db_.db->NewIterator inside a benchmark method, multi_dbs_.clear() in the reopen branch) are protected by the enclosing guard scope but are not per-site asserted.

**Caveat**

port::RWMutex is pthread_rwlock_t with default attrs -> reader-preferred on Linux/glibc. The current call graph has no concurrent worker spawn during mutator wait (mutator paths run only from the main thread, not concurrently with RunBenchmark spawning workers), so writer starvation is not reachable. Documented as a constraint; revisit if that invariant changes. The port layer doesn't expose pthread_rwlockattr_setkind_np cross-platform.

**Alternatives considered**

- std::_Exit(1) on every shutdown path, skip cleanup entirely. Loses ToolHooks::Exit dispatch, flushed traces, and end-of-run stats -- those matter for the RV crash-fault test image which consumes db_bench output. Rejected.

- shared_ptr<DBWithColumnFamilies> from SelectDBWithCfh, let DB lifetime extend naturally to the last reader. Adds a per-call atomic refcount bump in the hot path; the RWMutex approach is per-method-call instead of per-op, making the hot-path cost zero. Rejected for hot-path neutrality.

- Hand-rolled mutex+cv+counter+flag protocol. Used in v1-v2 of this diff. Replaced with port::RWMutex per code review (xbw): same semantics, ~50 fewer lines, stdlib primitive, two extra ctor asserts. Chosen.

Reviewed By: xingbowang

Differential Revision: D104974784
@meta-codesync meta-codesync Bot changed the title Guard db_bench DB lifetime with RWMutex against shutdown race May 21, 2026
@github-actions

Copy link
Copy Markdown

🟡 Codex Code Review

Auto-triggered after CI passed — reviewing commit c0154ba


Codex review failed before producing findings.

WARNING: proceeding, even though we could not update PATH: Refusing to create helper binaries under temporary dir "/tmp" (codex_home: AbsolutePathBuf("/tmp/codex-home"))
error: the argument '--base <BRANCH>' cannot be used with '[PROMPT]'

Usage: codex exec review --commit <SHA> --base <BRANCH> --title <TITLE> --model <MODEL> --config <key=value> --dangerously-bypass-approvals-and-sandbox --output-last-message <FILE> [PROMPT]

For more information, try '--help'.

ℹ️ About this response

Generated by Codex CLI.
Review methodology: claude_md/code_review.md

Limitations:

  • Codex may miss context from files not in the diff
  • Large PRs may be truncated
  • Always apply human judgment to AI suggestions

Commands:

  • /codex-review [context] — Request a code review
  • /codex-query <question> — Ask about the PR or codebase
@meta-codesync meta-codesync Bot closed this in d1ab4bf May 21, 2026
@meta-codesync

meta-codesync Bot commented May 21, 2026

Copy link
Copy Markdown

This pull request has been merged in d1ab4bf.

@github-actions

Copy link
Copy Markdown

✅ Claude Code Review

Auto-triggered after CI passed — reviewing commit c0154ba


Summary

Well-designed fix for a real shutdown race in db_bench. The RWMutex + RAII guard approach is sound, correctly prevents the SIGFPE crash, and has zero hot-path overhead. The code is thoroughly documented with clear invariants and debug assertions.

High-severity findings (0):

No high-severity findings.

Full review (click to expand)

Findings

🔴 HIGH

None.

🟡 MEDIUM

M1. std::_Exit(1) bypasses ToolHooks::Exit() -- db_bench_tool.cc:3698
  • Issue: When a worker thread or secondary update thread hits ErrorExit(), the code calls std::_Exit(1), which bypasses ToolHooks::Exit() dispatch, unflushed stdio buffers, and end-of-run statistics. The PR description acknowledges this matters for "the RV crash-fault test image which consumes db_bench output."
  • Root cause: Intentional design tradeoff -- worker threads holding the read lock cannot safely acquire the write lock (deadlock), and the secondary thread cannot self-join. std::_Exit(1) is the only safe option for these contexts.
  • Suggested fix: Consider adding a brief diagnostic message before std::_Exit(1) using write(STDERR_FILENO, ...) (not fprintf, which won't flush) to indicate which thread is terminating and why, aiding post-mortem diagnosis. This is a suggestion, not a blocking concern.
M2. No test coverage for the new synchronization mechanism -- db_bench_tool.cc
  • Issue: The PR adds 152 lines of concurrency logic (RWMutex, RAII guards, thread-local routing, _Exit fallback) but no tests. The existing db_bench_tool_test.cc has only 4 test cases focused on options parsing.
  • Root cause: db_bench's test infrastructure doesn't support concurrency scenario testing.
  • Suggested fix: At minimum, add a debug-build test exercising the ErrorExit() routing logic. The debug assertions provide some coverage, but a regression test for the core routing would increase confidence.

🟢 LOW / NIT

L1. StopSecondaryUpdateThread() called while holding write lock -- db_bench_tool.cc:4287
  • Issue: DbStateMutationGuard takes the write lock, then joins the secondary thread. If the secondary thread is blocked in SleepForMicroseconds, the write lock is held for the full sleep interval.
  • Suggested fix: No change needed. No deadlock possible (secondary thread holds no lock). Delay is bounded by FLAGS_secondary_update_interval.
L2. Reader-preferred RWMutex caveat -- db_bench_tool.cc:2955
  • Issue: pthread_rwlock_t with default attrs is reader-preferred on Linux/glibc.
  • Suggested fix: No change needed. Well-documented; writer starvation not reachable in current call graph.
L3. Thread-local variables are per-thread, not per-instance -- db_bench_tool.cc:4243-4262
  • Issue: Static thread_local bools would conflict with multiple Benchmark instances on the same thread.
  • Suggested fix: No change needed. Only one Benchmark instance is ever created (db_bench_tool.cc:9637).
L4. secondary_update_stopped_ reset uses memory_order_relaxed -- db_bench_tool.cc:4240
  • Issue: Relaxed store after join().
  • Suggested fix: No change needed. thread::join() provides happens-before per C++ standard [thread.thread.member]/5.

Cross-Component Analysis

Context Affected? Analysis
Single DB mode YES Protected by DbUseGuard on worker threads
Multi-DB mode (--num_multi_db) YES Same lock protects both db_ and multi_dbs_
Secondary DB mode (--use_secondary_db) YES Secondary thread flagged; stopped before mutation
Read-only mode (--readonly) YES Workers still need lifetime protection for reads
Non-DB benchmarks (Crc32c, xxHash) YES Guard overhead negligible (one rdlock/unlock per lifetime)
ToolHooks integration PARTIAL _Exit(1) bypasses hooks for worker/secondary thread errors

Positive Observations

  • Zero hot-path overhead: Read lock acquired once per worker lifetime, not per-operation. Debug asserts compiled out in release.
  • Thorough documentation: Protocol comment block clearly explains the two-guard system, thread-local semantics, and ErrorExit routing.
  • Defense in depth: Debug assertions at SelectDBWithCfh, DeleteDBs, and guard ctors/dtors catch protocol violations.
  • Correct deadlock avoidance: ErrorExit() checks holds_db_use_guard_ before any lock acquisition. DbStateMutationGuard ctor asserts !holds_db_use_guard_ and !is_secondary_update_thread_.
  • RAII guard design: Non-copyable, non-movable guards prevent accidental misuse.

ℹ️ About this response

Generated by Claude Code.
Review methodology: claude_md/code_review.md

Limitations:

  • Claude may miss context from files not in the diff
  • Large PRs may be truncated
  • Always apply human judgment to AI suggestions

Commands:

  • /claude-review [context] — Request a code review
  • /claude-query <question> — Ask about the PR or codebase
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment