Add multi-DB stress testing support (--num_dbs flag) (#14749)#14749
Add multi-DB stress testing support (--num_dbs flag) (#14749)#14749hx235 wants to merge 1 commit into
Conversation
|
@hx235 has exported this pull request. If you are a Meta employee, you can view the originating Diff in D104959942. |
|
| Check | Count |
|---|---|
concurrency-mt-unsafe |
3 |
cppcoreguidelines-avoid-non-const-global-variables |
3 |
| Total | 6 |
Details
db_stress_tool/db_stress_common.cc (2 warning(s))
db_stress_tool/db_stress_common.cc:25:56: warning: variable 'wbm' is non-const and globally accessible, consider making it const [cppcoreguidelines-avoid-non-const-global-variables]
db_stress_tool/db_stress_common.cc:26:49: warning: variable 'rate_limiter' is non-const and globally accessible, consider making it const [cppcoreguidelines-avoid-non-const-global-variables]
db_stress_tool/db_stress_tool.cc (4 warning(s))
db_stress_tool/db_stress_tool.cc:41:5: warning: variable 'fault_fs_for_crash_report' is non-const and globally accessible, consider making it const [cppcoreguidelines-avoid-non-const-global-variables]
db_stress_tool/db_stress_tool.cc:416:9: warning: function is not thread safe [concurrency-mt-unsafe]
db_stress_tool/db_stress_tool.cc:439:7: warning: function is not thread safe [concurrency-mt-unsafe]
db_stress_tool/db_stress_tool.cc:448:9: warning: function is not thread safe [concurrency-mt-unsafe]
Summary: Add --num_dbs flag to run N independent DB instances in parallel. Each StressTest instance has its own DB with isolated fault injection (from Diff 2). db_crashtest.py defaults to num_dbs=1. Path handling design: crash_main functions (blackbox/whitebox) compute specific per-DB paths from a base directory using get_subdirectory_paths() -- pure path computation, no I/O. gen_cmd validates path count matches num_dbs, creates EV directories, and handles destroy_db_initially for expected state. Both db and expected_values_dir follow the same validate logic in gen_cmd. If user passes specific paths (comma-separated), they are validated but not expanded. Shared resources (block_cache, write_buffer_manager, rate_limiter) are created once in tool.cc and shared across all DB instances. Most flags (threads, max_key, ops_per_thread) are per-DB. Other changes: - --num_dbs flag (default 1) with guards for not-yet-supported features - [db_label] prefix in all stdout output including stats (e.g. [db_0] Stress Test: ...) - Per-DB crash callback vector (fault_fs_for_crash_report, raw pointer, signal-safe) - Fault injection log path includes GetDbLabel() for per-DB uniqueness - Secondary paths are ephemeral, generated directly by C++ - flockfile/funlockfile for multi-DB stats atomicity Differential Revision: D104959942
Summary: Add --num_dbs flag to run N independent DB instances in parallel. Each StressTest instance has its own DB with isolated fault injection (from Diff 2). db_crashtest.py defaults to num_dbs=1. Path handling design: crash_main functions (blackbox/whitebox) compute specific per-DB paths from a base directory using get_subdirectory_paths() -- pure path computation, no I/O. gen_cmd validates path count matches num_dbs, creates EV directories, and handles destroy_db_initially for expected state. Both db and expected_values_dir follow the same validate logic in gen_cmd. If user passes specific paths (comma-separated), they are validated but not expanded. Shared resources (block_cache, write_buffer_manager, rate_limiter) are created once in tool.cc and shared across all DB instances. Most flags (threads, max_key, ops_per_thread) are per-DB. Other changes: - --num_dbs flag (default 1) with guards for not-yet-supported features - [db_label] prefix in all stdout output including stats (e.g. [db_0] Stress Test: ...) - Per-DB crash callback vector (fault_fs_for_crash_report, raw pointer, signal-safe) - Fault injection log path includes GetDbLabel() for per-DB uniqueness - Secondary paths are ephemeral, generated directly by C++ - flockfile/funlockfile for multi-DB stats atomicity Differential Revision: D104959942
Summary: Add --num_dbs flag to run N independent DB instances in parallel. Each StressTest instance has its own DB with isolated fault injection (from Diff 2). db_crashtest.py defaults to num_dbs=1. Path handling design: crash_main functions (blackbox/whitebox) compute specific per-DB paths from a base directory using get_subdirectory_paths() -- pure path computation, no I/O. gen_cmd validates path count matches num_dbs, creates EV directories, and handles destroy_db_initially for expected state. Both db and expected_values_dir follow the same validate logic in gen_cmd. If user passes specific paths (comma-separated), they are validated but not expanded. Shared resources (block_cache, write_buffer_manager, rate_limiter) are created once in tool.cc and shared across all DB instances. Most flags (threads, max_key, ops_per_thread) are per-DB. Other changes: - --num_dbs flag (default 1) with guards for not-yet-supported features - [db_label] prefix in all stdout output including stats (e.g. [db_0] Stress Test: ...) - Per-DB crash callback vector (fault_fs_for_crash_report, raw pointer, signal-safe) - Fault injection log path includes GetDbLabel() for per-DB uniqueness - Secondary paths are ephemeral, generated directly by C++ - flockfile/funlockfile for multi-DB stats atomicity Differential Revision: D104959942
Summary: Add --num_dbs flag to run N independent DB instances in parallel. Each StressTest instance has its own DB with isolated fault injection (from Diff 2). db_crashtest.py defaults to num_dbs=1. Path handling design: crash_main functions (blackbox/whitebox) compute specific per-DB paths from a base directory using get_subdirectory_paths() -- pure path computation, no I/O. gen_cmd validates path count matches num_dbs, creates EV directories, and handles destroy_db_initially for expected state. Both db and expected_values_dir follow the same validate logic in gen_cmd. If user passes specific paths (comma-separated), they are validated but not expanded. Shared resources (block_cache, write_buffer_manager, rate_limiter) are created once in tool.cc and shared across all DB instances. Most flags (threads, max_key, ops_per_thread) are per-DB. Other changes: - --num_dbs flag (default 1) with guards for not-yet-supported features - [db_label] prefix in all stdout output including stats (e.g. [db_0] Stress Test: ...) - Per-DB crash callback vector (fault_fs_for_crash_report, raw pointer, signal-safe) - Fault injection log path includes GetDbLabel() for per-DB uniqueness - Secondary paths are ephemeral, generated directly by C++ - flockfile/funlockfile for multi-DB stats atomicity Differential Revision: D104959942
4de8a83 to
b072a0e
Compare
Summary: Add --num_dbs flag to run N independent DB instances in parallel. Each StressTest instance has its own DB with isolated fault injection (from Diff 2). db_crashtest.py defaults to num_dbs=1. Path handling design: crash_main functions (blackbox/whitebox) compute specific per-DB paths from a base directory using get_subdirectory_paths() -- pure path computation, no I/O. gen_cmd validates path count matches num_dbs, creates EV directories, and handles destroy_db_initially for expected state. Both db and expected_values_dir follow the same validate logic in gen_cmd. If user passes specific paths (comma-separated), they are validated but not expanded. Shared resources (block_cache, write_buffer_manager, rate_limiter) are created once in tool.cc and shared across all DB instances. Most flags (threads, max_key, ops_per_thread) are per-DB. Other changes: - --num_dbs flag (default 1) with guards for not-yet-supported features - [db_label] prefix in all stdout output including stats (e.g. [db_0] Stress Test: ...) - Per-DB crash callback vector (fault_fs_for_crash_report, raw pointer, signal-safe) - Fault injection log path includes GetDbLabel() for per-DB uniqueness - Secondary paths are ephemeral, generated directly by C++ - flockfile/funlockfile for multi-DB stats atomicity Differential Revision: D104959942
Summary: Add --num_dbs flag to run N independent DB instances in parallel. Each StressTest instance has its own DB with isolated fault injection (from Diff 2). db_crashtest.py defaults to num_dbs=1. Path handling design: crash_main functions (blackbox/whitebox) compute specific per-DB paths from a base directory using get_subdirectory_paths() -- pure path computation, no I/O. gen_cmd validates path count matches num_dbs, creates EV directories, and handles destroy_db_initially for expected state. Both db and expected_values_dir follow the same validate logic in gen_cmd. If user passes specific paths (comma-separated), they are validated but not expanded. Shared resources (block_cache, write_buffer_manager, rate_limiter) are created once in tool.cc and shared across all DB instances. Most flags (threads, max_key, ops_per_thread) are per-DB. Other changes: - --num_dbs flag (default 1) with guards for not-yet-supported features - [db_label] prefix in all stdout output including stats (e.g. [db_0] Stress Test: ...) - Per-DB crash callback vector (fault_fs_for_crash_report, raw pointer, signal-safe) - Fault injection log path includes GetDbLabel() for per-DB uniqueness - Secondary paths are ephemeral, generated directly by C++ - flockfile/funlockfile for multi-DB stats atomicity Differential Revision: D104959942
Summary: Add --num_dbs flag to run N independent DB instances in parallel. Each StressTest instance has its own DB with isolated fault injection (from Diff 2). db_crashtest.py defaults to num_dbs=1. Path handling design: crash_main functions (blackbox/whitebox) compute specific per-DB paths from a base directory using get_subdirectory_paths() -- pure path computation, no I/O. gen_cmd validates path count matches num_dbs, creates EV directories, and handles destroy_db_initially for expected state. Both db and expected_values_dir follow the same validate logic in gen_cmd. If user passes specific paths (comma-separated), they are validated but not expanded. Shared resources (block_cache, write_buffer_manager, rate_limiter) are created once in tool.cc and shared across all DB instances. Most flags (threads, max_key, ops_per_thread) are per-DB. Other changes: - --num_dbs flag (default 1) with guards for not-yet-supported features - [db_label] prefix in all stdout output including stats (e.g. [db_0] Stress Test: ...) - Per-DB crash callback vector (fault_fs_for_crash_report, raw pointer, signal-safe) - Fault injection log path includes GetDbLabel() for per-DB uniqueness - Secondary paths are ephemeral, generated directly by C++ - flockfile/funlockfile for multi-DB stats atomicity Differential Revision: D104959942
Summary: Add --num_dbs flag to run N independent DB instances in parallel. Each StressTest instance has its own DB with isolated fault injection (from Diff 2). db_crashtest.py defaults to num_dbs=1. Path handling design: crash_main functions (blackbox/whitebox) compute specific per-DB paths from a base directory using get_subdirectory_paths() -- pure path computation, no I/O. gen_cmd validates path count matches num_dbs, creates EV directories, and handles destroy_db_initially for expected state. Both db and expected_values_dir follow the same validate logic in gen_cmd. If user passes specific paths (comma-separated), they are validated but not expanded. Shared resources (block_cache, write_buffer_manager, rate_limiter) are created once in tool.cc and shared across all DB instances. Most flags (threads, max_key, ops_per_thread) are per-DB. Other changes: - --num_dbs flag (default 1) with guards for not-yet-supported features - [db_label] prefix in all stdout output including stats (e.g. [db_0] Stress Test: ...) - Per-DB crash callback vector (fault_fs_for_crash_report, raw pointer, signal-safe) - Fault injection log path includes GetDbLabel() for per-DB uniqueness - Secondary paths are ephemeral, generated directly by C++ - flockfile/funlockfile for multi-DB stats atomicity Differential Revision: D104959942
Codex Code Review - OBSOLETESuperseded by a newer AI review. Expand to see the original review.🟡 Codex Code ReviewAuto-triggered after CI passed — reviewing commit e775a94 ❌ Codex review failed before producing findings. ℹ️ About this responseGenerated by Codex CLI. Limitations:
Commands:
|
Claude Code Review - OBSOLETESuperseded by a newer AI review. Expand to see the original review.✅ Claude Code ReviewAuto-triggered after CI passed — reviewing commit e775a94 SummaryClean, well-structured PR that adds multi-DB stress testing with proper resource sharing and isolation. The design is sound -- per-DB paths, shared cache/WBM/rate_limiter, per-DB fault injection, and clear phase separation in the orchestration code. Most incompatible features are correctly guarded. High-severity findings (1):
Full review (click to expand)Findings🔴 HIGHH1.
|
| Shared Resource | Thread-Safe? | Guarded? |
|---|---|---|
| block_cache | Yes | N/A (safe to share) |
| WriteBufferManager | Yes | N/A (safe to share) |
| RateLimiter | Yes | N/A (safe to share) |
| CompressedSecondaryCache | Yes | Blocked with num_dbs>1 |
| PoolSizeChangeThread | N/A | Blocked with num_dbs>1 |
| SyncPoint (debug) | Fragile | Safe via destruction order |
Positive Observations
- Clean 4-phase orchestration design (create -> init -> run -> cleanup)
- Proper shared resource factoring (cache, WBM, rate limiter)
- Good feature incompatibility guards in both C++ and Python
- Signal-safe crash callback with vector of fault_fs pointers
[db_N]output labeling for easy per-DB filtering
ℹ️ About this response
Generated by Claude Code.
Review methodology: claude_md/code_review.md
Limitations:
- Claude may miss context from files not in the diff
- Large PRs may be truncated
- Always apply human judgment to AI suggestions
Commands:
/claude-review [context]— Request a code review/claude-query <question>— Ask about the PR or codebase
Summary: Add --num_dbs flag to run N independent DB instances in parallel. Each StressTest instance has its own DB with isolated fault injection (from Diff 2). db_crashtest.py defaults to num_dbs=1. Path handling design: crash_main functions (blackbox/whitebox) compute specific per-DB paths from a base directory using get_subdirectory_paths() -- pure path computation, no I/O. gen_cmd validates path count matches num_dbs, creates EV directories, and handles destroy_db_initially for expected state. Both db and expected_values_dir follow the same validate logic in gen_cmd. If user passes specific paths (comma-separated), they are validated but not expanded. Shared resources (block_cache, write_buffer_manager, rate_limiter) are created once in tool.cc and shared across all DB instances. Most flags (threads, max_key, ops_per_thread) are per-DB. Other changes: - --num_dbs flag (default 1) with guards for not-yet-supported features - [db_label] prefix in all stdout output including stats (e.g. [db_0] Stress Test: ...) - Per-DB crash callback vector (fault_fs_for_crash_report, raw pointer, signal-safe) - Fault injection log path includes GetDbLabel() for per-DB uniqueness - Secondary paths are ephemeral, generated directly by C++ - flockfile/funlockfile for multi-DB stats atomicity Differential Revision: D104959942
Summary: Add --num_dbs flag to run N independent DB instances in parallel. Each StressTest instance has its own DB with isolated fault injection (from Diff 2). db_crashtest.py defaults to num_dbs=1. Path handling design: crash_main functions (blackbox/whitebox) compute specific per-DB paths from a base directory using get_subdirectory_paths() -- pure path computation, no I/O. gen_cmd validates path count matches num_dbs, creates EV directories, and handles destroy_db_initially for expected state. Both db and expected_values_dir follow the same validate logic in gen_cmd. If user passes specific paths (comma-separated), they are validated but not expanded. Shared resources (block_cache, write_buffer_manager, rate_limiter) are created once in tool.cc and shared across all DB instances. Most flags (threads, max_key, ops_per_thread) are per-DB. Other changes: - --num_dbs flag (default 1) with guards for not-yet-supported features - [db_label] prefix in all stdout output including stats (e.g. [db_0] Stress Test: ...) - Per-DB crash callback vector (fault_fs_for_crash_report, raw pointer, signal-safe) - Fault injection log path includes GetDbLabel() for per-DB uniqueness - Secondary paths are ephemeral, generated directly by C++ - flockfile/funlockfile for multi-DB stats atomicity Differential Revision: D104959942
a5ee2a5 to
29cb7bc
Compare
Summary: Add `--num_dbs` flag to run N independent DB instances in parallel. Each `StressTest` instance has its own DB with isolated fault injection (from D104959945). `db_crashtest.py` defaults to `num_dbs=1`. For `num_dbs=1`: `--db` and `--expected_values_dir` are paths used as-is. For `num_dbs>1`: they are parent directories; C++ creates `db_0/`, `db_1/`, ... subdirs underneath. Path ownership: C++ owns DB and secondary dir creation (supports remote env). Python owns EV dir creation (always local). C++ also creates EV dirs as fallback for direct CLI usage. `DestroyAllDbs` cleans up subdirs and the parent dir. Per-DB: `threads`, `max_key`, `ops_per_thread`, `reopen`, `column_families`, and all DB options. Shared: background env threads (compaction, flush pool), `block_cache`, `write_buffer_manager`, `compressed_secondary_cache`, `rate_limiter`, `compaction_thread_pool_adjust_interval`. Reviewed By: anand1976 Differential Revision: D104959942
Codex Code Review - OBSOLETESuperseded by a newer AI review. Expand to see the original review.🟡 Codex Code ReviewAuto-triggered after CI passed — reviewing commit 29cb7bc ❌ Codex review failed before producing findings. ℹ️ About this responseGenerated by Codex CLI. Limitations:
Commands:
|
Claude Code Review - OBSOLETESuperseded by a newer AI review. Expand to see the original review.✅ Claude Code ReviewAuto-triggered after CI passed — reviewing commit 29cb7bc SummaryThis PR adds a High-severity findings (2):
Full review (click to expand)Findings🔴 HIGHH1. Background thread lifecycle tied to first DB --
|
| Context | Multi-DB safe? | Notes |
|---|---|---|
| Shared block_cache | Yes | Thread-safe cache |
| Shared WBM | Partial | Budget not scaled |
| Shared RateLimiter | Partial | Aggregate throttling |
| FaultInjectionTestFS | Yes | Per-DB isolated |
| PoolSizeChangeThread | No | Tied to first DB lifecycle |
| CompressedCacheSetCapacityThread | No | Same issue |
| Signal handler | Yes | Vector populated before threads start |
Positive Observations
- Clean per-DB/shared resource separation
num_dbs=1is a true no-op change- Good Python-side flag sanitization
- Report mutex prevents interleaved output
- Crash callback vector fully populated before threads start
ℹ️ About this response
Generated by Claude Code.
Review methodology: claude_md/code_review.md
Limitations:
- Claude may miss context from files not in the diff
- Large PRs may be truncated
- Always apply human judgment to AI suggestions
Commands:
/claude-review [context]— Request a code review/claude-query <question>— Ask about the PR or codebase
|
Both high-severity findings are tolerable for simpler designs. |
Summary: Add `--num_dbs` flag to run N independent DB instances in parallel. Each `StressTest` instance has its own DB with isolated fault injection (from D104959945). `db_crashtest.py` defaults to `num_dbs=1`. For `num_dbs=1`: `--db` and `--expected_values_dir` are paths used as-is. For `num_dbs>1`: they are parent directories; C++ creates `db_0/`, `db_1/`, ... subdirs underneath. Path ownership: C++ owns DB and secondary dir creation (supports remote env). Python owns EV dir creation (always local). C++ also creates EV dirs as fallback for direct CLI usage. `DestroyAllDbs` cleans up subdirs and the parent dir. Per-DB: `threads`, `max_key`, `ops_per_thread`, `reopen`, `column_families`, and all DB options. Shared: background env threads (compaction, flush pool), `block_cache`, `write_buffer_manager`, `compressed_secondary_cache`, `rate_limiter`, `compaction_thread_pool_adjust_interval`. Reviewed By: anand1976 Differential Revision: D104959942
🟡 Codex Code ReviewAuto-triggered after CI passed — reviewing commit 33a915b ❌ Codex review failed before producing findings. ℹ️ About this responseGenerated by Codex CLI. Limitations:
Commands:
|
✅ Claude Code ReviewAuto-triggered after CI passed — reviewing commit 33a915b SummaryWell-structured PR that cleanly separates per-DB state from shared global resources. The multi-DB orchestration, path management, and thread lifecycle coordination are mostly correct. No high-severity correctness bugs found after deep analysis. High-severity findings (0): No high-severity findings. Full review (click to expand)Findings🟡 MEDIUMM1. Remote DB behavior change in
|
| Context | Applies? | Safe? | Notes |
|---|---|---|---|
| Shared block_cache across DBs | Yes | Yes | Cache is thread-safe; sharing is intentional |
| Shared WriteBufferManager | Yes | Yes | WBM is designed for cross-DB sharing |
| Shared rate_limiter | Yes | Yes | Rate limiter is thread-safe |
| Shared compressed_secondary_cache | Yes | Yes | CompressedCacheSetCapacityThread checks capacity (not usage); safe |
| PoolSizeChangeThread via call_once | Yes | Yes | Only modifies global env thread pool; one instance is correct |
| Per-DB FaultInjectionTestFS | Yes | Yes | Each DB has isolated fault injection |
| Per-DB DbVerificationThread | Yes | Yes | Each DB verifies its own state independently |
| bg_thread ThreadState lifetime | Yes | Safe | First DB waits for BgThreadsFinished before returning; stack variable outlives the bg thread |
Positive Observations
- Clean separation of per-DB state (paths, fault injection) from shared global resources (cache, WBM, rate limiter)
std::call_oncecorrectly prevents duplicate global bg threads while keeping per-DB bg threads (continuous verification) independent- Good use of db_label for multi-DB log disambiguation
- Python/C++ path ownership split is well-documented in comments
- Proper validation of incompatible flag combinations (ValidateNumDbsFlags + finalize_and_sanitize)
- Results vector access is race-free (each thread writes to its own index)
- Crash callback vector is pre-sized before registration, safe for signal handler iteration
- New Python unit tests cover flag sanitization for multi-DB
ℹ️ About this response
Generated by Claude Code.
Review methodology: claude_md/code_review.md
Limitations:
- Claude may miss context from files not in the diff
- Large PRs may be truncated
- Always apply human judgment to AI suggestions
Commands:
/claude-review [context]— Request a code review/claude-query <question>— Ask about the PR or codebase
|
This pull request has been merged in c724aeb. |
Summary:
Add
--num_dbsflag to run N independent DB instances in parallel. EachStressTestinstance has its own DB with isolated fault injection (from D104959945).db_crashtest.pydefaults tonum_dbs=1.For
num_dbs=1:--dband--expected_values_dirare paths used as-is.For
num_dbs>1: they are parent directories; C++ createsdb_0/,db_1/, ... subdirs underneath.Path ownership: C++ owns DB and secondary dir creation (supports remote env). Python owns EV dir creation (always local). C++ also creates EV dirs as fallback for direct CLI usage.
DestroyAllDbscleans up subdirs and the parent dir.Per-DB:
threads,max_key,ops_per_thread,reopen,column_families, and all DB options.Shared: background env threads (compaction, flush pool),
block_cache,write_buffer_manager,compressed_secondary_cache,rate_limiter,compaction_thread_pool_adjust_interval.Reviewed By: anand1976
Differential Revision: D104959942